Re: [boost] Boost.Locale and the standard "message" facet

2 May 2011

      ...
...
This is the std::messages::get function:
string_type get(catalog cat, int set, int msgid,
                const string_type& dfault) const;
cat - is the "domain" in Boost.Locale
set - can be used as context, but it is an integer
and not some user-friendly id - bad for localization
msgid - is the identification of the specific message,
but still an integer - bad for localization
dfault - is the default string returned if the message is not found,
and it can be used as an alternative to msgid.
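For reference, here is a minimal sketch of how a lookup through the standard facet looks from the calling side. The function name lookup_message and the catalog name used with it are made up for this sketch; how catalogs are located and what the integers mean is left to the implementation, so the only thing you can rely on is that the default string comes back when nothing is found.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Looks up one message through std::messages. The helper name and any
// catalog name passed to it are placeholders chosen for this sketch.
std::string lookup_message(const std::string& catalog_name, int set, int msgid,
                           const std::string& dfault)
{
    std::locale loc;  // copy of the current global locale
    const std::messages<char>& facet =
        std::use_facet<std::messages<char> >(loc);

    // open() returns an opaque handle; a negative value means failure.
    std::messages<char>::catalog cat = facet.open(catalog_name, loc);
    if (cat < 0)
        return dfault;

    // Both the set and the message are identified by plain integers.
    std::string text = facet.get(cat, set, msgid, dfault);
    facet.close(cat);
    return text;
}
```

Note that the caller has to carry the catalog handle around and remember that, say, set 1 and msgid 42 mean some particular string.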
Now:
- if you want textual context you can't
Well, you can always use a map from textual context to the integer,
can't you?
How would you map it? Where would you keep it? How would you
convert it?
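To make the questions concrete, here is a rough sketch of such a map. The class context_registry is hypothetical and not part of any library. It hands out integers in registration order, so the numbers are only meaningful if the program and every catalog agree on the same table, which is exactly the part nobody defines.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical registry that assigns each textual context a numeric "set".
class context_registry {
public:
    // Returns the id for a context, allocating the next free one on first use.
    int id_for(const std::string& context)
    {
        std::map<std::string, int>::const_iterator it = ids_.find(context);
        if (it != ids_.end())
            return it->second;
        int id = static_cast<int>(ids_.size()) + 1;
        ids_[context] = id;
        return id;
    }

private:
    std::map<std::string, int> ids_;
};
```

Two registries filled in a different order give different numbers for the same context, so the table itself would have to be shipped and versioned together with the catalogs.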
...
...
- if you want to get a plural form you can't.
Why? The fact that the interface doesn't deal explicitly with plurals doesn't
mean you cannot get them.
The interface must receive an integer for the number as a
parameter, as you need several forms.
...
...
It uses an input parameter with the actual number to identify one
When you call
format(translate("File was opened {1} day ago",
                 "File was opened {1} days ago",
                 no_of_files))
% no_of_files
Which is basically, in Hebrew for example:
translate("File was opened {1} day ago",
          "File was opened {1} days ago",
          no_of_files)
when no_of_files == 1 returns "Kovetz niftah lifney yom {1}"
when no_of_files == 2 returns "Kovetz niftah lifney yomaim"
when no_of_files < 1 or > 2 returns "Kovetz niftah lifney {1} yamim"
And then format formats it with no_of_files.
If the string is not in the dictionary then for no_of_files == 1
it returns "File was opened {1} day ago" and for no_of_files == 2 it
returns "File was opened {1} days ago"
Sorry, but I don't understand how this works.
Which string are you referring to in
"If the string is not in ..."? Could you
show the catalog associated with this
translation in English and in Hebrew?
If "File was opened {1} day ago" is not in the dictionary then
it would be used as is, since no Hebrew alternative is provided; also
it would have 2 plural forms (as in English) instead of
3 (as in Hebrew).
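The selection logic described above can be sketched roughly as follows. This is not the Boost.Locale implementation; the dictionary type, the function names and the hard-coded rule are assumptions made for illustration, with the three Hebrew strings taken from the example in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Plural rule from the Hebrew example: form 0 for n == 1,
// form 1 for n == 2, form 2 for everything else.
std::size_t hebrew_form(long n) { return n == 1 ? 0 : (n == 2 ? 1 : 2); }

// Maps the singular source string to its translated plural forms.
typedef std::map<std::string, std::vector<std::string> > dictionary;

std::string translate_plural(const dictionary& dict,
                             const std::string& single,
                             const std::string& plural, long n)
{
    dictionary::const_iterator it = dict.find(single);
    if (it != dict.end()) {
        std::size_t form = hebrew_form(n);
        if (form < it->second.size())
            return it->second[form];
    }
    // Not in the dictionary: fall back to the two source (English) forms.
    return n == 1 ? single : plural;
}

dictionary make_hebrew_dictionary()
{
    dictionary d;
    std::vector<std::string>& forms = d["File was opened {1} day ago"];
    forms.push_back("Kovetz niftah lifney yom {1}");
    forms.push_back("Kovetz niftah lifney yomaim");
    forms.push_back("Kovetz niftah lifney {1} yamim");
    return d;
}
```

The number has to be part of the lookup itself, which is why an interface that only takes a message id cannot express this.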
...
...
...
How does your library manage plurals for messages that have several parameters?
For example
translate("%1 hours, %2 minutes, %3 seconds") % h % m % s
You do it in a different way
format(translate("Format date with H-M-S","{1}, {2}, {3}"))
% format(translate("Format date with H-M-S","{1} hour","{1} hours"))
% format(translate("Format date with H-M-S","{1} minute","{1} minutes"))
% format(translate("Format date with H-M-S","{1} second","{1} seconds"))
As a programmer, I would like a library that lets me write just
translate("%1 hours, %2 minutes, %3 seconds") % h % m % s
As a translator, I would need to translate more than one string, of course.
For a Slavic language it would be 4^3 = 64 strings. Not good.
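The arithmetic behind that number, as a small sketch (the function names are made up here): a single message with three numbers and four plural forms per number needs every combination, while the composed version needs one frame string plus the forms of each unit.

```cpp
#include <cassert>

// One monolithic message: every combination of plural forms is a
// separate string, i.e. forms^params variants.
unsigned long monolithic_variants(unsigned forms, unsigned params)
{
    unsigned long total = 1;
    for (unsigned i = 0; i < params; ++i)
        total *= forms;
    return total;
}

// Composed messages: one frame string plus `forms` variants per parameter.
unsigned long composed_variants(unsigned forms, unsigned params)
{
    return 1 + static_cast<unsigned long>(forms) * params;
}
```

With 4 forms and 3 parameters this gives 64 strings against 13.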
...
...
In any case it is impossible to use it in real life.
I guess some people are using it now.
Show me one program that uses them. At least
programs that work with MSVC do not, as it is
not implemented there...
...
...
"You are going to connect to the untrusted web site {1} "
"its origin is unknown and you may be a victim of a scam"
I don't think it is good to include such messages in the code :(.
This belongs to the translation part.
Is it? Ask developers whether they prefer to write
the clear text inline in the context of the software
or have a separate unreadable key to something else.
...
...
So how would you put it into the code?
MyMessage::UntrustedWarning?
And if you have something slightly different, like
the encryption being too weak, then programmers would write
MyMessage::UntrustedWarning2?
Believe me, this is what happens in real life...
I guess the programmer is able to find more appropriate symbolic names,
don't you?
How many really meaningful identifier names have you
seen in production code?

I'm not talking about a theory, I'm talking
about real programmers.
...
...
It is about maintainability and linguistics.
As far as I remember we didn't have maintainability issues.
But you would have separate files for messages without their
context (source files) and separate code without
clear messages.

It is bad and unmaintainable. It is doable but
it should never be done.
...
...
It is very important to have powerful translation
tools that would allow you to merge translations,
work on them with a built-in spell checker and
so on.
You do not work on translations today with a simple
text editor.
As I said before, I was working with it some years ago,
and we didn't need so many tools.
Yes, it is possible to work without tools... With
gettext as well.

The question is how it is better to work and what
is the way to do it.

I wonder if you have ever worked with tools
like PO-Edit or Lokalize on real messages and
have seen how convenient it is.
...
...
...
I've not taken a look at your implementation yet.
Please could you tell me when the translation file is read?
Is the file parsed only once and the translations stored in a cache?
The dictionary is parsed and loaded during generation of the locale,
then it is stored in memory and not changed until
the std::locale object is destroyed.
For long-lived applications it could be necessary to force the
release of this memory when the default locale changes, couldn't it?
Just destroy the std::locale object?! What is the problem?

You can also reset std::locale::global with another
locale
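A small sketch of that lifetime rule, using only standard facilities. The class dictionary_facet is a hypothetical stand-in for a facet that owns a loaded dictionary; it is not a Boost.Locale class. Facets installed into a locale are reference counted, so the memory goes away together with the last std::locale that holds the facet, the global one included.

```cpp
#include <cassert>
#include <locale>

// Hypothetical stand-in for a facet that owns a loaded dictionary.
class dictionary_facet : public std::locale::facet {
public:
    static std::locale::id id;
    static int live;  // number of facet objects currently alive

    // refs == 0: the locale machinery owns and deletes the facet.
    dictionary_facet() : std::locale::facet(0) { ++live; }
    ~dictionary_facet() { --live; }
};

std::locale::id dictionary_facet::id;
int dictionary_facet::live = 0;

// True if the facet is destroyed together with the last locale holding it.
bool released_with_last_locale()
{
    {
        std::locale loc(std::locale::classic(), new dictionary_facet);
        std::locale copy = loc;  // copies share the same facet object
        if (dictionary_facet::live != 1)
            return false;
    }
    return dictionary_facet::live == 0;
}

// True if replacing the global locale releases the facet it was holding.
bool released_after_global_reset()
{
    {
        std::locale loc(std::locale::classic(), new dictionary_facet);
        std::locale::global(loc);
    }
    if (dictionary_facet::live != 1)  // the global locale still holds it
        return false;
    std::locale::global(std::locale::classic());
    return dictionary_facet::live == 0;
}
```

Any stream that was imbued with the old locale keeps its own reference, so such streams have to be re-imbued or destroyed as well before the dictionary is actually freed.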
...
...
...
You could provide a defined way on top of this facet, couldn't you?
...
2. Support of plural forms
Plural forms can be designed on top of the message facet?
No, a new message facet is required
You have added one, haven't you? If I'm not wrong gettext doesn't
take care of plurals, and you have added something on top of it.
It does.

See: http://linux.die.net/man/3/ngettext

It could be done without breaking the binary
message format, but it does not mean
that it is not implemented by gettext.
...
...
...
...
4. Using natural language identifiers as keys
I have some use cases needing a more compact format.
If you really want to, make it "msg1234" in your case...

But this is bad design.
...
I think the opposite: English could make you think there is no gender issue.
Letting the user write
translate("How is this row?");
translate("How is this color?");
translate("Good");
translate("Bad");
is not good. I would prefer the interface to force the use of context.
Gender is only an example, there are many more.
You can force the use of context but it is not
always required: if the translation
is an entire sentence then you don't need context
as it is self-contained, but for short messages
like "Good" or "Open" it is required.
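A rough sketch of why a short message needs context. The dictionary type and function names are invented for this illustration, and the Spanish strings are only sample data. The key layout, context followed by the character '\004' and then the message id, follows the convention gettext uses for context-qualified entries in its catalogs.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::string, std::string> dictionary;

// Context-qualified key: context + '\004' + msgid (gettext convention).
// An empty context means the bare message id is the key.
std::string make_key(const std::string& context, const std::string& msgid)
{
    return context.empty() ? msgid : context + '\004' + msgid;
}

// Returns the translation for (context, msgid), or msgid if none exists.
std::string translate(const dictionary& dict, const std::string& context,
                      const std::string& msgid)
{
    dictionary::const_iterator it = dict.find(make_key(context, msgid));
    return it != dict.end() ? it->second : msgid;
}

// Sample data: "row" (la fila) is feminine, "color" (el color) is masculine.
dictionary make_spanish_dictionary()
{
    dictionary d;
    d[make_key("row", "Good")] = "Buena";
    d[make_key("color", "Good")] = "Bueno";
    return d;
}
```

The same source string "Good" gets two translations, and without the context there is no way to pick one. A full sentence such as "How is this row?" carries enough information on its own.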
...
Yes, a translate manipulator simplifies the code and is very useful.
Yes, RAII is good, but I also want to be able to close it explicitly.
Destroy the locale object.
...
...
The standard message catalog requires you to store the catalog
variable somewhere,
while the Boost.Locale messages facet has some default and allows you to use
a string-based key for the domain.
I'm not saying the standard cannot be improved,
but I think it would be better to build on top of it,
instead of providing two interfaces that use incompatible catalogs.
Making internationalizable applications that use C++ internationalizable
libraries using different catalogs would be complex for the translator.
Really?

C++0x has deprecated std::auto_ptr, which everybody
uses, and has given us std::unique_ptr.

You are suggesting forcing a bad design on a
good facet just because it exists, even though nobody
uses it?

I disagree. This std::messages facet should be
deprecated or even removed.
...
...
...
Well, I think that it would be great if you could add to the
documentation a complete comparison of the interfaces and a
rationale for why you think your design is superior.
Too many flaws, too many problems... If so I should
write about 10-20 pages on the flaws of all the facets around
From my side, it will be enough if you concentrate your effort on the message
facet ;-)
I think I have already done so, haven't I?
...
...
I have a small summary of problems, but a full side by side?
Do you really need it?
I think it will be useful in your documentation,
as you are proposing an alternative design.
I also think that if you find the message facet
is not usable in real life, you should make a
standard proposal to improve it (Why not for TR2?).
And I would suggest deprecating the std::messages
facet along with many other broken facets.
...
I'm sure you will have a lot of constructive feedback from
some experts.
The current std::locale badly mimics the POSIX/C locale
infrastructure; it was good at that point, but
it included too many flaws from it
and introduced even more flaws.

In order to make a useful TR2 proposal
you should do some groundbreaking work and
do things like:

1. Standardize locale names
2. Standardize messages catalogs formats
3. Rewrite some of existing facets
   completely
4. Deprecate some of the facets and functions.

The 3rd and 4th are quite easy to do; however, the 1st
and the 2nd would be very hard, if possible at all.

Even C++03/C++11, which fully mimics and copies the
POSIX message catalogs (catgets, catopen, catclose),
hasn't defined anything useful about them or
referred to the POSIX standards.

So... Yes, I'd like to see such things in TR2,
but believe me, the message catalogs facet is the
easiest thing to rewrite, while the
real localization problems lie far beyond it.

This is what really concerns me in the standardization
of localization facilities.

Artyom