Re: [boost] Boost.Locale and the standard "message" facet

Hi,
Message of 30/04/11 15:48 From: "Artyom" To: boost@lists.boost.org Cc: Subject: Re: [boost] Boost.Locale and the standard "message" facet
Subject: [boost] Boost.Locale and the standard "message" facet
Hi,
I was wondering how Boost.Locale is related to the standard message facet which is used to translate messages.
The standard message catalogs allow extracting messages by integer identifiers, but may use string identifiers; this is implementation-defined.
It is undefined how to load message facets
Well implementation defined doesn't mean undefined.
or format them and so on.
The formatting can always be done on top of this facet, isn't it?
It does not support plural forms and context.
I'm not an expert, but can't catalogs and the set parameter be used for your domain and context? With respect to plural forms, how does Boost.Locale handle locales that have 3 or 4 plural forms (if I understood your private mail, Hebrew is an example of this)? I have the impression that Boost.Locale handles simple plurals but not all kinds of plurals. BTW, what is the criterion in Boost.Locale to identify a plural form? How does your library manage plurals for messages that have several parameters? For example translate("%1 hours, %2 minutes, %3 seconds") % h % m % s
It is the most unless facet around.
What do you mean? That the scope is reduced?
Note that the facet interface works with integer identifiers, avoiding all the issues raised by the get_text/translate functions provided by Boost.Locale.
Use of integer identifiers is the best way to screw the localization in the software.
What does 3456 mean? Do you really think it is good to write translate(MY_MESSAGE_OPENING_FILE)?
Why not? Note that we can also write translate(MyMessage::OpeningFile) if this seems clearer to you
No, never - never - never - never - never use such "constant" or "integer" identifiers.
I understand that some, like you, may find that integers scale worse than strings, as we need to keep constants unique within a given context, but this can be checked in debug mode. Some time ago, when I had to write a localization application, we used integer identifiers to log internationalizable messages. This drastically reduced the size of the log. I guess this advantage alone was enough for us to choose integers over strings. An advantage I see is that you have to concentrate all the message ids of a set/context in a single file, so there is no need for tools that parse your code to get the strings to translate. BTW, how do you retrieve message translations from a domain with your underlying implementation: by copy/paste, or is it possible to have specific files for specific domains, or is there a single translation file per locale? I've not taken a look at your implementation yet. Could you please tell me when the translation file is read? Is the file parsed only once and the translations stored in a cache?
Always use natural text.
I think your natural language interface gives the user the impression that anything is possible, when a lot of limitations seem to be there, as others have already commented on this ML.
Is there any reason Boost.Locale could not follow the standard design?
The standard message catalogs are too weak
I would like to hear the rationale of the standard message design from someone who is aware of it (some pointers will also be welcome).
What are the advantages of the Boost.Locale design?
1. Defined way to load and format catalogs
You could provide a defined way on top of this facet, isn't it?
2. Support of plural forms
Plural forms can be designed on top of the message facet?
3. Support of message-context
As I said, I suspect that the set parameter is interpreted as your context.
4. Using natural language identifiers as keys
See below. This is not always an advantage, and as far as I see adds some constraints on the interface so the tool can take care to automatically extract the strings to translate.
5. Convenience interface
What do you mean? Could you compare both?
More?
Well, I think that it would be great if you could add to the documentation a complete comparison of the interfaces and a rationale for why you think your design is superior.
Best,
Vicente

From: Vicente BOTET <vicente.botet@wanadoo.fr>
Hi,
Message of 30/04/11 15:48 From: "Artyom" To: boost@lists.boost.org Cc: Subject: Re: [boost] Boost.Locale and the standard "message" facet
Subject: [boost] Boost.Locale and the standard "message" facet
Hi,
I was wondering how Boost.Locale is related to the standard message facet which is used to translate messages.
The standard message catalogs allow extracting messages by integer identifiers, but may use string identifiers; this is implementation-defined.
It is undefined how to load message facets
Well implementation defined doesn't mean undefined.
But it makes it useless as each compiler can do anything it wants.
or format them and so on.
The formatting can always be done on top of this facet, isn't it?
By formatting I mean the entire infrastructure of catalog formats, binary formats, message extracting software, user friendly translation tools like po-edit and so on.
It does not support plural forms and context.
I'm not an expert, but can't catalogs and the set parameter be used for your domain and context?
This is the std::messages::get function:
string_type get (catalog cat, int set, int msgid, const string_type&dfault) const;
cat - the "domain" in Boost.Locale
set - can be used as context, but it is an integer and not some user-friendly id: bad for localization
msgid - the identification of the specific message, but still an integer: bad for localization
dfault - the default string returned if the message is not found; it can be used as an alternative to msgid.
Now:
- if you want textual context, you can't
- if you want to get a plural form, you can't.
So basically it is too weak and limited.
With respect to plural forms, how does Boost.Locale handle locales that have 3 or 4 plural forms (if I understood your private mail, Hebrew is an example of this)? I have the impression that Boost.Locale handles simple plurals but not all kinds of plurals.
It handles all kinds of plurals.
BTW, what is the criterion in Boost.Locale to identify a plural form?
It uses the actual number, passed as an input parameter, to identify one.
When you call
format(translate("File was opened {1} day ago", "File was opened {1} days ago", no_of_files)) % no_of_files
which is basically, in Hebrew for example:
translate("File was opened {1} day ago", "File was opened {1} days ago", no_of_files)
when no_of_files == 1 returns "Kovetz niftah lifney yom {1}"
when no_of_files == 2 returns "Kovetz niftah lifney yomaim"
when no_of_files < 1 or > 2 returns "Kovetz niftah lifney {1} yamim"
And then format formats it with no_of_files.
If the string is not in the dictionary, then for no_of_files == 1 it returns "File was opened {1} day ago" and for no_of_files == 2 it returns "File was opened {1} days ago".
How does your library manage plurals for messages that have several parameters? For example
translate("%1 hours, %2 minutes, %3 seconds") % h % m % s
You do it in a different way:
format(translate("Format date with H-M-S","{1}, {2}, {3}"))
  % (format(translate("Format date with H-M-S","{1} hour","{1} hours", h)) % h)
  % (format(translate("Format date with H-M-S","{1} minute","{1} minutes", m)) % m)
  % (format(translate("Format date with H-M-S","{1} second","{1} seconds", s)) % s)
Basically you provide a good context, "Format date with H-M-S", and a basic formatting pattern, "{1}, {2}, {3}", which the translator can alter. Then you translate, with the same context, three subpatterns, each with its own plural form.
It is the most unless facet around.
What do you mean? That the scope is reduced?
It should be useless not unless. In any case it is impossible to use it in real life.
Note that the facet interface works with integer identifiers, avoiding all the issues raised by the get_text/translate functions provided by Boost.Locale.
Use of integer identifiers is the best way to screw the localization in the software.
What does 3456 mean? Do you really think it is good to write translate(MY_MESSAGE_OPENING_FILE)?
Why not? Note that we can also write translate(MyMessage::OpeningFile) if this seems clearer to you
Same problems, as in real life you need quite complicated strings and expressions. Many messages are not just "Open a file" but rather:
"You are going to connect to the untrusted web site {1} " "its original is unknown and you may be a victim of a scam"
So how would you put it into the code?
MyMessage::UntrustedWarning?
And if you have something slightly different, like the encryption being too weak, then programmers would write
MyMessage::UntrustedWarning2?
Believe me, this is what happens in real life.
See notes below about rules of thumb.
No, never - never - never - never - never use such "constant" or "integer" identifiers.
I understand that some, like you, may find that integers scale worse than strings, as we need to keep constants unique within a given context, but this can be checked in debug mode.
It is about maintainability and linguistics.
Some time ago when I had to write a localization application, we used integer identifier to log internationalizable messages. This reduced drastically the size of the log. I guess only this advantage was enough to take integers instead of strings, for us of course.
If you really want short identifiers for specific cases, you for example can write things like log() << "EINVAL" or log() << "MSG::Inval"
An advantage I see is that you need to concentrate all the message id of a set/context in a single file so no need to have tools that parse your code to get the strings to translate.
You need so many tools that the tool that extracts the strings from source code is the minor one. It is very important to have powerful translation tools that allow you to merge translations, work on them with a built-in spell checker, and so on. You do not work on translations today with a simple text editor.
BTW, how do you recover message translation from a domain with your underlying implementation, copy/paste or is there a possibility to have specific files for specific domains, or there is a single translation file by locale?
It depends on your design. If you for example have a single program just use a single domain named after your program, but if you have for example some independent component it may be used in its own domain. In any case each dictionary is per locale (or language) and per domain.
I've not taken a look at your implementation yet. Could you please tell me when the translation file is read? Is the file parsed only once and the translations stored in a cache?
The dictionary is parsed and loaded during generation of the locale; it is then stored in memory and not changed until the std::locale object is destroyed.
Always use natural text.
I think your natural language interface gives the user the impression that anything is possible, when a lot of limitations seem to be there, as others have already commented on this ML.
The natural language interface is the most powerful.
Is there any reason Boost.Locale could not follow the standard design?
The standard message catalogs are too weak
I would like to hear the rationale of the standard message design from someone who is aware of it (some pointers will also be welcome).
What are the advantages of the Boost.Locale design?
1. Defined way to load and format catalogs
You could provide a defined way on top of this facet, isn't it?
2. Support of plural forms
Plural forms can be designed on top of the message facet?
No, a new message facet is required.
3. Support of message-context
As I said, I suspect that the set parameter is interpreted as your context.
No, see notes above.
4. Using natural language identifiers as keys
See below. This is not always an advantage, and as far as I see adds some constraints on the interface so the tool can take care to automatically extract the strings to translate.
You are looking at the problem from a pure software engineering point of view. However, when it comes to UI and localization, there are two important rules of thumb:
1. Provide as much information as possible to make the translator's life as easy as possible, for example:
a) Context
Instead of MyMessage::FileOpen
Provide: "File Opening Dialog", "Open"
b) The unit gettext provides an option to extract nearby comments from the source, so in the code
// We open a file in CSV format with
// prices of the items.
AddMenuItem(translate("File Opening Dialog", "Open"))
And the translator will see all the text!
2. Assume as little as possible: make the interface as generic as possible.
For example, you have these dialogs:
How is this row? <Good> <Bad>
How is this color? <Good> <Bad>
So you can translate:
translate("How is this row?");
translate("How is this color");
translate("Good");
translate("Bad");
What is the problem with this?
Think about it for a minute before you read on...
Gender: in some languages "row" and "color" have different genders, so "Good" and "Bad" should have different forms according to gender. So you need to write
translate("How is this row?");
translate("About row","Good");
translate("About row","Bad");
translate("How is this color");
translate("About color","Good");
translate("About color","Bad");
This is what I mean by assumptions.
Making integer identifiers would detach the strings from the context even more, and that is bad.
So yes, you can use integers, but it is a VERY BAD design.
5. Convenience interface
What do you mean? Could you compare both?
Yes.
// at generation stage
std::messages<char>::catalog domain_id = std::use_facet<std::messages<char> >(l).open("domain",l);
// in use
AddMenuItem(std::use_facet<std::messages<char> >(std::locale()).get(domain_id,0,0,"Open File"))
// at close
std::use_facet<std::messages<char> >(l).close(domain_id);
Or
boost::locale::generator gen;
gen.add_messages_domain("domain");
std::locale::global(gen(""));
// in use
AddMenuItem(translate("Open File"));
// destroyed with the locale
The standard message catalog requires you to store the catalog variable somewhere, while the Boost.Locale messages facet has a default and allows using a string-based key for the domain.
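For reference, a minimal sketch of the standard-facet side of this comparison, using only <locale>. The function name lookup_with_std_messages and the catalog name in the usage line are hypothetical. How open() locates a catalog is implementation-defined, so the sketch relies only on get() handing back the default string when nothing is found.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Looks up `fallback` in the catalog `domain` through std::messages<char>.
// Hypothetical helper for illustration; not part of any library.
std::string lookup_with_std_messages(const std::string& domain,
                                     const std::string& fallback) {
    const std::locale loc;  // copy of the current global locale
    const std::messages<char>& facet =
        std::use_facet<std::messages<char> >(loc);

    // The caller has to keep this handle for every later get().
    std::messages<char>::catalog cat = facet.open(domain, loc);
    if (cat < 0) {
        return fallback;  // the catalog could not be opened
    }

    // set and msgid are plain integers; the string is only the default
    // that comes back when no message is found.
    std::string text = facet.get(cat, 0, 0, fallback);
    facet.close(cat);  // explicit close, no RAII
    return text;
}
```

With a catalog that does not exist, lookup_with_std_messages("no-such-domain-for-sketch", "Open File") is expected to hand back "Open File" unchanged.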
More?
Well, I think that it would be great if you can add a complete comparison of the interfaces and a rationale why you think your design is superior on the documentation.
Too many flaws, too many problems... If so, I should write about 10-20 pages on the flaws of all the facets around.
I have a small summary of problems, but a full side by side? Do you really need them?
Best, Vicente
Artyom

Message of 02/05/11 11:28 From: "Artyom" To: boost@lists.boost.org Cc: Subject: Re: [boost] Boost.Locale and the standard "message" facet
From: Vicente BOTET
Hi,
Message of 30/04/11 15:48 From: "Artyom" To: boost@lists.boost.org Cc: Subject: Re: [boost] Boost.Locale and the standard "message" facet
Subject: [boost] Boost.Locale and the standard "message" facet
Hi,
I was wondering how Boost.Locale is related to the standard message facet which is used to translate messages.
The standard message catalogs allow extracting messages by integer identifiers, but may use string identifiers; this is implementation-defined.
It is undefined how to load message facets
Well implementation defined doesn't mean undefined.
But it makes it useless as each compiler can do anything it wants.
or format them and so on.
The formatting can always be done on top of this facet, isn't it?
By formatting I mean the entire infrastructure of catalog formats, binary formats, message extracting software, user friendly translation tools like po-edit and so on.
It does not support plural forms and context.
I'm not an expert, but can't catalogs and the set parameter be used for your domain and context?
This is the std::messages::get function:
string_type get (catalog cat, int set, int msgid, const string_type&dfault) const;
cat - the "domain" in Boost.Locale
set - can be used as context, but it is an integer and not some user-friendly id: bad for localization
msgid - the identification of the specific message, but still an integer: bad for localization
dfault - the default string returned if the message is not found; it can be used as an alternative to msgid.
Now:
- if you want textual context you can't
Well, you can always use a map of textual contexts that gives you the integer, can't you?
- if you want to get plural form you can't.
Why? The fact that the interface doesn't deal explicitly with plurals doesn't mean you cannot get them.
So basically it is too weak and limited.
With respect to plural forms, how does Boost.Locale handle locales that have 3 or 4 plural forms (if I understood your private mail, Hebrew is an example of this)? I have the impression that Boost.Locale handles simple plurals but not all kinds of plurals.
It handles all kinds of plurals.
BTW, what is the criterion in Boost.Locale to identify a plural form?
It uses the actual number, passed as an input parameter, to identify one.
When you call
format(translate("File was opened {1} day ago", "File was opened {1} days ago", no_of_files)) % no_of_files
Which is basically, in Hebrew for example:
translate("File was opened {1} day ago", "File was opened {1} days ago", no_of_files)
when no_of_files == 1 returns "Kovetz niftah lifney yom {1}"
when no_of_files == 2 returns "Kovetz niftah lifney yomaim"
when no_of_files < 1 or > 2 returns "Kovetz niftah lifney {1} yamim"
And then format formats it with no_of_files.
If the string is not in the dictionary then for no_of_files==1 it returns "File was opened {1} day ago" and for no_of_files==2 it returns "File was opened {1} days ago"
Sorry, but I don't understand how this works. Which string are you referring to in "If the string is not in ..."? Could you show the catalog associated with this translation in English and in Hebrew?
How does your library manage plurals for messages that have several parameters? For example
translate("%1 hours, %2 minutes, %3 seconds") % h % m % s
You do it in a different way:
format(translate("Format date with H-M-S","{1}, {2}, {3}"))
  % (format(translate("Format date with H-M-S","{1} hour","{1} hours", h)) % h)
  % (format(translate("Format date with H-M-S","{1} minute","{1} minutes", m)) % m)
  % (format(translate("Format date with H-M-S","{1} second","{1} seconds", s)) % s)
As a programmer, I would like a library that lets me write just
translate("%1 hours, %2 minutes, %3 seconds") % h % m % s
As a translator, I would need to translate more than one string of course.
Basically you provide a good context, "Format date with H-M-S", and a basic formatting pattern, "{1}, {2}, {3}", which the translator can alter. Then you translate, with the same context, three subpatterns, each with its own plural form.
It is the most unless facet around.
What do you mean? That the scope is reduced?
It should be useless not unless.
In any case it is impossible to use it in real life.
I guess some people are using it now.
Note that the facet interface works with integer identifiers, avoiding all the issues raised by the get_text/translate functions provided by Boost.Locale.
Use of integer identifiers is the best way to screw the localization in the software.
What does 3456 mean? Do you really think it is good to write translate(MY_MESSAGE_OPENING_FILE)?
Why not? Note that we can also write translate(MyMessage::OpeningFile) if this seems clearer to you
Same problems, as in real life you need quite complicated strings and expressions. Many messages are not just "Open a file" but rather:
"You are going to connect to the untrusted web site {1} " "its original is unknown and you may be a victim of a scam"
I don't think it is good to include such messages in the code :(. This belongs to the translation part.
So how would you put it into the code?
MyMessage::UntrustedWarning?
And if you have something slightly different like the encryption is too weak then programmers would write
MyMessage::UntrustedWarning2?
Believe me, this is what happens in real life.
I guess the programmer is able to find more appropriate symbolic names, don't you think?
See notes below about rules of thumb.
No, never - never - never - never - never use such "constant" or "integer" identifiers.
I understand that some, like you, may find that integers scale worse than strings, as we need to keep constants unique within a given context, but this can be checked in debug mode.
It is about maintainability and linguistics.
As far as I remember, we didn't have maintainability issues.
Some time ago when I had to write a localization application, we used integer identifier to log internationalizable messages. This reduced drastically the size of the log. I guess only this advantage was enough to take integers instead of strings, for us of course.
If you really want short identifiers for specific cases, you for example can write things like
log() << "EINVAL"
or
log() << "MSG::Inval"
This is not as bad as the long translation message, but is not yet optimal when storage could be an issue.
An advantage I see is that you need to concentrate all the message id of a set/context in a single file so no need to have tools that parse your code to get the strings to translate.
You need so many tools that the tool that extracts the strings from source code is the minor one.
It is very important to have powerful translation tools that allow you to merge translations, work on them with a built-in spell checker, and so on.
You do not work on translations today with a simple text editor.
As I said before, I was working with this some years ago, and we didn't need so many tools.
BTW, how do you recover message translation from a domain with your underlying implementation, copy/paste or is there a possibility to have specific files for specific domains, or there is a single translation file by locale?
It depends on your design.
If you for example have a single program just use a single domain named after your program, but if you have for example some independent component it may be used in its own domain.
In any case each dictionary is per locale (or language) and per domain.
Ok, I see. This is fine.
I've not taken a look at your implementation yet. Could you please tell me when the translation file is read? Is the file parsed only once and the translations stored in a cache?
The dictionary is parsed and loaded during generation of the locale; it is then stored in memory and not changed until the std::locale object is destroyed.
For long-lived applications, it could be necessary to force the release of this memory when the default locale changes, couldn't it?
Always use natural text.
I think your natural language interface gives the user the impression that anything is possible, when a lot of limitations seem to be there, as others have already commented on this ML.
The natural language interface is the most powerful.
Is there any reason Boost.Locale could not follow the standard design?
The standard message catalogs are too weak
I would like to hear the rationale of the standard message design from someone who is aware of it (some pointers will also be welcome).
What are the advantages of the Boost.Locale design?
1. Defined way to load and format catalogs
You could provide a defined way on top of this facet, isn't it?
2. Support of plural forms
Plural forms can be designed on top of the message facet?
No, a new message facet is required.
You have added one, haven't you? If I'm not wrong, gettext doesn't take care of plurals, and you have added something on top of it.
3. Support of message-context
As I said, I suspect that the set parameter is interpreted as your context.
No, see notes above.
I have copied here what you said about the set parameter:
set - can be used as context, but it is an integer and not some user-friendly id: bad for localization
so I guess this means that yes, it supports message context.
4. Using natural language identifiers as keys
I have some use cases needing a more compact format.
See below. This is not always an advantage, and as far as I see adds some constraints on the interface so the tool can take care to automatically extract the strings to translate.
You are looking at the problem from a pure software engineering point of view. However, when it comes to UI and localization, there are two important rules of thumb:
1. Provide as much information as possible to make the translator's life as easy as possible, for example:
a) Context
Instead of MyMessage::FileOpen
Provide: "File Opening Dialog", "Open"
Or FileOpeningDialog_Open
b) The unit gettext provides an option to extract nearby comments from the source so in the code
// We open a file in CSV format with
// prices of the items.
AddMenuItem(translate("File Opening Dialog", "Open"))
And the translator will see all the text!
2. Assume as little as possible: make the interface as generic as possible.
For example, you have these dialogs:
How is this row? <Good> <Bad>
How is this color? <Good> <Bad>
So you can translate:
translate("How is this row?"); translate("How is this color"); translate("Good"); translate("Bad");
What is the problem with this?
Think about it for a minute before you read on...
Gender, in some languages row and color have different gender so Good and Bad should have different forms according to gender. So you need to write
translate("How is this row?"); translate("About row","Good"); translate("About row","Bad"); translate("How is this color"); translate("About color","Good"); translate("About color","Bad");
This is what I mean by assumptions.
Making integer identifiers would detach the strings from the context even more, and that is bad.
I think the opposite: English could make you think there is no gender issue. Letting the user write translate("How is this row?"); translate("How is this color"); translate("Good"); translate("Bad"); is not good. I would prefer that the interface force the use of context.
So yes, you can use integers, but it is a VERY BAD design.
5. Convenience interface
What do you mean? Could you compare both?
Yes.
// at generation stage
std::messages<char>::catalog domain_id = std::use_facet<std::messages<char> >(l).open("domain",l);
// in use
AddMenuItem(std::use_facet<std::messages<char> >(std::locale()).get(domain_id,0,0,"Open File"))
// at close
std::use_facet<std::messages<char> >(l).close(domain_id);
Or
boost::locale::generator gen;
gen.add_messages_domain("domain");
std::locale::global(gen(""));
// in use
AddMenuItem(translate("Open File"));
// destroyed with the locale
Yes, a translate manipulator simplifies the code and is very useful. Yes, RAII is good, but I also want to be able to close it explicitly.
The standard message catalog requires you to store the catalog variable somewhere, while the Boost.Locale messages facet has a default and allows using a string-based key for the domain.
I'm not saying the standard cannot be improved, but I think it would be better to build on top of it, instead of providing two interfaces that use incompatible catalogs. Making internationalizable applications that use C++ internationalizable libraries with different catalogs would be complex for the translator.
More?
Well, I think that it would be great if you can add a complete comparison of the interfaces and a rationale why you think your design is superior on the documentation.
Too many flaws, too many problems... If so I should write about 10-20 pages on flaws of all facets around
From my side, it will be enough if you concentrate your effort on the message facet ;-)
I have some small summary of problems but full side by side? Do you really need them?
I think it will be useful in your documentation, as you are proposing an alternative design. I also think that if you find the message facet unusable in real life, you should make a standard proposal to improve it (why not for TR2?). I'm sure you will get a lot of constructive feedback from some experts.
Best,
Vicente

This is the std::messages::get function:
string_type get (catalog cat, int set, int msgid, const string_type&dfault) const;
cat - the "domain" in Boost.Locale
set - can be used as context, but it is an integer and not some user-friendly id: bad for localization
msgid - the identification of the specific message, but still an integer: bad for localization
dfault - the default string returned if the message is not found; it can be used as an alternative to msgid.
Now:
- if you want textual context you can't
Well, you can always use a map of textual contexts that gives you the integer, can't you?
How would you map it? Where would you keep it? How would you convert it?
- if you want to get plural form you can't.
Why? The fact that the interface doesn't deal explicitly with plurals doesn't mean you cannot get them.
The interface must receive the number as an integer parameter, because it has to choose among several forms.
It uses the actual number, passed as an input parameter, to identify one.
When you call
format(translate("File was opened {1} day ago", "File was opened {1} days ago", no_of_files)) % no_of_files
which is basically, in Hebrew for example:
translate("File was opened {1} day ago", "File was opened {1} days ago", no_of_files)
when no_of_files == 1 returns "Kovetz niftah lifney yom {1}"
when no_of_files == 2 returns "Kovetz niftah lifney yomaim"
when no_of_files < 1 or > 2 returns "Kovetz niftah lifney {1} yamim"
And then format formats it with no_of_files.
If the string is not in the dictionary then for no_of_files==1 it returns "File was opened {1} day ago" and for no_of_files==2 it returns "File was opened {1} days ago"
Sorry, but I don't understand how this works. Which string are you referring to in "If the string is not in ..."? Could you show the catalog associated with this translation in English and in Hebrew?
If "File was opened {1} day ago" is not in the dictionary, then it is used as is, since no Hebrew alternative is provided; it would also have 2 plural forms (as in English) instead of 3 (as in Hebrew).
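The selection and fallback described above can be sketched without any library. This is a simplified illustration, not Boost.Locale's implementation: the names PluralForms, three_form_index and pick_plural are hypothetical, and the three-form rule is hard-coded from the Hebrew example in this thread rather than read from a catalog.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical dictionary entry: the translated plural forms of one message.
// An empty vector stands for "message not found in the dictionary".
typedef std::vector<std::string> PluralForms;

// Rule taken from the Hebrew example in the thread (an assumption here):
// 1 -> form 0, 2 -> form 1, anything else -> form 2.
inline std::size_t three_form_index(long n) {
    if (n == 1) return 0;
    if (n == 2) return 1;
    return 2;
}

// Picks the translated form for n. When no three-form entry exists, falls
// back to the two source strings: singular for n == 1, plural otherwise.
inline std::string pick_plural(const PluralForms& translated,
                               const std::string& singular,
                               const std::string& plural,
                               long n) {
    if (translated.size() == 3) {
        return translated[three_form_index(n)];
    }
    return n == 1 ? singular : plural;
}
```

The number is needed twice: once to pick the form, and once more when the chosen pattern is formatted.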
How does your library manage plurals for messages that have several parameters? For example
translate("%1 hours, %2 minutes, %3 seconds") % h % m % s
You do it in a different way:
format(translate("Format date with H-M-S","{1}, {2}, {3}"))
  % (format(translate("Format date with H-M-S","{1} hour","{1} hours", h)) % h)
  % (format(translate("Format date with H-M-S","{1} minute","{1} minutes", m)) % m)
  % (format(translate("Format date with H-M-S","{1} second","{1} seconds", s)) % s)
As a programmer, I would like a library that lets me write just
translate("%1 hours, %2 minutes, %3 seconds") % h % m % s
As a translator, I would need to translate more than one string of course.
For Slavic languages it would be 4^3 = 64 strings. Not good.
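The arithmetic behind that figure, as a small sketch. The function names are hypothetical. It assumes every number in the message independently selects one of `forms` plural forms, and that the split approach needs one outer pattern plus `forms` strings per sub-message.

```cpp
#include <cassert>

// Variants a translator must supply when ONE message embeds `numbers`
// independent counts, each selecting one of `forms` plural forms.
inline long combined_variants(long forms, int numbers) {
    long total = 1;
    for (int i = 0; i < numbers; ++i) {
        total *= forms;  // every count multiplies the combinations
    }
    return total;
}

// Strings needed when each count gets its own sub-message:
// one outer pattern plus `forms` strings per sub-message.
inline long split_variants(long forms, int numbers) {
    return 1 + forms * numbers;
}
```

For 4 forms and 3 numbers this gives 64 combined variants against 13 strings for the split version.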
In any case it is impossible to use it in real life.
I guess some people are using it now.
Show me one program that uses them. At least programs that work with MSVC do not, as it is not implemented there...
"You are going to connect to the untrusted web site {1} " "its original is unknown and you may be a victim of a scam"
I don't think it is good to include such messages in the code :(. This belongs to the translation part.
Is it? Ask developers whether they prefer to write the clear text inline in the context of the software or have a separate unreadable key to something else.
So how would you put it into the code?
MyMessage::UntrustedWarning?
And if you have something slightly different like the encryption is too weak then programmers would write
MyMessage::UntrustedWarning2?
Believe me, this is what happens in real life.
I guess the programmer is able to find more appropriate symbolic names, don't you think?
How many really meaningful identifier names have you seen in production code? I'm not talking about theory, I'm talking about real programmers.
It is about maintainability and linguistics.
As far as I remember, we didn't have maintainability issues.
But you end up with separate files for messages without their context (the source files) and separate code without clear messages. It is bad and unmaintainable. It is doable, but it should never be done.
It is very important to have powerful translation tools that allow you to merge translations, work on them with a built-in spell checker, and so on.
You do not work on translations today with a simple text editor.
As I said before, I was working with this some years ago, and we didn't need so many tools.
Yes, it is possible to work without tools... With gettext as well. The question is how it is better to work and what the way to do it is. I wonder if you have ever worked with tools like PO-Edit or Lokalize on real messages and have seen how convenient it is.
I've not take a look at your implementation yet Please could you tell me when the translation file is read? Is the file parsed only once and the translations stored on a cache?
The dictionary parsed and loaded during generation of the locale then it is stored in the memory and not changed till the std::locale object is destroyed.
For long lived applications it could be needed to force the release of this memory when the default local change, isn't it?
Just erase std::locale object?! What is the problem? You can also reset std::locale::global with other locale
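The lifetime rule can be illustrated with plain standard C++. This is a minimal sketch, not Boost.Locale code: dictionary_facet is a hypothetical stand-in for a facet that owns a loaded dictionary, and it only shows that a facet installed in a std::locale is destroyed once the last locale object referring to it goes away.

```cpp
#include <cassert>
#include <locale>

// Hypothetical facet standing in for one that owns a loaded dictionary.
class dictionary_facet : public std::locale::facet {
public:
    static std::locale::id id;
    // refs == 0: the locale machinery owns the facet and deletes it.
    explicit dictionary_facet(bool& released)
        : std::locale::facet(0), released_(released) { released_ = false; }
    ~dictionary_facet() override { released_ = true; }
private:
    bool& released_;
};
std::locale::id dictionary_facet::id;

// Returns true if the facet was destroyed together with the last locale
// object that referred to it.
bool dictionary_released_with_locale() {
    bool released = false;
    {
        std::locale loc(std::locale::classic(), new dictionary_facet(released));
        assert(std::has_facet<dictionary_facet>(loc));
        assert(!released);   // still alive while `loc` exists
    }                        // `loc` destroyed here, facet goes with it
    return released;
}
```

If such a locale had been installed with std::locale::global, the facet would stay alive until the global locale is replaced and every copy is gone.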
You could provide a defined way on top of this facet, couldn't you?
2. Support of plural forms
Plural forms can be designed on top of the message facet?
No, a new message facet is required.
You have added one, haven't you? If I'm not wrong, gettext doesn't take care of plurals, and you have added something on top of it.
It does. See: http://linux.die.net/man/3/ngettext It could be done without breaking binary messages format but it does not mean that it is not implemented by gettext.
4. Using natural language identifiers as keys
I have some use cases needing a more compact format.
If you really want make your case "msg1234"... But this is bad design.
I think the opposite; English could make you think there is no gender issue.
Letting the user write
translate("How is this row?"); translate("How is this color"); translate("Good"); translate("Bad");
is not good. I would prefer the interface force the use of context.
Gender is only an example; there are many more. You can force the use of context, but it is not always required: if the translation is an entire sentence you don't need context, as it is self-contained, but for short messages like "Good" or "Open" it is required.
Yes, a translate manipulator simplifies the code and is very useful. Yes, RAII is good, but I also want to be able to close it explicitly.
Destroy the locale object.
Standard message catalog requires you to store somewhere the catalog variable while the boost.Locale messages facet has some default and allows to use a string based key for domain.
I'm not saying the standard cannot be improved, but I think it would be better to build on top of it, instead of providing two interfaces that use incompatible catalogs. Making internationalizable applications that use C++ internationalizable libraries using different catalogs would be complex for the translator.
Really? C++0x deprecated std::auto_ptr, which everybody uses, and gave us std::unique_ptr. Are you suggesting forcing a bad design onto a good facet just because it exists and nobody uses it? I disagree. This std::messages facet should be deprecated or even removed.
Well, I think it would be great if you could add to the documentation a complete comparison of the interfaces and a rationale for why you think your design is superior.
Too many flaws, too many problems... If so I should write about 10-20 pages on flaws of all facets around
From my side, it will be enough if you concentrate your effort on the message facet ;-)
I think I had already done, hadn't I?
I have some small summary of problems but full side by side? Do you really need them?
I think it will be useful in your documentation, as you are proposing an alternative design.
I also think that if you find the message facet is not usable in real life, you should make a standard proposal to improve it (Why not for TR2?).
And I would suggest to deprecate std::message facet along with many other broken facets.
I'm sure you will have a lot of constructive feedback from some experts.
The current std::locale badly mimics the POSIX/C locale infrastructure. That was good at the time, but it inherited too many flaws from it and introduced even more. In order to make a useful TR2 proposal you would have to break some ground and do things like:
1. Standardize locale names
2. Standardize message catalog formats
3. Rewrite some of the existing facets completely
4. Deprecate some of the facets and functions.
Items 3 and 4 are quite easy to do; however, items 1 and 2 would be very hard, if possible at all. Even C++03/C++11, which fully mimics and copies the POSIX message catalogs (catgets, catopen, catclose), did not define anything useful about them or refer to the POSIX standards. So... Yes, I'd like to see such things in TR2, but believe me, the message catalog facet is the easiest thing to rewrite, while the real localization problem lies far beyond it. This is what really concerns me in the standardization of localization facilities.
Artyom

Message of 02/05/11 16:02 From: "Artyom" To: boost@lists.boost.org Cc: Subject: Re: [boost] Boost.Locale and the standard "message" facet
This is the std::messages::get function:
string_type get(catalog cat, int set, int msgid, const string_type& dfault) const;
cat - is the "domain" in Boost.Locale
set - can be used as context, but it is an integer and not some user-friendly id - bad for localization
msgid - is the identification of the specific message, but still an integer - bad for localization
dfault - is the default string returned if the message is not found, and it can be used as an alternative to msgid.
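For reference, a call through the standard facet could look like the sketch below. It uses only the standard std::messages<char> interface (open, get, close); the helper name lookup_or_default is made up, and what a given catalog name resolves to is implementation-defined, so on many platforms this simply returns the fallback string.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Looks up message (set, msgid) in the named catalog through the standard
// std::messages<char> facet; returns `fallback` when nothing is found.
std::string lookup_or_default(const std::string& catalog_name, int set, int msgid,
                              const std::string& fallback,
                              const std::locale& loc = std::locale::classic()) {
    const auto& facet = std::use_facet<std::messages<char>>(loc);
    std::messages<char>::catalog cat = facet.open(catalog_name, loc);
    if (cat < 0) return fallback;   // the catalog could not be opened
    std::string text = facet.get(cat, set, msgid, fallback);
    facet.close(cat);               // the caller must keep and close `cat`
    return text;
}
```

Note how the caller has to carry the catalog handle around and pass two integers whose meaning is not visible at the call site.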
Now:
- if you want textual context you can't
Well, you can always use a map of textual contexts that gives you the integer, can't you?
How would you map it? Where would you keep it? How would you convert it?
You can define this map in a centralized way, initialized statically.
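Such a centralized, statically initialized map might be sketched like this. The context names and set numbers are invented for illustration, and the function names are hypothetical; the returned integer would be passed as the `set` argument of std::messages::get.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical central table mapping textual contexts to the integer `set`
// argument of std::messages::get. Names and numbers are made up.
const std::map<std::string, int>& context_sets() {
    static const std::map<std::string, int> table = {
        {"File menu", 1},
        {"Format date with H-M-S", 2},
        {"Security warnings", 3},
    };
    return table;
}

// Returns the set number for a context, or -1 if the context is unknown.
int set_for_context(const std::string& context) {
    const auto& table = context_sets();
    auto it = table.find(context);
    return it == table.end() ? -1 : it->second;
}
```

The table has to be kept in sync with the catalogs by hand, which is the maintenance cost being argued about here.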
- if you want to get plural form you can't.
Why? The fact that the interface doesn't deal explicitly with plurals doesn't mean you cannot get them.
The interface must receive the number as an integer parameter, as you need several forms.
You can use several integers for the translation of plurals.
It uses the actual number, passed as an input parameter, to identify one.
When you call
format(translate("File was opened {1} day ago", "File was opened {1} days ago", no_of_files)) % no_of_files
Which is basically, in Hebrew for example:
translate("File was opened {1} day ago", "File was opened {1} days ago", no_of_files)
  when no_of_files == 1 returns "Kovetz niftah lifney yom {1}"
  when no_of_files == 2 returns "Kovetz niftah lifney yomaim"
  when no_of_files < 1 or > 2 returns "Kovetz niftah lifney {1} yamim"
And then format formats it with no_of_files.
If the string is not in the dictionary then for no_of_files==1 it returns "File was opened {1} day ago" and for no_of_files==2 it returns "File was opened {1} days ago"
Sorry, but I don't understand how this works. Which string are you referring to in "If the string is not in ..."? Could you show the catalog associated with this translation in English and in Hebrew?
If "File was opened {1} day ago" is not in the dictionary, then it is used as is, since no Hebrew alternative is provided; it would also have 2 plural forms (as in English) instead of 3 (as in Hebrew).
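The selection and fallback logic described here can be sketched without any library. This is an illustration only: the catalog struct and select_plural are hypothetical names, not Boost.Locale types, and the Hebrew rule is the one given above (1, 2, anything else).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory catalog: a plural rule plus the translated forms
// stored under the singular English string.
struct catalog {
    int (*plural_index)(long n);
    std::map<std::string, std::vector<std::string>> forms;
};

// Hebrew rule as described above: 1 -> form 0, 2 -> form 1, otherwise form 2.
int hebrew_plural_index(long n) { return n == 1 ? 0 : (n == 2 ? 1 : 2); }

// Picks the translated form for n; falls back to the two English strings
// when the message is not in the dictionary.
std::string select_plural(const catalog& cat, const std::string& singular,
                          const std::string& plural, long n) {
    auto it = cat.forms.find(singular);
    if (it != cat.forms.end()) {
        int idx = cat.plural_index(n);
        if (idx >= 0 && static_cast<std::size_t>(idx) < it->second.size())
            return it->second[idx];
    }
    return n == 1 ? singular : plural;   // English fallback: 2 forms only
}
```

The returned string still contains the {1} placeholder; substituting the number is the job of format.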
I insist: could you show the catalog associated with this translation in English and in Hebrew? I'm sure I'm missing something and I can't see what.
How does your library manage plurals for messages that have several parameters? For example
translate("%1 hours, %2 minutes, %3 seconds") % h % m % s
You do it in different way
format(translate("Format date with H-M-S","{1}, {2}, {3}"))
  % format(translate("Format date with H-M-S","{1} hour","{1} hours"))
  % format(translate("Format date with H-M-S","{1} minute","{1} minutes"))
  % format(translate("Format date with H-M-S","{1} second","{1} seconds"))
As a programmer, I would like a library that lets me write just
translate("%1 hours, %2 minutes, %3 seconds") % h % m % s
As a translator, I would need to translate more than one string of course.
For a Slavic language it would be 4^3 = 64 strings. Not good.
You are right if the translate function uses just one translation. What I was trying to get at is a translate function with 3 arguments that behaves like your
format(translate("Format date with H-M-S","{1}, {2}, {3}"))
  % format(translate("Format date with H-M-S","{1} hour","{1} hours"))
  % format(translate("Format date with H-M-S","{1} minute","{1} minutes"))
  % format(translate("Format date with H-M-S","{1} second","{1} seconds"))
The only problem I see is which character to use to split the string. Maybe % could be used:
translate("%1 hours%,% %2 minutes%,% %3 seconds") % h % m % s
In any case it is impossible to use it in real life.
I guess some people are using it now.
Show me one program that uses them. At least programs that work with MSVC do not, as it is not implemented there...
"You are going to connect to the untrusted web site {1} " "its original is unknown and you may be a victim of a scam"
I don't think it is good to include such messages in the code :(. This belongs to the translation part.
Is it? Ask developers whether they prefer to write the clear text inline in the context of the software or have a separate unreadable key to something else.
So how would you put it into the code?
MyMessage::UntrustedWarning?
And if you have something slightly different like the encryption is too weak then programmers would write
MyMessage::UntrustedWarning2?
Believe me, this is what happens in real life...
I guess the programmer is able to find more appropriate symbolic names, don't you think?
How many really meaningful identifier names have you seen in production code?
I'm not talking about theory, I'm talking about real programmers.
I didn't know that my team and I were not real programmers ;-)
It is about maintainability and linguistics.
As far as I remember we didn't have maintainability issues.
But having separate files for messages without their context (source files) and separate code without clear messages.
IMO, this relation is part of a specification document.
It is bad and unmaintainable. It is doable but it should never be done.
It is very important to have powerful translation tools that allow you to merge translations and work on them with a built-in spell checker and so on.
You do not work on translations today with a simple text editor.
As I said before, I was working with this some years ago, and we didn't need so many tools.
Yes, it is possible to work without tools... With gettext as well.
The question is which way of working is better.
I wonder if you have ever worked with tools like PO-Edit or Lokalize on real messages and have seen how convenient it is.
No never, and I don't know them. But I can tell you that I've worked with real messages.
I haven't taken a look at your implementation yet. Could you please tell me when the translation file is read? Is the file parsed only once and the translations stored in a cache?
The dictionary is parsed and loaded during generation of the locale; then it is stored in memory and not changed until the std::locale object is destroyed.
2. Support of plural forms
Plural forms can be designed on top of the message facet?
No, a new message facet is required.
You have added one, haven't you? If I'm not wrong, gettext doesn't take care of plurals, and you have added something on top of it.
It does.
See: http://linux.die.net/man/3/ngettext
It could be done without breaking binary messages format but it does not mean that it is not implemented by gettext.
Thanks for the pointer.
4. Using natural language identifiers as keys
I have some use cases needing a more compact format.
If you really want make your case "msg1234"...
I agree.
I'm not saying the standard cannot be improved, but I think it would be better to build on top of it, instead of providing two interfaces that use incompatible catalogs. Making internationalizable applications that use C++ internationalizable libraries using different catalogs would be complex for the translator.
Really?
Well maybe not too much, but you will need to use different tools, ....
C++0x deprecated std::auto_ptr, which everybody uses, and gave us std::unique_ptr.
Are you suggesting forcing a bad design onto a good facet just because it exists and nobody uses it?
I disagree. This std::messages facet should be deprecated or even removed.
No. I'm just saying that if you have valid arguments it would be better to deprecate one and add one that is better. But having two catalogs is not good. For example, if I want to make Chrono internationalizable I can just use the std message facet until there is a better facet.
I also think that if you find the message facet is not usable in real life, you should make a standard proposal to improve it (Why not for TR2?).
And I would suggest to deprecate std::message facet along with many other broken facets.
I'm sure you will have a lot of constructive feedback from some experts.
The current std::locale badly mimics the POSIX/C locale infrastructure. That was good at the time, but it inherited too many flaws from it and introduced even more.
In order to make a useful TR2 proposal you would have to break some ground and do things like:
1. Standardize locale names
2. Standardize message catalog formats
3. Rewrite some of the existing facets completely
4. Deprecate some of the facets and functions.
Items 3 and 4 are quite easy to do; however, items 1 and 2 would be very hard, if possible at all.
So, are you saying that we cannot have a standard for localization that is anything other than implementation-defined?
Even C++03/C++11, which fully mimics and copies the POSIX message catalogs (catgets, catopen, catclose), did not define anything useful about them or refer to the POSIX standards.
So... Yes, I'd like to see such things in TR2, but believe me, the message catalog facet is the easiest thing to rewrite, while the real localization problem lies far beyond it.
Well, having better facets could be one step ahead.
This is what really concerns me in the standardization of localization facilities.
I really suggest you participate in the standardization of a better locale library proposal; in the end this is also one of the goals of Boost.
Best,
Vicente

From: Vicente BOTET <vicente.botet@wanadoo.fr>
If "File was opened {1} day ago" is not in the dictionary, then it is used as is, since no Hebrew alternative is provided; it would also have 2 plural forms (as in English) instead of 3 (as in Hebrew).
I insist: could you show the catalog associated with this translation in English and in Hebrew? I'm sure I'm missing something and I can't see what.
The Hebrew catalog looks like
he.po
  # translation of foo.po to Hebrew
  "Project-Id-Version: foo\n"
  "PO-Revision-Date: 2008-06-07 15:04+0300\n"
  "Last-Translator: Artyom <artyomtnk@yahoo.com>\n"
  "Language-Team: Hebrew <en@li.org>\n"
  "MIME-Version: 1.0\n"
  "Content-Type: text/plain; charset=UTF-8\n"
  "Content-Transfer-Encoding: 8bit\n"
  "Plural-Forms: nplurals=3; plural= n==1 ? 0 : (n == 2 ? 1 : 2);\n"
  "X-Generator: KBabel 1.11.4\n"
  msgid "File was opened {1} day ago"
  msgid_plural "File was opened {1} days ago"
  msgstr[0] "Kovetz niftah lifney yom {1}"
  msgstr[1] "Kovetz niftah lifney yomaim"
  msgstr[2] "Kovetz niftah lifney {1} yamim"
The English is the original string.
In Japanese or Chinese (with one form) it would be
ja.po
  ...
  "Plural-Forms: nplurals=1; plural=0;\n"
  ...
  msgid "File was opened {1} day ago"
  msgid_plural "File was opened {1} days ago"
  msgstr[0] "Some Japanese text {1} some Japanese text"
  ...
The "Plural-Forms:" section of the meta record describes the format of the plural forms: their number and a C expression that calculates the form index from the parameter. The facet that loads the catalog parses the C expression and knows how to calculate it.
format(translate("Format date with H-M-S","{1}, {2}, {3}"))
  % format(translate("Format date with H-M-S","{1} hour","{1} hours"))
  % format(translate("Format date with H-M-S","{1} minute","{1} minutes"))
  % format(translate("Format date with H-M-S","{1} second","{1} seconds"))
The only problem I see is which character to use to split the string. Maybe % could be used
translate("%1 hours%,% %2 minutes%,% %3 seconds") % h % m % s
I'm not sure exactly how you want to do this. Would
format(translate("Format date with H-M-S", "[{1} hour|{1} hours], [{2} minute|{2} minutes], [{3} second|{3} seconds]", h, m, s)) % h % m % s
be better? I don't know... I need to think about it, because there may be more corner cases that I don't see yet.
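To make the bracket idea concrete, here is a rough sketch of how such a pattern could be expanded. It is purely hypothetical: expand_plural_groups is not a Boost.Locale function, it handles only the two-form English rule (first alternative when the count is 1, otherwise the second), and it does no dictionary lookup at all.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Expands "[singular|plural]" groups: the i-th group is resolved with
// counts[i], then every "{k}" is replaced by counts[k-1].
std::string expand_plural_groups(const std::string& pattern,
                                 const std::vector<long>& counts) {
    std::string chosen;
    std::size_t group = 0;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] != '[') { chosen += pattern[i]; continue; }
        std::size_t bar = pattern.find('|', i);
        std::size_t end = pattern.find(']', i);
        assert(bar != std::string::npos && end != std::string::npos && bar < end);
        assert(group < counts.size());
        chosen += counts[group] == 1 ? pattern.substr(i + 1, bar - i - 1)
                                     : pattern.substr(bar + 1, end - bar - 1);
        ++group;
        i = end;   // skip past the closing bracket
    }
    std::string out;
    for (std::size_t i = 0; i < chosen.size(); ++i) {
        std::size_t close = chosen[i] == '{' ? chosen.find('}', i) : std::string::npos;
        if (close == std::string::npos) { out += chosen[i]; continue; }
        std::size_t k = std::stoul(chosen.substr(i + 1, close - i - 1));
        assert(k >= 1 && k <= counts.size());
        out += std::to_string(counts[k - 1]);
        i = close;   // skip past the closing brace
    }
    return out;
}
```

Even this small sketch shows some of the corner cases: a language with more than two forms needs more alternatives per group, and literal brackets or bars in the text would need escaping.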
C++0x deprecated std::auto_ptr, which everybody uses, and gave us std::unique_ptr.
Are you suggesting forcing a bad design onto a good facet just because it exists and nobody uses it?
I disagree. This std::messages facet should be deprecated or even removed.
No. I'm just saying that if you have valid arguments it would be better to deprecate one and add one that is better. But having two catalogs is not good.
Yes, I recommend deprecating it.
For example, if I want to make Chrono internationalizable I can just use the std message facet until there is a better facet.
And it would work only on... Linux
- gcc supports locales only on Linux
- MSVC does not support messages at all.
That is the sad reality.
In order to make a useful TR2 proposal you would have to break some ground and do things like:
1. Standardize locale names
2. Standardize message catalog formats
3. Rewrite some of the existing facets completely
4. Deprecate some of the facets and functions.
Items 3 and 4 are quite easy to do; however, items 1 and 2 would be very hard, if possible at all.
So, are you saying that we cannot have a standard for localization that is anything other than implementation-defined?
I'm just saying it would not be simple at all, especially as it requires some ground-breaking things like using UTF-8 by default, using a specific message catalog format and so on. Without that, locales would remain as useless as they are today.
Well, having better facets could be one step ahead.
But it would not be enough. See, if the state of the implementation (not the interface) of the current facets were good, it would be a very-very-very useful part of C++, even in its limited way... But it isn't.
This is what really concerns me in the standardization of localization facilities.
I really suggest you participate in the standardization of a better locale library proposal; in the end this is also one of the goals of Boost.
Yes, I understand, and I think it may have a chance.
Artyom

Message of 03/05/11 07:56 From: "Artyom" To: boost@lists.boost.org Cc: Subject: Re: [boost] Boost.Locale and the standard "message" facet
From: Vicente BOTET
If "File was opened {1} day ago" is not in the dictionary, then it is used as is, since no Hebrew alternative is provided; it would also have 2 plural forms (as in English) instead of 3 (as in Hebrew).
I insist: could you show the catalog associated with this translation in English and in Hebrew? I'm sure I'm missing something and I can't see what.
The Hebrew catalog looks like
he.po
  # translation of foo.po to Hebrew
  "Project-Id-Version: foo\n"
  "PO-Revision-Date: 2008-06-07 15:04+0300\n"
  "Last-Translator: Artyom \n"
  "Language-Team: Hebrew \n"
  "MIME-Version: 1.0\n"
  "Content-Type: text/plain; charset=UTF-8\n"
  "Content-Transfer-Encoding: 8bit\n"
  "Plural-Forms: nplurals=3; plural= n==1 ? 0 : (n == 2 ? 1 : 2);\n"
  "X-Generator: KBabel 1.11.4\n"
  msgid "File was opened {1} day ago"
  msgid_plural "File was opened {1} days ago"
  msgstr[0] "Kovetz niftah lifney yom {1}"
  msgstr[1] "Kovetz niftah lifney yomaim"
  msgstr[2] "Kovetz niftah lifney {1} yamim"
The English is the original string.
In Japanese or Chinese (with one form) it would be
ja.po
  ...
  "Plural-Forms: nplurals=1; plural=0;\n"
  ...
  msgid "File was opened {1} day ago"
  msgid_plural "File was opened {1} days ago"
  msgstr[0] "Some Japanese text {1} some Japanese text"
  ...
The "Plural-Forms:" section of the meta record describes the format of the plural forms: their number and a C expression that calculates the form index from the parameter. The facet that loads the catalog parses the C expression and knows how to calculate it.
Thanks. This clarifies a lot of things for me. I don't know if you already have some example catalogs in the documentation, but if not I'm sure others will appreciate them. I don't know why I believed that the plural handling was managed by the library.
format(translate("Format date with H-M-S","{1}, {2}, {3}"))
  % format(translate("Format date with H-M-S","{1} hour","{1} hours"))
  % format(translate("Format date with H-M-S","{1} minute","{1} minutes"))
  % format(translate("Format date with H-M-S","{1} second","{1} seconds"))
I guess the h, m, s are missing here, and it seems to me that parentheses are needed:
format(translate("Format date with H-M-S","{1}, {2}, {3}"))
  % (format(translate("Format date with H-M-S","{1} hour","{1} hours", h)) % h)
  % (format(translate("Format date with H-M-S","{1} minute","{1} minutes", m)) % m)
  % (format(translate("Format date with H-M-S","{1} second","{1} seconds", s)) % s)
Could you confirm?
The only problem I see is which character to use to split the string. Maybe % could be used
translate("%1 hours%,% %2 minutes%,% %3 seconds") % h % m % s
I'm not sure exactly how you want to do this. Would
format(translate("Format date with H-M-S", "[{1} hour|{1} hours], [{2} minute|{2} minutes], [{3} second|{3} seconds]", h, m, s)) % h % m % s
be better? I don't know... I need to think about it, because there may be more corner cases that I don't see yet.
I don't like the redundant parameters, given once to the translate function and then again to format. Apart from that point, it merits some deeper analysis. Is this redundancy currently needed with only one parameter?
C++0x deprecated std::auto_ptr, which everybody uses, and gave us std::unique_ptr.
Are you suggesting forcing a bad design onto a good facet just because it exists and nobody uses it?
I disagree. This std::messages facet should be deprecated or even removed.
No. I'm just saying that if you have valid arguments it would be better to deprecate one and add one that is better. But having two catalogs is not good.
Yes, I recommend deprecating it.
For example, if I want to make Chrono internationalizable I can just use the std message facet until there is a better facet.
And it would work only on... Linux
- gcc supports locales only on Linux
- MSVC does not support messages at all.
That is the sad reality.
You have added a new facet to Boost using strings, which you implement completely. The same could be done with integers, following the standard as much as possible, independently of whether std::messages is supported on Windows or not. But I'm not asking you to do it. I suspect that if I want to make Boost.Chrono internationalizable I have no choice: I will need to use your library until something standard works in a portable way. Thanks for all the valuable information.
Best,
Vicente

From: Vicente BOTET <vicente.botet@wanadoo.fr>
Subject: Re: [boost] Boost.Locale and the standard "message" facet
Subject: [boost] Boost.Locale and the standard "message" facet
Hi,
I was wondering how Boost.Locale is related to the standard message facet which is used to translate messages.
The standard message catalogs allow to extract messages by integer identifiers but may use string identifiers and it is implementation defined
It is undefined how to load message facets
Well implementation defined doesn't mean undefined.
Actually I want to talk about it a little bit more. The std::messages facet is modeled after POSIX catopen and catgets. See:
http://pubs.opengroup.org/onlinepubs/009695399/functions/catgets.html
http://pubs.opengroup.org/onlinepubs/009695399/utilities/gencat.html
POSIX defines various rules on how to load message catalogs and on their format. However, it is a POSIX interface and, as you probably understand, Windows does not like such standards; according to MSDN, std::messages does nothing:
http://msdn.microsoft.com/en-us/library/y2ka8972.aspx
Quote:
Currently, while the messages class is implemented, there are no messages.
So as you can see, message facets are not as useful as they may seem...
Artyom
participants (2): Artyom, Vicente BOTET