Boost.Locale and the standard "message" facet

Vicente BOTET

30 Apr 2011

10:51 a.m.

Hi,

I was wondering how Boost.Locale is related to the standard message facet, which is used to translate messages. Note that the facet interface works with integer identifiers, avoiding all the issues raised by the gettext/translate functions provided by Boost.Locale.

Is there any reason Boost.Locale could not follow the standard design? What are the advantages of the Boost.Locale design?

Best,
Vicente
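For readers unfamiliar with the facet Vicente refers to, here is a minimal C++ sketch of a lookup through the standard std::messages<char> facet. The catalog name, set number and message number used with it are hypothetical. The standard leaves both the mapping from a catalog name to a file and the meaning of the integers to the implementation, so the sketch simply falls back to the default text when the catalog cannot be opened.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Minimal sketch of a lookup through the standard std::messages<char> facet.
// A message is addressed by an integer set id and an integer message id, and
// the caller must also supply the default text to return if nothing is found.
// How catalog_name maps to a file is implementation-defined.
std::string lookup_std_message(const std::string& catalog_name, int set_id,
                               int msg_id, const std::string& fallback,
                               const std::locale& loc = std::locale()) {
    const auto& facet = std::use_facet<std::messages<char>>(loc);
    std::messages_base::catalog cat = facet.open(catalog_name, loc);
    if (cat < 0)
        return fallback;  // the catalog could not be opened
    std::string text = facet.get(cat, set_id, msg_id, fallback);
    facet.close(cat);
    return text;
}
```

Note that get() still requires the English text as its default argument, so even with integer identifiers the natural-language string does not disappear from the calling code.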


Artyom

30 Apr

1:46 p.m.

...

Subject: [boost] Boost.Locale and the standard "message" facet

Hi,

I was wondering how Boost.Locale is related to the standard message facet which is used to translate messages.

The standard message catalogs allow you to extract messages by integer identifiers, but an implementation may use string identifiers instead; this is implementation-defined. It is not defined how to load message catalogs or how to format them, and so on. The facet does not support plural forms or context. It is the most useless facet around.

...

Note that the facet interface works with integer identifiers, avoiding all the issues raised by the gettext/translate functions provided by Boost.Locale.

Using integer identifiers is the best way to screw up the localization of your software. What does 3456 mean? Do you really think it is good to write translate(MY_MESSAGE_OPENING_FILE)? No, never - never - never - never - never use such "constant" or "integer" identifiers. Always use natural text.

...

Is there any reason Boost.Locale could not follow the standard design?

The standard message catalogs are too weak.

...

What are the advantages of the Boost.Locale design?

1. A defined way to load and format catalogs
2. Support for plural forms
3. Support for message context
4. Natural-language identifiers as keys
5. A convenient interface

More?
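Points 2 and 3 are easiest to see in code. The following C++ sketch uses a hypothetical in-memory catalog (not the Boost.Locale API and not the .mo file format) keyed, as in a .po file, by a context plus the English text, and it stores one translation per plural form. The plural rule hard-coded here is the one commonly used in Russian .po files; gettext itself reads the rule from the catalog's Plural-Forms header.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical catalog: key = (context, English msgid), value = one
// translated string per plural form of the target language.
class Catalog {
public:
    void add(const std::string& ctx, const std::string& msgid,
             std::vector<std::string> forms) {
        entries_[{ctx, msgid}] = std::move(forms);
    }

    // Singular lookup with context; untranslated text falls back to English.
    std::string get(const std::string& ctx, const std::string& msgid) const {
        auto it = entries_.find({ctx, msgid});
        return it == entries_.end() ? msgid : it->second.at(0);
    }

    // Plural lookup: the form index comes from the language's plural rule.
    std::string nget(const std::string& ctx, const std::string& singular,
                     const std::string& plural, unsigned long n) const {
        auto it = entries_.find({ctx, singular});
        if (it == entries_.end())
            return n == 1 ? singular : plural;  // English rule as fallback
        return it->second.at(plural_index(n));
    }

private:
    // Russian rule, three forms: 1 файл, 2 файла, 5 файлов (21 -> form 0).
    static std::size_t plural_index(unsigned long n) {
        if (n % 10 == 1 && n % 100 != 11) return 0;
        if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) return 1;
        return 2;
    }

    std::map<std::pair<std::string, std::string>, std::vector<std::string>> entries_;
};
```

The same English word "Open" can be registered under a "Menu" context (a verb) and a "Status" context (an adjective) and receive different translations; neither context nor plural selection has a place in the std::messages interface, which maps one (set, id) pair to one string.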

...

Best, Vicente

Regards, Artyom

Steve Bush

1 May

12:54 p.m.

...

Using integer identifiers is the best way to screw up the localization of your software.

...

What does 3456 mean? Do you really think it is good to write translate(MY_MESSAGE_OPENING_FILE)?

...

No, never - never - never - never - never use such "constant" or "integer" identifiers.

...

Always use natural text.

I am not sure I fully understand this, but I definitely disagree with the idea that messages should never be integer identifiers.

There is a principle in the database world that primary keys to records should be meaningless where the meaning can change over time. Imagine a text identifier "Close your hatch now" and, over time, the very concept of "hatch" becomes meaningless; yet for all time the source code is condemned to have the original, now meaningless if not downright confusing, identifier "Close your hatch now".

In the Windows world messages are generally numbers, and there is considerable value in being able to search the web/documentation for error -23756472536476523 instead of some local-language string which will only turn up comments in one language.

Any text compiled into a program is essentially a constant exactly like any integer. Therefore the same rules apply equally to

Generally I think the idea of compiling any meaningful text whatsoever into object code is questionable from a theoretical basis, and is usually just a hangover from when gettext/translate was a quick and dirty way to largely automate the localisation of existing mono-lingual programs, by simply wrapping all quoted text with a call to translate.

Artyom

6:21 p.m.

...

From: Steve Bush <sb2@neosys.com>

...
Using integer identifiers is the best way to screw up the localization of your software.

...
What does 3456 mean? Do you really think it is good to write translate(MY_MESSAGE_OPENING_FILE)?

...
No, never - never - never - never - never use such "constant" or "integer" identifiers.

...
Always use natural text.

I am not sure I fully understand this, but I definitely disagree with the idea that messages should never be integer identifiers.

...

There is a principle in the database world that primary keys to records should be meaningless where the meaning can change over time. Imagine a text identifier "Close your hatch now" and over time, the very concept of "hatch" becomes meaningless - yet for all time the source code is condemned to have the original, now meaningless if not downright confusing identifier "Close your hatch now".

See notes below.

...

In the Windows world messages are generally numbers, and there is considerable value in being able to search the web/documentation for error -23756472536476523 instead of some local-language string which will only turn up comments in one language.

Note that the same holds for things like errno and strerror: the error is represented by a code, and strerror (usually) converts it to natural text. However, that is not always the whole picture, as strerror may add more information to the text than a simple key-value lookup would. So even when error codes are used, the code is useful for representing a condition to the program, while the text itself may be generated in a different way. So basically it should be:

    switch(error_code) {
    case EINVAL:
        if(...)
            return gettext("First parameter is null");
        else if(...)
            return gettext("The range is invalid");
    }

The same works for many other APIs. Consider:

    int status = somesql_prepare_query(conn, "SELECT * FROM fooo");

Even if the status is SOMESQL_PREPARATION_FAILED, the message returned by somesql_strerror(conn) may actually be "Unknown table `fooo'".

The fact that integer identifiers are used in the Windows API, and by some (legacy) localization systems, does not mean that this is the way it should be. See the description below.
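The errno-style pattern above can be made concrete with a small, self-contained C++ sketch. The names check_range, describe_range_error and gettext_stub are hypothetical; gettext_stub merely stands in for a real catalog lookup and returns the English text unchanged. The point is that one code (EINVAL) tells the program what kind of failure occurred, while extra information selects which natural-language message the user sees.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical stand-in for gettext(): returns the English text unchanged.
// A real program would look the text up in the active message catalog.
std::string gettext_stub(const std::string& english) { return english; }

// Hypothetical validation: reports the *kind* of failure as an errno-style
// code (0 on success, EINVAL for a bad argument).
int check_range(const char* data, long begin, long end) {
    if (data == nullptr || begin > end)
        return EINVAL;
    return 0;
}

// Builds the user-visible text. The code alone only says "invalid argument";
// the extra information picks the specific, translatable message.
std::string describe_range_error(int error_code, const char* data,
                                 long begin, long end) {
    switch (error_code) {
    case 0:
        return gettext_stub("Success");
    case EINVAL:
        if (data == nullptr)
            return gettext_stub("First parameter is null");
        if (begin > end)
            return gettext_stub("The range is invalid");
        return gettext_stub("Invalid argument");
    default:
        return gettext_stub("Unknown error");
    }
}
```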

...

Any text compiled into a program is essentially a constant exactly like any integer. Therefore the same rules apply equally to

Not every rule that applies to software applies to human interfaces and natural languages.

...

Generally I think the idea of compiling any meaningful text whatsoever into object code is questionable from a theoretical basis and is usually just a hangover from when gettext/translate was a quick and dirty way to largely automate the localisation of existing mono-lingual programs - by simply wrapping all quoted text with a call to translate.

No, it is not. Having natural language identifiers has the following important advantages:

1. It guarantees that the meaning of the text and the translation are always synchronized.
2. It makes code much more readable.
3. It makes code much more maintainable.
4. It makes it easier to separate the actual translation work from the source code.

All modern localization systems provide natural language identifiers, and "constants" should never be used for message formatting. And this is not only my opinion but also the opinion of many people who actually deal with localization.

Compare the code:

source.cpp

    MessageBox(translate(MSG_OPEN_FILE_TITLE), translate(MSG_OPEN_IMAGE_FILE_WEB));

resource.h

    #define MSG_OPEN_FILE_TITLE 1
    #define MSG_OPEN_IMAGE_FILE_WEB 2

English.txt

    1 "Open File"
    2 "Open the file with the image to upload to the web site"

Hebrew.txt

    1 "פתח קובץ"
    2 "פתח קובץ שיועלה לאתר ברשת"

With the code:

source.cpp

    MessageBox(translate("File Dialog", "Open File"),
               translate("Open the file with the image to upload to the web site"));

he.po

    msgctxt "File Dialog"
    msgid "Open File"
    msgstr "פתח קובץ"

    msgid "Open the file with the image to upload to the web site"
    msgstr "פתח קובץ שיועלה לאתר ברשת"

I hope it is clear now. Constant keys just create an additional level of indirection.

So, never-never-never-never-never use artificial keys unless you want to make really bad software and make your software and translation teams miserable.

This is not a theoretical question about foreign keys in some general database; it is a very pragmatic question about how to get localization right. And yes, in the early days of software localization integer keys could be seen as a good method, but nobody works this way today.

Artyom Beilis
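The extra level of indirection in the comparison above can be sketched in a few lines of C++. Both lookup functions and the "#<id>" placeholder are hypothetical, chosen only for illustration; the sketch shows what each style can fall back to when a translation is missing.

```cpp
#include <cassert>
#include <map>
#include <string>

// Style 1: artificial integer keys, mirroring resource.h. The English text
// lives in a separate table, so English is just one more "translation".
constexpr int MSG_OPEN_FILE_TITLE = 1;
constexpr int MSG_OPEN_IMAGE_FILE_WEB = 2;

using IdCatalog = std::map<int, std::string>;

std::string translate_id(const IdCatalog& catalog, int id) {
    auto it = catalog.find(id);
    // A missing entry leaves nothing readable to show; "#<id>" is an
    // arbitrary placeholder chosen for this sketch.
    return it == catalog.end() ? "#" + std::to_string(id) : it->second;
}

// Style 2: the English source text is the key, as in a .po file.
using TextCatalog = std::map<std::string, std::string>;

std::string translate_text(const TextCatalog& catalog, const std::string& msgid) {
    auto it = catalog.find(msgid);
    // A missing entry degrades to the readable English source text.
    return it == catalog.end() ? msgid : it->second;
}
```

With text keys an incomplete Hebrew catalog still shows readable English for the untranslated message; with integer keys the program needs the separate English table just to display anything meaningful.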

Sebastian Redl

6:28 p.m.

On 01.05.2011, at 20:21, Artyom wrote:

...

All modern localization systems provide natural language identifiers. And "constants" should never be used for message formatting.

Define modern. Mozilla's localization uses string keys, but they are certainly not natural language. Sebastian

Steve Bush

5 May

9:40 a.m.

...

I hope it is clear now. Constant keys just create an additional level of indirection.

I am well aware how it works, but the problem is that you don't seem to be aware of any limitations in the process.

1. You are hard coding stuff which can henceforth NEVER be changed (since it is the primary key into the translation database), despite the fact that it SOMETIMES benefits considerably from being changed, for clarity.
2. Your binary is bloated with text.

These are NOT the only limitations, but they are perhaps the clearest. I think it is quite ironic that you hate hard coding meaningless keys (e.g. integers) for messages, even though they are meaningless numbers that NEVER change, but are quite happy to hard code text strings which SOMETIMES benefit from being changed. The appeal of gettext is its self-documentation, BUT this has to be balanced against problems 1 and 2 above, and I think it is abundantly clear that gettext/translate using long text keys is NOT ALWAYS the correct solution.

...

So, Never-Never-Never-Never-Never use artificial keys unless you want to make really bad software and make your software and translation teams miserable

This reminds me of "irrational exuberance".

...

This is not a theoretical question about foreign keys in some general database; it is a very pragmatic question about how to get localization right.

It isn't a theoretical issue at all, since the translation data is a database and the message id is a primary key. The principle DON'T USE MEANINGFUL PRIMARY KEYS IF THEIR MEANING MAY CHANGE is definitely applicable here. The gettext/translate method is condemned to stick FOR ALL TIME with whatever stupid text the programmer considered worthy at the initial coding! It is notorious that concepts change over time, and therefore HARD CODING TRANSLATION KEYS IS NOT A PERFECT SOLUTION IN ALL CASES.

The above problems are not an issue in many cases:

1. Some people don't care that their translation keys CANNOT EVER CHANGE.
2. Some people don't care that their binaries are bloated with text.

For those people gettext/translate is GREAT! Just appreciate that there ARE some difficulties with HARD CODING CONSTANT STRINGS in at least some cases, and recognise that every solution has its compromises.

...

And yes, in the early days of software localization integer keys could be seen as a good method, but nobody works this way today.

I dunno Artyom, localisation has a very long history. It isn't something new, you know. Meaningless keys for translation are the majority solution in the Windows world and are not on the way out at all.

You should know that gettext was a quick and dirty hack to start with. All we had to do was wrap all or most strings with a function call and, at first run time, the program a) wrote out any newly discovered "localised" strings to a database and b) converted them wherever a translation could be found. AMAZING! But it DOES have its limitations!

...

Artyom Beilis _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Artyom

1:24 p.m.

...

I hope it is clear now. Constant keys just create an additional level of indirection.

I am well aware how it works, but the problem is that you don't seem to be aware of any limitations in the process.

1. You are hard coding stuff which can henceforth NEVER be changed (since it is the primary key into the translation database), despite the fact that it SOMETIMES benefits considerably from being changed, for clarity.

First of all, if the original string is changed, then you must also revisit the translated string; in this case having the "key" be the original text is very important. And if the change is relatively small, the entry is marked as fuzzy, and the translator can see whether to keep the current translation or update it.
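The fuzzy-marking step can be sketched as follows. This is not how msgmerge is implemented: GNU gettext uses its own similarity heuristic, while this C++ sketch uses plain Levenshtein distance with a hypothetical threshold of 5 edits, and merge_catalog and Entry are illustrative names. It shows the idea: an exact match keeps its translation, a near match inherits the old translation but is flagged fuzzy for review, and anything else starts untranslated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Levenshtein edit distance between two byte strings.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

struct Entry {
    std::string msgstr;  // translation ("" = untranslated)
    bool fuzzy = false;  // translator must review it
};

// Merge new source strings against the old msgid -> msgstr catalog.
std::map<std::string, Entry> merge_catalog(
        const std::map<std::string, std::string>& old_catalog,
        const std::vector<std::string>& new_msgids,
        std::size_t max_distance = 5) {  // hypothetical threshold
    std::map<std::string, Entry> result;
    for (const auto& id : new_msgids) {
        auto exact = old_catalog.find(id);
        if (exact != old_catalog.end()) {
            result[id] = Entry{exact->second, false};
            continue;
        }
        const std::string* best = nullptr;  // closest old translation
        std::size_t best_dist = max_distance + 1;
        for (const auto& [old_id, old_str] : old_catalog) {
            std::size_t d = edit_distance(id, old_id);
            if (d < best_dist) { best_dist = d; best = &old_str; }
        }
        result[id] = best ? Entry{*best, true} : Entry{};
    }
    return result;
}
```

For example, editing "Close your hatch now" into "Close your hatch now!" keeps the old translation but flags it fuzzy, so the change is surfaced to the translator rather than silently orphaning the entry.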

...

2. Your binary is bloated with text

Really? Have you ever done any measurements? For example, take gtranslator, a GUI program written in C that is used for working with po/mo gettext files:

Symbols in ELF: 36577 bytes
Translation keys: 10324 bytes!

And this is C! There is no RTTI and there are no class names of templates; all symbols are very short, unlike STL symbols, whose mangled names can take several K of text, and so on. Texts are small. Really, there is no bloat in putting texts into executables.

...

[snip] I dunno Artyom, localisation has a very long history. It isn't something new, you know. Meaningless keys for translation are the majority solution in the Windows world and are not on the way out at all.

Nobody uses integers today; even Java and .NET use textual identifiers, never numbers. It just does not work well. And of course Boost.Locale allows you to use arbitrary text as a key: you can write translate("menu.file.title") without problems if you really want to, but it is not recommended.

...

[snip]

You should know that gettext was a quick and dirty hack to start with.

It is not a "hack": other localization systems like Qt use the same concept, and they know a thing or two about UI development: http://doc.qt.nokia.com/4.6/internationalization.html

Please, before anybody once again suggests using "integers" as keys, take a look at the current localization world. LoadString and its POSIX friend catgets are things of the past.

Best,
Artyom
