
----- Original Message ----
From: Jeremy Maitin-Shepard <jeremy@jeremyms.com>
On 04/25/2011 11:56 PM, Artyom wrote:
From: Jeremy Maitin-Shepard<jeremy@jeremyms.com>
The most significant complaint seems to be the fact that the translation interface is limited to ASCII (or maybe UTF-8 is also supported, it isn't entirely clear).
[snip]
I imagine relative to the work required for the whole library, these changes would be quite trivial, and might very well transform the library from completely unacceptable to acceptable for a number of objectors on the list, while having essentially no impact on those that are happy to use the library as is.
I can say a few words on what can be done and what will never be done.
I will never support wide, char16_t or char32_t strings as keys.
It seems that it is mostly possible to get the desired results using only char * strings as keys [snip]
However, I don't see why you are so opposed to providing additional overloads. With MSVC currently, only wide strings can represent the full range of Unicode. You could provide the definitions in an alternate static/dynamic library from the char * overloads, so that there would not even be any substantial space overhead.
How the catalog works: it looks up the key in a hash table and, as the last stage, compares the strings bytewise. It is fast and efficient. In order to support L"", "", u"" and U"" keys, I would need either to store four variants of the same string so that lookup stays fast (a waste of memory), or to convert the string from UTF-16/32 to UTF-8, which means run-time memory allocation and conversion. So no, I'm not going to do this, especially since it is not portable enough.
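To make the cost argument concrete, here is a minimal sketch, not Boost.Locale's actual implementation. The names `Catalog`, `lookup_narrow`, `lookup_utf32` and `to_utf8` are hypothetical, and `std::unordered_map` stands in for the real hash table. It assumes catalog keys are stored as UTF-8 bytes and that the UTF-32 input is valid.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical catalog: the raw bytes of a source string map to its translation.
using Catalog = std::unordered_map<std::string, std::string>;

// Narrow key: hash, then bytewise comparison. No re-encoding is needed.
// (This sketch builds a temporary std::string for find(); a real catalog
// could hash the char* directly.)
inline const char* lookup_narrow(const Catalog& cat, const char* key) {
    auto it = cat.find(key);
    return it == cat.end() ? key : it->second.c_str();
}

// Encode a (valid) UTF-32 string as UTF-8.
inline std::string to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t c : in) {
        if (c < 0x80) {
            out += char(c);
        } else if (c < 0x800) {
            out += char(0xC0 | (c >> 6));
            out += char(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out += char(0xE0 | (c >> 12));
            out += char(0x80 | ((c >> 6) & 0x3F));
            out += char(0x80 | (c & 0x3F));
        } else {
            out += char(0xF0 | (c >> 18));
            out += char(0x80 | ((c >> 12) & 0x3F));
            out += char(0x80 | ((c >> 6) & 0x3F));
            out += char(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Wide key: every call pays for a conversion and an allocation before the
// same bytewise lookup can run. This is the overhead objected to above.
inline std::string lookup_utf32(const Catalog& cat, const std::u32string& key) {
    const std::string utf8_key = to_utf8(key);
    auto it = cat.find(utf8_key);
    return it == cat.end() ? utf8_key : it->second;
}
```

The alternative to converting on every call would be to store each key four times, once per character type, which is the memory cost mentioned above.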
One possibility is to provide, on a per-domain basis, a key "X-Boost-Locale-Source-Encoding" in the po file, so that the user would be able to specify in the special header record (which exists in all message catalogs) something like:
"X-Boost-Locale-Source-Encoding: windows-936" or "X-Boost-Locale-Source-Encoding: UTF-8"
Then, when the catalog is loaded, its keys would be converted to the encoding named by X-Boost-Locale-Source-Encoding.
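As an illustration only, reading such a field from the catalog's header record could look like the sketch below. The function name `header_field` and the fallback argument are assumptions, and "X-Boost-Locale-Source-Encoding" is the key proposed in this message, not an existing gettext field. The header record is assumed to be a sequence of "Name: value" lines separated by newlines, as in ordinary po headers.

```cpp
#include <cassert>
#include <string>

// Return the value of "name: value" from a newline-separated header block,
// or `fallback` if the field is absent or empty. Hypothetical helper.
inline std::string header_field(const std::string& header,
                                const std::string& name,
                                const std::string& fallback) {
    const std::string prefix = name + ":";
    std::size_t pos = 0;
    while (pos < header.size()) {
        std::size_t end = header.find('\n', pos);
        if (end == std::string::npos) end = header.size();
        if (header.compare(pos, prefix.size(), prefix) == 0) {
            // Skip whitespace after the colon.
            std::size_t b = header.find_first_not_of(" \t\r", pos + prefix.size());
            if (b == std::string::npos || b >= end) return fallback;
            // Trim trailing whitespace on the same line.
            std::size_t e = header.find_last_not_of(" \t\r", end - 1);
            return header.substr(b, e - b + 1);
        }
        pos = end + 1;
    }
    return fallback;
}
```

A loader could call `header_field(header, "X-Boost-Locale-Source-Encoding", "US-ASCII")` once per domain and then re-encode the keys accordingly; the "US-ASCII" default here is an assumption, not something stated in the thread.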
This isn't a property of the message catalog, but rather a property of the program itself, and therefore should be specified in the program, and not in the message catalog, it would seem. Something like the preprocessor define I mentioned would be a way to do this.
The problem with a define is that I want translate("foo") to work automatically, without being a define. So I need to provide the encoding either in the catalog itself or when I provide the domain name. (It is done per domain because one part of a project may use UTF-8, another cp936, and another plain US-ASCII.) So I can specify it either when I load a catalog or in the catalog itself.
wcout << translate("「平和」"); // convert in runtime from cp939 to UTF-16
cout << translate("「平和」"); // convert in runtime from cp939 to UTF-8
[snip]
When you say "convert in runtime", it seems you actually mean the keys will be converted from UTF-8 to cp939 when the messages are loaded, but the values will remain UTF-8. Untranslated strings would have to be converted, I suppose.
Yes, when the catalog is loaded the UTF-8 keys will be converted to cp936 for best performance, but at runtime the original untranslated keys have to be converted to the target locale.
Artyom
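The two conversion points described here (keys re-encoded once at load time, untranslated keys converted on each miss) can be sketched as follows. This is a toy model under loud assumptions: ISO-8859-1 stands in for cp936 so that the conversion fits in a few lines, the target locale is assumed to be UTF-8, and `Domain`, `add` and `translate` are hypothetical names rather than the library's API.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in conversions between ISO-8859-1 and UTF-8 (assumption: a real
// implementation would use a general converter for cp936 and others).
inline std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += char(c);
        } else {
            out += char(0xC0 | (c >> 6));
            out += char(0x80 | (c & 0x3F));
        }
    }
    return out;
}

inline std::string utf8_to_latin1(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out += char(c);
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size()) {
            unsigned char d = in[++i];
            out += char(((c & 0x03) << 6) | (d & 0x3F));
        } else {
            out += '?';  // not representable in ISO-8859-1
        }
    }
    return out;
}

struct Domain {
    // Keys are held in the program's source encoding, values in UTF-8.
    std::unordered_map<std::string, std::string> messages;

    // Load time: each UTF-8 key from the catalog is re-encoded once.
    void add(const std::string& utf8_key, const std::string& utf8_value) {
        messages[utf8_to_latin1(utf8_key)] = utf8_value;
    }

    // Run time: a hit is a plain bytewise lookup with no key conversion;
    // only an untranslated key has to be converted to the target encoding.
    std::string translate(const char* key) const {
        auto it = messages.find(key);
        if (it != messages.end()) return it->second;
        return latin1_to_utf8(key);
    }
};
```

With this layout the common case, a key that has a translation, costs no conversion at all, which matches the "for best performance" reasoning above.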