
Ryou Ezoe wrote:
What I want is for translate() to accept wchar_t const * and std::wstring as parameters, just as it accepts char const * and std::string, and then return the corresponding translated text. Although the encoding of wchar_t is unspecified in the Standard, in the current MS-Windows environment it should be treated as UTF-16.
Converting it to UTF-8 is an implementation detail. I don't care which UTF it uses internally, as long as it supports real UCS (all code points defined in UCS).
But treating it as UCS rather than as a binary string is better.
Assuming we have a C++0x compiler and the encoding of wchar_t is UTF-16, translate(u8"text"), translate(u"text"), translate(U"text") and translate(L"text") all return the same translated text according to the locale. This is good.
I suppose that you are probably fine with the requirement that the supplied text must be in one of the Unicode encodings, because translating from text in Shift-JIS or arbitrary encodings would probably be a mess from a technical perspective.

I think that what we really need is to enforce the character set used in Boost.Locale, not the language. It just happens that Artyom chose the ASCII character set, which doesn't support most other languages. I don't see any technical reason to enforce the language used for translation keys, but there are many technical reasons to enforce a particular encoding. We can just change the encoding used from ASCII to UCS, and that wouldn't technically make much difference.

The only problem with using Unicode as the translation key is normalization. Since normalization is too heavyweight, the translation system should probably operate at the code point level, though lookups of visually identical original text encoded with different code point sequences will then fail.

I have one suggestion to overcome GNU Gettext's limitation. Perhaps we can automatically convert the text into Unicode escape sequences before passing it to GNU Gettext, so "日本語" in UTF-8 will become "\\u65E5\\u672C\\u8A9E" in ASCII.
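
To make the escaping idea concrete, here is a minimal sketch of what such a conversion could look like. The function name escape_non_ascii is hypothetical and is not part of Boost.Locale or GNU Gettext. I am also assuming that code points beyond the BMP are written as \UXXXXXXXX, and that plain ASCII (including any backslash already in the text) is passed through untouched.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of Boost.Locale or GNU Gettext.
// Turns UTF-8 text into an ASCII-only key: ASCII bytes are copied as-is,
// every other code point is written as \uXXXX (BMP) or \UXXXXXXXX.
// Assumption: only basic structural checks are done; overlong forms and
// surrogate code points are not rejected.
std::string escape_non_ascii(const std::string& utf8)
{
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp;
        std::size_t len;

        // Work out the sequence length and payload bits from the lead byte.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Fold in six payload bits from each continuation byte.
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else {
            char buf[11];  // backslash + 'U' + 8 hex digits + NUL
            if (cp <= 0xFFFF)
                std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(cp));
            else
                std::snprintf(buf, sizeof buf, "\\U%08X", static_cast<unsigned>(cp));
            out += buf;
        }
    }
    return out;
}
```

The result could then be used as the key handed to GNU Gettext. Note that this sketch compares code points only, so it does nothing about the normalization problem, and a real implementation would also need to escape literal backslashes in the input if the mapping has to be reversible.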