
On Tue, Apr 19, 2011 at 11:31 PM, Soares Chen Ruo Fei <crf@hypershell.org> wrote:
Ryou Ezoe wrote:
What I want is for translate() to accept wchar_t const * and std::wstring as parameters, just as it accepts char const * and std::string, and then return the corresponding translated text. Although the encoding of wchar_t is unspecified in the Standard, in the current MS-Windows environment it should be treated as UTF-16.
Converting it to UTF-8 is an implementation detail. I don't care which UTF it uses internally, as long as it supports real UCS (all code points defined in UCS).
But treating it as UCS rather than as a binary string is better.
Assuming we have a C++0x compiler and the encoding of wchar_t is UTF-16, translate(u8"text"), translate(u"text"), translate(U"text") and translate(L"text") all return the same mapped translated text according to the locale. This is good.
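To make the idea above concrete, here is a rough sketch of how overloads for the different string types could funnel into a single UTF-8 key. This is not Boost.Locale's API: the Catalog type, the to_utf8 helpers and this translate signature are all hypothetical names made up for illustration, and the lookup is a plain std::map rather than a real message catalog. Note that a u8"text" literal is plain char in C++0x, so it would go through the std::string overload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Append one code point to `out` as UTF-8. Precondition: cp <= 0x10FFFF.
inline void append_utf8(std::string& out, std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Treat each element as a UTF-16 code unit; unpaired surrogates become U+FFFD.
template <class Unit>
std::string utf16_units_to_utf8(const std::basic_string<Unit>& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint16_t>(s[i]);
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            std::uint32_t lo =
                (i + 1 < s.size()) ? static_cast<std::uint16_t>(s[i + 1]) : 0;
            if (cp <= 0xDBFF && lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;  // consumed the low surrogate as well
            } else {
                cp = 0xFFFD;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}

inline std::string to_utf8(const std::u16string& s) {
    return utf16_units_to_utf8(s);
}

inline std::string to_utf8(const std::u32string& s) {
    std::string out;
    for (char32_t c : s) append_utf8(out, static_cast<std::uint32_t>(c));
    return out;
}

// Assumption: wchar_t holds UTF-16 when it is 2 bytes wide, UTF-32 otherwise.
inline std::string to_utf8(const std::wstring& s) {
    if (sizeof(wchar_t) == 2) return utf16_units_to_utf8(s);
    std::string out;
    for (wchar_t c : s) append_utf8(out, static_cast<std::uint32_t>(c));
    return out;
}

// Hypothetical catalog: UTF-8 original text -> UTF-8 translated text.
using Catalog = std::map<std::string, std::string>;

// Narrow strings are assumed to be UTF-8 already; unknown keys are returned as-is.
inline std::string translate(const Catalog& cat, const std::string& utf8_key) {
    auto it = cat.find(utf8_key);
    return it != cat.end() ? it->second : utf8_key;
}
inline std::string translate(const Catalog& cat, const std::u16string& key) {
    return translate(cat, to_utf8(key));
}
inline std::string translate(const Catalog& cat, const std::u32string& key) {
    return translate(cat, to_utf8(key));
}
inline std::string translate(const Catalog& cat, const std::wstring& key) {
    return translate(cat, to_utf8(key));
}
```

With this shape the caller never sees which UTF is used internally; every overload ends up at the same catalog entry.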
I suppose that you are probably fine with the requirement that the supplied text must be in one of the Unicode encodings, because otherwise translating from text in Shift-JIS or arbitrary encodings would probably be a mess from a technical perspective.
I think that what we really need is to enforce the character set used in Boost.Locale, not the language. It just happens that Artyom chose the ASCII character set, which doesn't support most other languages. I don't see any technical reason to enforce the language used for translating, but there are many technical reasons to enforce a particular encoding. We can just change the encoding used from ASCII to UCS, and that wouldn't technically make much difference. The only problem with using Unicode as the translation key is normalization. Since normalization is too heavyweight, the translation system should probably operate at the code point level, though lookups will then fail when the same original text is written with different code point sequences.
I don't expect perfect normalization. I don't think it's possible. I just want libraries to be UCS-aware.
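As a small illustration of the normalization issue discussed above: the two strings below render the same way but differ at the code point level, so an exact-match lookup that applies no normalization treats them as different keys. The function names are made up for this sketch.

```cpp
#include <cassert>
#include <string>

// Exact code-point comparison: what a lookup does when it applies no normalization.
inline bool same_code_points(const std::u32string& a, const std::u32string& b) {
    return a == b;
}

// "cafe" with an acute accent on the last letter, spelled two ways.
inline std::u32string cafe_precomposed() { return U"caf\u00E9"; }   // single code point U+00E9
inline std::u32string cafe_decomposed()  { return U"cafe\u0301"; }  // 'e' followed by combining U+0301

inline void normalization_demo() {
    // Same visible text, different code point sequences, so the keys do not match.
    assert(!same_code_points(cafe_precomposed(), cafe_decomposed()));
}
```

A translator who types the decomposed form would therefore miss a catalog entry stored in the precomposed form, unless both sides are normalized first.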
I have one suggestion to overcome GNU Gettext's limitation. Perhaps we can automatically convert the text into Unicode escape sequences before passing it to GNU Gettext, so "日本語" in UTF-8 will become "\u65E5\u672C\u8A9E" in ASCII.
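The escaping suggested above could look roughly like the following. This is only a sketch: escape_non_ascii is a hypothetical helper, not part of Boost.Locale or GNU Gettext, and using \UXXXXXXXX for code points beyond the BMP is my own assumption. It does not reject overlong encodings or encoded surrogates, and it leaves a literal backslash in the input untouched, so text that already contains "\u" would be ambiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Append \uXXXX for BMP code points, \UXXXXXXXX for anything above.
inline void append_escape(std::string& out, std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    char buf[11];  // backslash + 'U' + 8 hex digits + terminator
    if (cp <= 0xFFFF)
        std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(cp));
    else
        std::snprintf(buf, sizeof buf, "\\U%08X", static_cast<unsigned>(cp));
    out += buf;
}

// Copy ASCII bytes through unchanged; decode every multi-byte UTF-8
// sequence and replace it with an ASCII escape.
inline std::string escape_non_ascii(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80) {
            out += static_cast<char>(lead);
            ++i;
            continue;
        }
        std::size_t len;
        std::uint32_t cp;
        if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        append_escape(out, cp);
        i += len;
    }
    return out;
}
```

The result is a pure ASCII string that could be used as a msgid, at the cost of making the catalog unreadable for translators unless the tooling reverses the escaping.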
Why do you need to escape it? Why do you want to stick with ASCII? UCS and its encodings UTF-8, UTF-16 and UTF-32 will be specified in the upcoming C++0x standard. On the other hand, the standard still does not mention ASCII. The basic source character set does not cover all ASCII characters, so using ASCII is not portable.
-- Ryou Ezoe