Re: [boost] [locale] Formal review of Boost.Locale library EXTENDED

19 Apr 2011


      On Tue, Apr 19, 2011 at 11:31 PM, Soares Chen Ruo Fei
<crf@hypershell.org> wrote:
...
Ryou Ezoe wrote:
...
What I want is for translate() to accept wchar_t const * and
std::wstring as a parameter, just as it accepts char const * and
std::string.
Then it returns the corresponding translated text.
The encoding of wchar_t is unspecified in the Standard, but
in the current MS-Windows environment it should be treated as UTF-16.
Converting it to UTF-8 is an implementation detail.
I don't care which UTF it uses internally,
as long as it supports the real UCS (all code points defined in UCS).
But treating it as UCS rather than as a binary string is better.
Assuming we have a C++0x compiler and the encoding of wchar_t is UTF-16,
translate(u8"text"), translate(u"text"), translate(U"text") and
translate(L"text")
all return the same mapped translated text according to the locale.
This is good.
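
A rough sketch of what such an encoding-agnostic lookup could look like is given below. It is not Boost.Locale's API: `append_utf8`, `to_utf8_key`, `catalog` and `translate_sketch` are hypothetical names, the catalog is a plain std::map keyed on UTF-8, and the code simply assumes that 1-byte code units are UTF-8, 2-byte units are UTF-16 and 4-byte units are UTF-32 (so wchar_t is treated as UTF-16 on MS-Windows and as UTF-32 where it is 4 bytes wide).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical names throughout; none of this is Boost.Locale's actual API.

// Append one Unicode code point to a UTF-8 encoded string.
inline void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Assumption: 1-byte units are UTF-8, 2-byte units UTF-16, 4-byte units UTF-32.
// Unpaired surrogates and malformed input are passed through unvalidated.
template <typename CharT>
std::string to_utf8_key(const std::basic_string<CharT>& text)
{
    std::string key;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (sizeof(CharT) == 1) {  // already UTF-8: copy the byte unchanged
            key += static_cast<char>(text[i]);
            continue;
        }
        std::uint32_t cp = static_cast<std::uint32_t>(text[i]);
        if (sizeof(CharT) == 2) {  // UTF-16: combine surrogate pairs
            cp &= 0xFFFF;
            if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < text.size()) {
                std::uint32_t lo =
                    static_cast<std::uint32_t>(text[i + 1]) & 0xFFFF;
                if (lo >= 0xDC00 && lo <= 0xDFFF) {
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                    ++i;
                }
            }
        }
        append_utf8(key, cp);
    }
    return key;
}

// Hypothetical message catalog: UTF-8 original text -> UTF-8 translation.
using catalog = std::map<std::string, std::string>;

// Look the text up by its UTF-8 form; fall back to the original if missing.
template <typename CharT>
std::string translate_sketch(const catalog& cat, const CharT* text)
{
    const std::string key = to_utf8_key(std::basic_string<CharT>(text));
    const catalog::const_iterator it = cat.find(key);
    return it == cat.end() ? key : it->second;
}

// Usage: every literal form resolves to the same catalog entry.
inline void demo_translate_sketch()
{
    catalog cat;
    cat["text"] = "texte";
    assert(translate_sketch(cat, "text") == "texte");
    assert(translate_sketch(cat, u"text") == "texte");
    assert(translate_sketch(cat, U"text") == "texte");
    assert(translate_sketch(cat, L"text") == "texte");
}
```

Because the dispatch is on the size of the code unit, the same template would also take a u8"text" literal, whatever character type the compiler gives it. A real implementation would additionally have to return the result in the caller's character type and validate its input.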
I suppose that you are probably fine with the requirement that the
supplied text must be in one of the Unicode encodings, because
otherwise translating from text in Shift-JIS or arbitrary encodings
would probably be a mess from a technical perspective.
I think that what we really need is to enforce the character set used
in Boost.Locale, not the language. It just happens that Artyom chose
the ASCII character set, which doesn't support most other languages. I
don't see any technical reason to enforce the language used for
translating, but there are many technical reasons to enforce a
particular encoding. We can just change the encoding used from ASCII
to UCS, and that wouldn't technically make much difference. The only
problem with using Unicode as the translation key is normalization.
Since normalization is too heavyweight, the translation system
should probably operate at the code point level, though lookups of
identical-looking original text encoded with different code points
will then fail.
I don't expect perfect normalization.
I think it's not possible.
I just want libraries to be UCS aware.
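
To make the normalization point concrete, here is a small sketch of a catalog keyed on raw code point sequences. `u32_catalog`, `has_translation` and the sample entry are hypothetical and only illustrate the failure mode: the precomposed and decomposed spellings of the same visible word are different keys when no normalization is applied.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical catalog keyed on raw code point sequences (no normalization).
using u32_catalog = std::map<std::u32string, std::string>;

inline bool has_translation(const u32_catalog& cat, const std::u32string& key)
{
    return cat.find(key) != cat.end();
}

inline void demo_normalization_gap()
{
    // Same visible word, two different code point sequences.
    const std::u32string precomposed = U"caf\u00E9";   // e-acute as U+00E9
    const std::u32string decomposed  = U"cafe\u0301";  // U+0065 + U+0301

    u32_catalog cat;
    cat[precomposed] = "coffee shop";

    assert(has_translation(cat, precomposed));
    assert(!has_translation(cat, decomposed));  // lookup misses
}
```

Whether that miss is acceptable is exactly the trade-off discussed above: the lookup stays cheap, and it is up to whoever writes the original strings to spell them consistently.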
...
I have one suggestion to overcome GNU Gettext's limitation. Perhaps we
can automatically convert the text into Unicode escaped sequences
before passing to GNU Gettext, so "日本語" in UTF-8 will become
"\u65E5\u672C\u8A9E" in ASCII.
Why do you need to escape it?
Why do you want to stick with ASCII?
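
For reference, the escaping step suggested above could be sketched roughly as follows. `escape_non_ascii` is a hypothetical helper, not something Boost.Locale or GNU Gettext provides, and the choice of an 8-digit upper-case-U form for code points beyond the BMP is my own assumption; the original suggestion only shows the 4-digit form.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: rewrites a UTF-8 string as pure ASCII. ASCII bytes are
// copied; every other code point becomes a backslash-u escape with 4 hex
// digits, or a backslash-U escape with 8 hex digits beyond the BMP.
// Only structural checks are done; overlong forms are not rejected.
inline std::string escape_non_ascii(const std::string& utf8)
{
    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80) {  // plain ASCII
            out += utf8[i];
            ++i;
            continue;
        }
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        char buf[11];  // backslash + letter + up to 8 hex digits + NUL
        if (cp <= 0xFFFF)
            std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(cp));
        else
            std::snprintf(buf, sizeof buf, "\\U%08X", static_cast<unsigned>(cp));
        out += buf;
        i += len;
    }
    return out;
}

// Usage: the three-character example from the suggestion, given as UTF-8 bytes.
inline void demo_escape_non_ascii()
{
    const std::string nihongo = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";
    assert(escape_non_ascii(nihongo) == "\\u65E5\\u672C\\u8A9E");
    assert(escape_non_ascii("plain") == "plain");
}
```

Note that the escaped key is only useful if the message catalog is generated with exactly the same escaping, which is part of why the questions above ask whether the detour through ASCII is needed at all.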

UCS and its encodings UTF-8, UTF-16 and UTF-32 will be specified in the
upcoming C++0x standard.
On the other hand, the standard still does not mention ASCII.
The basic source character set does not cover all ASCII characters.
So using ASCII is not portable.
...
-- 
Ryou Ezoe