
On 15/01/2011 15:46, Artyom wrote:
> No, you don't need to convert UTF-8 to the locale's encoding, because char* is the native system API, unlike on Windows. So you don't need to mess around with encodings at all unless you deal with text-related operations such as collation.
POSIX system calls expect the text they receive as char* to be encoded in the current locale's character encoding. To write cross-platform code, you need to convert your UTF-8 input to the locale encoding when calling system calls, and convert text you receive from those system calls from the locale encoding back to UTF-8. (Note: this is exactly what Glib::ustring does.)

Windows is exactly the same, except it has two sets of locales and two sets of system calls. The wide-character locale is the more interesting one, since it is always UTF-16, so the only conversion you have to do is between UTF-8 and UTF-16, which is easy and lossless. Likewise, you could also choose UTF-16 or UTF-32 as your internal representation rather than UTF-8; the choice is completely irrelevant with regard to providing a uniformly encoded interface regardless of platform.
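To make that concrete, here is a minimal sketch of such a conversion layer, assuming UTF-8 as the internal representation; the wrapper name `open_for_reading` is hypothetical, and the Windows branch uses the documented `MultiByteToWideChar`/`_wfopen` calls:

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

#if defined(_WIN32)
#include <windows.h>

// Windows: convert the internal UTF-8 string to UTF-16 (lossless) and
// call the wide-character system call.
std::FILE* open_for_reading(const std::string& utf8_path)
{
    int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                  utf8_path.c_str(), -1, nullptr, 0);
    if (len == 0)
        throw std::runtime_error("invalid UTF-8 in path");
    std::wstring wide(static_cast<std::size_t>(len), L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8_path.c_str(), -1, &wide[0], len);
    return _wfopen(wide.c_str(), L"rb");
}
#else
// POSIX: the narrow call expects the locale encoding; passing UTF-8
// through unchanged is only correct when the locale itself is UTF-8.
// A real implementation would re-encode here (see the iconv sketch below).
std::FILE* open_for_reading(const std::string& utf8_path)
{
    return std::fopen(utf8_path.c_str(), "rb");
}
#endif
```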
> The problem is not locales, encodings, or other stuff; the problem is that the Windows API does not allow you to fully use "char *"-based strings, as it does not support UTF-8,
The actual locale used by the user is irrelevant. Again, as I said earlier, the fact that UTF-8 is the most common locale on Linux but is not available on Windows shouldn't affect the way the system works. A lot of Linux systems use a Latin-1 locale, and your approach will simply fail on those systems.
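Here is a sketch of the POSIX-side conversion step that handles such locales, using iconv(3) and nl_langinfo(CODESET); the helper name `utf8_to_locale` is hypothetical, and it assumes the program has called `setlocale(LC_ALL, "")` at startup so the locale's codeset is known:

```cpp
#include <iconv.h>
#include <langinfo.h>
#include <clocale>
#include <cstddef>
#include <stdexcept>
#include <string>

// Convert a UTF-8 string to the encoding of the current locale so it
// can be handed to narrow (char*) POSIX calls. On a Latin-1 system
// this re-encodes the text; on a UTF-8 system it is a plain copy.
std::string utf8_to_locale(const std::string& utf8)
{
    // nl_langinfo reports the codeset of the locale selected by setlocale.
    iconv_t cd = iconv_open(nl_langinfo(CODESET), "UTF-8");
    if (cd == (iconv_t)-1)
        throw std::runtime_error("unsupported conversion");

    std::string out(utf8.size() * 4 + 4, '\0'); // generous worst-case size
    char* in_ptr = const_cast<char*>(utf8.data());
    std::size_t in_left = utf8.size();
    char* out_ptr = &out[0];
    std::size_t out_left = out.size();

    if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) == (std::size_t)-1) {
        iconv_close(cd);
        throw std::runtime_error("text not representable in locale encoding");
    }
    iconv_close(cd);
    out.resize(out.size() - out_left);
    return out;
}
```

Note that iconv fails with EILSEQ for characters that have no representation in the target encoding, i.e. most of Unicode under Latin-1, which is exactly why pretending the narrow API speaks UTF-8 breaks on those systems.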
> and platform-independent programming becomes a total mess.
So your technique for writing platform-independent code is relying on the user to use a UTF-8 locale?