
On 15/01/2011 15:46, Artyom wrote:
> No, you don't need to convert UTF-8 to the locale's encoding, because char* is the native system API, unlike on Windows. So you don't need to mess around with encodings at all unless you deal with text-related operations such as collation.
POSIX system calls expect the text they receive as char* to be encoded in the current locale's character encoding. To write cross-platform code, you need to convert your UTF-8 input to the locale encoding when calling system calls, and convert text you receive from those system calls from the locale encoding back to UTF-8. (Note: this is exactly what Glib::ustring does.)

Windows is exactly the same, except it has two sets of locales and two sets of system calls. The wide-character locale is the more interesting one, since it is always UTF-16, so the only conversion you have to do is between UTF-8 and UTF-16, which is easy and lossless. Likewise, you could also choose UTF-16 or UTF-32 as your internal representation rather than UTF-8; the choice is completely irrelevant with regard to providing a uniformly encoded interface regardless of platform.
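To make that concrete, here is a minimal sketch of such a conversion layer, assuming UTF-8 as the internal representation; the wrapper name `open_for_reading` is hypothetical, and the Windows branch uses the documented `MultiByteToWideChar`/`_wfopen` calls:

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

#if defined(_WIN32)
#include <windows.h>

// Windows: convert the internal UTF-8 string to UTF-16 (lossless) and
// call the wide-character system call.
std::FILE* open_for_reading(const std::string& utf8_path)
{
    int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                  utf8_path.c_str(), -1, nullptr, 0);
    if (len == 0)
        throw std::runtime_error("invalid UTF-8 in path");
    std::wstring wide(static_cast<std::size_t>(len), L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8_path.c_str(), -1, &wide[0], len);
    return _wfopen(wide.c_str(), L"rb");
}
#else
// POSIX: the narrow call expects the locale encoding; passing UTF-8
// through unchanged is only correct when the locale itself is UTF-8.
// A real implementation would re-encode here (see the iconv sketch below).
std::FILE* open_for_reading(const std::string& utf8_path)
{
    return std::fopen(utf8_path.c_str(), "rb");
}
#endif
```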
> The problem is not locales, encodings, or other stuff; the problem is that the Windows API does not allow you to fully use "char *"-based strings, as it does not support UTF-8,
The actual locale used by the user is irrelevant. Again, as I said earlier, the fact that UTF-8 is the most common locale on Linux but is not available on Windows shouldn't affect the way the system works. A lot of Linux systems use a Latin-1 locale, and your approach will simply fail on those systems.
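Here is a sketch of the POSIX-side conversion step that handles such locales, using iconv(3) and nl_langinfo(CODESET); the helper name `utf8_to_locale` is hypothetical, and it assumes the program has called `setlocale(LC_ALL, "")` at startup so the locale's codeset is known:

```cpp
#include <iconv.h>
#include <langinfo.h>
#include <clocale>
#include <cstddef>
#include <stdexcept>
#include <string>

// Convert a UTF-8 string to the encoding of the current locale so it
// can be handed to narrow (char*) POSIX calls. On a Latin-1 system
// this re-encodes the text; on a UTF-8 system it is a plain copy.
std::string utf8_to_locale(const std::string& utf8)
{
    // nl_langinfo reports the codeset of the locale selected by setlocale.
    iconv_t cd = iconv_open(nl_langinfo(CODESET), "UTF-8");
    if (cd == (iconv_t)-1)
        throw std::runtime_error("unsupported conversion");

    std::string out(utf8.size() * 4 + 4, '\0'); // generous worst-case size
    char* in_ptr = const_cast<char*>(utf8.data());
    std::size_t in_left = utf8.size();
    char* out_ptr = &out[0];
    std::size_t out_left = out.size();

    if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) == (std::size_t)-1) {
        iconv_close(cd);
        throw std::runtime_error("text not representable in locale encoding");
    }
    iconv_close(cd);
    out.resize(out.size() - out_left);
    return out;
}
```

Note that iconv fails with EILSEQ for characters that have no representation in the target encoding, i.e. most of Unicode under Latin-1, which is exactly why pretending the narrow API speaks UTF-8 breaks on those systems.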
> and platform-independent programming becomes a total mess.
So your technique for writing platform-independent code is relying on the user to use a UTF-8 locale?