
On 01/19/2011 07:34 AM, Alexander Lamaison wrote:
> On Wed, 19 Jan 2011 16:13:04 +0100, Matus Chochlik wrote:
>>> I do not believe that UTF-8 is the way to go. In fact I know it is not, except perhaps for the very near future for some programmers ( Linux advocates ).
>>
>> :-) Just for the record, I'm not a Linux advocate any more than I'm a Windows advocate. I use both. I'm writing this on a Windows machine. What I would like is for the whole encoding madness/dysfunction (including but not limited to the dual TCHAR/whateverchar-based interfaces) to stop. Everywhere.
>
> Even if I bought the UTF-8ed-Boost idea, what would we do about the STL implementation on Windows which expects local-codepage narrow strings? Are we hoping MS etc. change these to match? Because otherwise we'll be converting between narrow encodings for the rest of eternity.

That's the reality already. As long as people use local narrow encodings we will be converting between them. If your code runs on Windows in Korea or in Spain, you'll get local-codepage narrow strings that are incompatible with each other.

At least if there were a utf8_string, utf16_string, or utf32_string type, with documentation about how to implement templated conversions to them (code conversion facets), someone could write a library that uses them, and everyone using all of these different local encodings would know what to do to use the library.

The way it is today, it's much more difficult to figure out how to write a generic library that accepts text from a user. What does a char* or a std::string imply about encoding? Who knows what you'll get. A local 8-bit code page? Shift-JIS? UTF-8? EUC?

This is just saying that, hey, here's one way to deal with this issue. This sort of scheme lets the Windows STL implementation exist as it is, but says: here's what you need to do so that I know how to treat the text you pass to me as an argument. If it's in a local code page, you need to convert it to what I want.

With validating string types that support the three Unicode encodings (UTF-8, UTF-16, UTF-32) you can trust that the data is validly encoded, although all the normal issues about whether the content is meaningful to you still exist.

If you use normal code conversion facets, as specified for C++ locales, for conversion from local code pages to your strings, then you can leverage existing work. Why reinvent the wheel?

Patrick
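
To make the idea of a validating string type concrete, here is a minimal sketch of what a UTF-8 variant might look like. Everything in it is an assumption for illustration: the helper `is_valid_utf8`, the class `utf8_string`, and the consumer `byte_length` are hypothetical names, not part of Boost or the standard library. It covers validation only; conversion from a local code page through a code conversion facet is left out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: true if `s` is well-formed UTF-8. Rejects stray or
// missing continuation bytes, overlong forms, UTF-16 surrogates, and code
// points above U+10FFFF.
inline bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;              // invalid lead byte
        if (len > n - i) return false;  // sequence truncated at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        // Smallest code point that legitimately needs `len` bytes.
        static const unsigned long min_cp[] = {0, 0x0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[len]) return false;              // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        i += len;
    }
    return true;
}

// Hypothetical validating string type: an instance can only exist if its
// bytes passed validation, so holders may assume well-formed UTF-8.
class utf8_string {
public:
    explicit utf8_string(const std::string& bytes) : bytes_(bytes) {
        if (!is_valid_utf8(bytes_))
            throw std::invalid_argument("input is not well-formed UTF-8");
    }
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;
};

// Hypothetical library function: the parameter type itself documents the
// encoding the library expects, unlike a bare char* or std::string.
inline std::size_t byte_length(const utf8_string& text) {
    return text.bytes().size();
}
```

A caller holding text in a local code page would have to convert it before constructing a `utf8_string`. Passing Latin-1 bytes such as `"caf\xE9"` directly makes the constructor throw instead of letting mis-encoded data reach the library.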