Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]

20 Jan 2011

On 01/19/2011 07:34 AM, Alexander Lamaison wrote:
> ...
> On Wed, 19 Jan 2011 16:13:04 +0100, Matus Chochlik wrote:
>> ...
>>> ...
>>> I do not believe that UTF-8 is the way to go. In fact I know it is not,
>>> except perhaps for the very near future for some programmers (Linux
>>> advocates).
>> :-) Just for the record, I'm not a Linux advocate any more than I'm
>> a Windows advocate. I use both; I'm writing this on a Windows machine.
>> What I would like is for the whole encoding madness/dysfunction (including
>> but not limited to the dual TCHAR/whateverchar-based interfaces) to stop.
>> Everywhere.
> Even if I bought the UTF-8ed-Boost idea, what would we do about the STL
> implementation on Windows, which expects local-codepage narrow strings?  Are
> we hoping MS etc. will change these to match?  Because otherwise we'll be
> converting between narrow encodings for the rest of eternity.

That's the reality already.  As long as people use local narrow 
encodings, we will be converting between them.  If your code runs on 
Windows in Korea or in Spain, you'll get local-codepage narrow strings 
that are incompatible.  At least if there were a utf8_string type, a 
utf16_string type, or a utf32_string type, with documentation about how 
to implement templated conversions to them (code conversion facets), 
someone could write a library that uses them, and everyone using all of 
these different local encodings would know what to do to use that 
library.  The way it is today, it's much more difficult to figure out how 
to write a generic library that accepts text from a user.  What does a 
char* or a std::string imply about encoding?  Who knows what you'll 
get.  A local 8-bit code page?  Shift-JIS?  UTF-8?  EUC?  This is just 
saying: hey, here's one way to deal with this issue.

This sort of scheme lets the Windows STL implementation exist, but says, 
here's what you need to do so that I know how to treat the text you pass 
to me as an argument.  If it's in a local code page, you need to convert 
it to what I want.  With validating string types for the three UCS 
encodings (UTF-8, UTF-16 and UTF-32), you can trust that the data is 
validly encoded, although all the usual questions about whether the 
content is meaningful to you still exist.

If you use the standard code conversion facets (std::codecvt) specified 
for C++ locales to convert from local code pages to your string types, 
then you can leverage existing work.  Why reinvent the wheel?

Patrick

Patrick Horgan