Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]

19 Jan 2011

      Peter Dimov wrote:
...
Alexander Lamaison wrote:
...
I was under the impression that Linux changed from interpreting
char* as being in a multitude of different encodings to being in
UTF-8 by default.

Well, it probably depends on what part of Linux we're talking to, but
most of the functions do not interpret char* as being in any encoding,
neither do they have a default. They just treat it as a byte sequence.

hmmm - that's what I always considered std::string to be.  There's
no notion of locale in there.

I'm still not seeing why we can't continue to consider std::string
just a sequence of bytes with some extra sauce ..
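
As a rough illustration of the "just a sequence of bytes" view (a sketch,
not code from this thread; count_code_points and sample are made-up names,
and the input is assumed to be well-formed UTF-8):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// std::string::size() reports bytes. To count code points, the caller has
// to bring the encoding knowledge: here, UTF-8 continuation bytes
// (bit pattern 10xxxxxx) are skipped. No validation is performed.
inline std::size_t count_code_points(const std::string& bytes) {
    std::size_t n = 0;
    for (unsigned char c : bytes) {
        if ((c & 0xC0) != 0x80) ++n;  // count every non-continuation byte
    }
    return n;
}

// 'n' followed by U+00E9, spelled out as the two bytes 0xC3 0xA9 so the
// example does not depend on the source file's encoding.
inline std::string sample() { return std::string("n\xC3\xA9"); }
```

Here sample().size() is 3 while count_code_points(sample()) is 2: the
string itself neither knows nor cares that two of its bytes form one
code point.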

... and make a new class utf8_string, derived from it, which includes
a code point iterator and a function to return a utf8 "character or
codepoint or whatever it is".

I just can't see anything wrong with this. It doesn't redefine the
semantics (formal, intuitive, common usage) of std::string, and utf8_string
would let one use the special unicode sauces when needed.  And it could
be implicitly converted to std::string when passed as a function
argument.  Finally, given the history of this, I don't believe utf8 is the
"end of the road".  It still leaves open the possibility of the next
greatest thing - whatever that turns out to be.  To summarize:

std::string - a sequence of bytes
utf8_string - a sequence of "code points" implemented in terms of
std::string (or at least convertible to std::string)
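
One way this split might look, as a minimal sketch under stated
assumptions: utf8_string below is hypothetical (it is not an existing Boost
or standard class), it uses the "implemented in terms of" variant rather
than derivation, it assumes well-formed UTF-8 and validates nothing, and
byte_length is a made-up helper standing in for any function that takes a
std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical illustration only. Input is assumed to be well-formed UTF-8.
class utf8_string {
public:
    explicit utf8_string(std::string bytes) : bytes_(std::move(bytes)) {}

    // Implicit conversion: a utf8_string can be passed wherever a plain
    // std::string (byte sequence) is expected.
    operator const std::string&() const { return bytes_; }

    class code_point_iterator {
    public:
        code_point_iterator(const std::string* s, std::size_t pos)
            : s_(s), pos_(pos) {}

        // Decode the code point that starts at the current byte offset.
        char32_t operator*() const {
            const unsigned char lead = byte(pos_);
            const std::size_t len = sequence_length(lead);
            char32_t cp = lead;                     // 1-byte (ASCII) case
            if (len == 2) cp = lead & 0x1F;         // 110xxxxx
            else if (len == 3) cp = lead & 0x0F;    // 1110xxxx
            else if (len == 4) cp = lead & 0x07;    // 11110xxx
            for (std::size_t i = 1; i < len; ++i)
                cp = (cp << 6) | (byte(pos_ + i) & 0x3F);
            return cp;
        }

        // Advance by one code point, that is, by 1 to 4 bytes.
        code_point_iterator& operator++() {
            pos_ += sequence_length(byte(pos_));
            return *this;
        }

        bool operator==(const code_point_iterator& o) const {
            return s_ == o.s_ && pos_ == o.pos_;
        }
        bool operator!=(const code_point_iterator& o) const {
            return !(*this == o);
        }

    private:
        unsigned char byte(std::size_t i) const {
            return static_cast<unsigned char>((*s_)[i]);
        }
        // Sequence length implied by the lead byte.
        static std::size_t sequence_length(unsigned char lead) {
            if (lead < 0x80) return 1;
            if ((lead & 0xE0) == 0xC0) return 2;
            if ((lead & 0xF0) == 0xE0) return 3;
            return 4;
        }
        const std::string* s_;
        std::size_t pos_;
    };

    code_point_iterator begin() const {
        return code_point_iterator(&bytes_, 0);
    }
    code_point_iterator end() const {
        return code_point_iterator(&bytes_, bytes_.size());
    }

private:
    std::string bytes_;
};

// Stand-in for existing code that only knows about byte sequences.
inline std::size_t byte_length(const std::string& s) { return s.size(); }
```

With utf8_string s("n\xC3\xA9\xE2\x82\xAC"), byte_length(s) compiles
through the implicit conversion and sees 6 bytes, while iterating from
s.begin() to s.end() yields three code points: U+006E, U+00E9 and U+20AC.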

Robert Ramey