
Peter Dimov wrote:
Alexander Lamaison wrote:
I was under the impression that Linux changed from interpreting char* as being in a multitude of different encodings to being in UTF-8 by default.
Well, it probably depends on what part of Linux we're talking to, but most of the functions do not interpret char* as being in any encoding, neither do they have a default. They just treat it as a byte sequence.
Hmmm - that's what I always considered std::string to be. There's no notion of locale in there. I'm still not seeing why we can't continue to consider std::string just a sequence of bytes with some extra sauce, and make a new class utf8_string, derived from (or implemented in terms of) std::string, which adds a code point iterator and a function to return a UTF-8 "character or code point or whatever it is".

I just can't see anything wrong with this. It doesn't redefine the semantics (formal, intuitive, common usage) of std::string, and utf8_string would let one use the special Unicode sauce when needed. It could also be implicitly converted to std::string when passed as a function argument.

Finally, given the history of this, I don't believe UTF-8 is the "end of the road". This approach still leaves open the possibility of the next greatest thing, whatever that turns out to be.

To summarize:

std::string - a sequence of bytes
utf8_string - a sequence of "code points", implemented in terms of std::string (or at least convertible to std::string)

Robert Ramey