Re: [boost] [General] Always treat std::strings as UTF-8? (was [Process] List of small issues)

13 Jan 2011

On Thu, 13 Jan 2011 06:35:53 -0800 (PST)
Artyom <artyomtnk@yahoo.com> wrote:

[...]
Notes:
1. You can also always assume that strings under Windows are UTF-8
   and always convert them to wide strings before system calls.
   This is, I think, a better approach, but it is different from what
   most of Boost does.
[...]
An interesting thought... I developed a set of ASCII/UTF-8/16/32
classes for my company not too long ago, and I became fairly familiar
with the UTF-8 encoding scheme. There was only one issue that stopped
me from assuming that every std::string is UTF-8 encoded: what if
the string *isn't* meant as UTF-8, and contains characters with
the high bit set?

There's nothing technically stopping that from happening, and there's
no way to determine with complete certainty whether even a string that
is valid UTF-8 was intended that way, or whether the UTF-8-like byte
sequences are really meant as characters from an 8-bit code page (their
"high-ASCII" values).
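To make the ambiguity concrete, here is a minimal sketch of a
well-formedness check. The names is_well_formed_utf8 and check_examples
are hypothetical, made up for this illustration, and are not from my
classes or from Boost. It can only tell whether the bytes *could* be
UTF-8; it cannot tell what the author of the string meant.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if s consists only of well-formed UTF-8
// sequences (no overlong forms, no surrogates, nothing above U+10FFFF).
bool is_well_formed_utf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }           // plain ASCII byte
        std::size_t len;
        unsigned long cp;
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                         // stray continuation or invalid lead byte
        if (i + len > n) return false;             // sequence truncated
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp[len]) return false;               // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                  // beyond Unicode range
        i += len;
    }
    return true;
}

// Hypothetical demonstration of the ambiguity described above.
void check_examples() {
    // A lone 0xE9 (e-acute in Latin-1) cannot be UTF-8, so this
    // string is detectably *not* UTF-8.
    assert(!is_well_formed_utf8("caf\xE9"));
    // 0xC3 0xA9 is e-acute in UTF-8, but the same two bytes are also a
    // legal Latin-1 string (A-tilde followed by the copyright sign).
    // The check passes either way; the intent is not recoverable.
    assert(is_well_formed_utf8("caf\xC3\xA9"));
}
```

So a failed check proves the string is not UTF-8, but a passed check
proves nothing about intent, which is exactly the problem.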

Maybe you know something I don't, that would allow me to change it? I
hope so, it would simplify some of the code greatly. 
-- 
Chad Nelson
Oak Circle Software, Inc.