
At 1:41 PM -0500 2/20/08, Frank Mori Hess wrote:
> I don't have a lot of experience using non-ascii strings in my internal
> code, aside from occasional forays into utf-8 for special characters, but
> wouldn't using ucs-4 for the "core" encoding be the sane thing to do? With
> a ucs-4 encoding, you could use a basic_string<wchar_t> and continue using
> the familiar api without worrying about the complications and confusion
> caused by variable length encodings.

You are making an unwarranted assumption - that wchar_t is big enough to hold a ucs-4 code point (or, in fact, that wchar_t has a particular size). This is incorrect.

On some compilers, sizeof(wchar_t) == 2, while on others, sizeof(wchar_t) == 4. (Other compilers may use other values as well - but I've never seen them).

-- 
-- Marshall

Marshall Clow     Idio Software   <mailto:marshall@idio.com>

It is by caffeine alone I set my mind in motion.
It is by the beans of Java that thoughts acquire speed,
the hands acquire shaking, the shaking becomes a warning.
It is by caffeine alone I set my mind in motion.
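
To make the point about sizeof(wchar_t) concrete, here is a minimal sketch of how code could check the assumption at compile time instead of relying on it. It assumes a C++11 or later compiler (so char32_t and constexpr are available, which was not the case when this thread was written). The names wchar_holds_ucs4, ucs4_char, ucs4_string and make_ucs4_string are hypothetical and only for illustration; they are not part of any library discussed here.

```cpp
#include <climits>      // CHAR_BIT
#include <cwchar>       // WCHAR_MAX
#include <string>
#include <type_traits>  // std::conditional

// True when a single wchar_t can represent every Unicode code point
// (the largest is U+10FFFF). Typically false where sizeof(wchar_t) == 2.
constexpr bool wchar_holds_ucs4 =
    static_cast<unsigned long>(WCHAR_MAX) >= 0x10FFFFul;

// Hypothetical alias: use wchar_t only when it is wide enough,
// otherwise fall back to char32_t (at least 32 bits by definition).
using ucs4_char =
    std::conditional<wchar_holds_ucs4, wchar_t, char32_t>::type;
using ucs4_string = std::basic_string<ucs4_char>;

// Fail the build rather than silently truncating code points.
static_assert(sizeof(ucs4_char) * CHAR_BIT >= 21,
              "ucs4_char cannot hold a full Unicode code point");

// Hypothetical helper: build a one-element string from a code point.
inline ucs4_string make_ucs4_string(char32_t code_point) {
    return ucs4_string(1, static_cast<ucs4_char>(code_point));
}
```

With this, make_ucs4_string(0x1F600) yields a string of length 1 on every platform, whereas a plain basic_string<wchar_t> could not store that code point in one element on a compiler where sizeof(wchar_t) == 2.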