Re: [boost] UTF-8 conversion etc.

20 Feb 2008

At 1:41 PM -0500 2/20/08, Frank Mori Hess wrote:
> ...
> I don't have a lot of experience using non-ascii strings in my internal code,
> aside from occasional forays into utf-8 for special characters, but wouldn't
> using ucs-4 for the "core" encoding be the sane thing to do?  With a ucs-4
> encoding, you could use a
> basic_string<wchar_t>
> and continue using the familiar api without worrying about the complications
> and confusion caused by variable length encodings.

You are making an unwarranted assumption: that wchar_t is big enough 
to hold a ucs-4 code point (or, in fact, that wchar_t has any 
particular size).

Neither is guaranteed. On some compilers, sizeof(wchar_t) == 2, while on 
others, sizeof(wchar_t) == 4. (Other compilers may use other values 
as well, but I've never seen them.)
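
To make the point concrete, here is a minimal sketch of checking the assumption at compile time rather than trusting it. The names wchar_holds_ucs4, ucs4_string and code_point_count are hypothetical and made up for this example; they are not part of Boost or the standard library. It also assumes a C++11 compiler, since char32_t and std::u32string did not exist in C++03.

```cpp
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical helper: true only if wchar_t can represent every
// Unicode code point, i.e. every value up to U+10FFFF.
constexpr bool wchar_holds_ucs4() {
    return static_cast<unsigned long long>(
               std::numeric_limits<wchar_t>::max()) >= 0x10FFFFULL;
}

// char32_t is at least 32 bits wide on every conforming C++11 compiler,
// so a string of char32_t does not depend on the size of wchar_t.
using ucs4_string = std::u32string;

// With a fixed-width encoding, one element is one code point.
inline std::size_t code_point_count(const ucs4_string& s) {
    return s.size();
}
```

Code that really wants basic_string<wchar_t> as a ucs-4 container could guard itself with static_assert(wchar_holds_ucs4(), "wchar_t is too small for ucs-4"); so that it fails to compile on a compiler where sizeof(wchar_t) == 2 instead of silently truncating.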
-- 
-- Marshall

Marshall Clow     Idio Software   <mailto:marshall@idio.com>

It is by caffeine alone I set my mind in motion.
It is by the beans of Java that thoughts acquire speed,
the hands acquire shaking, the shaking becomes a warning.
It is by caffeine alone I set my mind in motion.