Re: [boost] GSoC Unicode library: second preview

20 Jun 2009

2009/6/20 Artyom <artyomtnk@yahoo.com>:
> ...
> ...
>> UTF-16 ... This is the recommended encoding for dealing with
>> Unicode internally for general purposes
>
> To be honest, it is most error prone encoding to work with Unicode:

Amen.

Really, I don't see why people don't just use UTF-8 all over the
place.  Even UTF-32 isn't as convenient as most would like, since you
still have combining code points and other similar complications.

As a programmer what I really care about is usually some nebulous
concept of "characters", and one character can easily be 3 codepoints
or 1/3 of a codepoint.
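
To make the mismatch concrete, here is a minimal sketch in Python (used
only for illustration; it is not part of the proposed library, and the
describe() helper is a made-up name). It counts code points, UTF-8
bytes and UTF-16 code units for a few strings that each look like a
single unit to a reader, or in the ligature case like three.

```python
import unicodedata

def describe(text):
    """Return (code points, UTF-8 bytes, UTF-16 code units) for text.

    Hypothetical helper for illustration only.
    """
    return (
        len(text),                          # Python str is indexed by code point
        len(text.encode("utf-8")),          # bytes in UTF-8
        len(text.encode("utf-16-le")) // 2  # 16-bit code units in UTF-16
    )

decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT: one visible character
ligature = "\ufb03"      # LATIN SMALL LIGATURE FFI: one code point, three letters
clef = "\U0001d11e"      # MUSICAL SYMBOL G CLEF: outside the BMP

print(describe(decomposed))  # (2, 3, 2)
print(describe(ligature))    # (1, 3, 1)
print(describe(clef))        # (1, 4, 2)

# Normalization changes the counts again without changing what the reader sees.
print(len(unicodedata.normalize("NFC", decomposed)))  # 1
print(unicodedata.normalize("NFKC", ligature))        # ffi
```

None of the three counts matches the number of "characters" in every
case, which is the point: no fixed-width encoding makes that concept
fall out for free.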

It feels like the only way to get Unicode string handling right (at
the application level, not library or render levels) is to deal
entirely in strings and regexes.

Suppose I have "difficult" with the "ffi" ligature codepoint, and I do
a perl-style split on /i/.  I should probably be getting "d", the "ff"
ligature codepoint, and "cult".  I know if I tried to code that by
hand in every application I'd miss all kinds of evil corner cases like
that.
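
A rough sketch of that corner case, again in Python and under a loud
assumption: it handles the ligature by applying compatibility
normalization (NFKC) before splitting, so the middle piece comes back
as two plain "f" letters rather than the "ff" ligature codepoint
(U+FB00). Producing the ligature itself would need extra logic that is
not shown here. split_compat() is a hypothetical name.

```python
import re
import unicodedata

def split_compat(pattern, text):
    """Split text on pattern after NFKC normalization.

    Hypothetical helper: NFKC expands compatibility characters such as
    U+FB03 (ffi ligature) into their plain-letter equivalents first.
    """
    folded = unicodedata.normalize("NFKC", text)
    return re.split(pattern, folded)

word = "di\ufb03cult"  # "difficult" spelled with the ffi ligature

# A naive split never sees the "i" hidden inside the ligature.
print(re.split("i", word) == ["d", "\ufb03cult"])  # True

# Normalizing first exposes it.
print(split_compat("i", word))  # ['d', 'ff', 'cult']
```

NFKC is lossy (it also folds things like superscripts and full-width
forms), so this is one possible policy, not a general answer.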

Scott McMurray