Re: [boost] [General] Always treat std::strings as UTF-8

14 Jan 2011


      ...
From: John B. Turpish
> On Fri, Jan 14, 2011 at 4:42 AM, Matus Chochlik <chochlik@gmail.com> wrote:
>> ...
>> b) UTF-32 is basically a waste of memory for most localizations.
> I'm not an expert, so take this with a grain of salt. But couldn't it
> just as easily be said that UTF-8 is a waste of CPU? There are a
> number of operations that are constant-time if you can assume a fixed
> size per character but that, I would think, have to be linear-time for
> UTF-8, for example accessing the Nth character.
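A minimal C++ sketch of the trade-off being described, assuming the input is valid UTF-8. Finding where the Nth code point starts in a UTF-8 std::string needs a scan from the beginning, while a std::u32string can be indexed directly at the cost of (typically) four bytes per code point. Both helper names, utf8_offset_of_codepoint and utf32_codepoint_at, are hypothetical and not Boost or standard APIs. Note that this locates code points, not characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Boost or standard API). Assumes s holds valid
// UTF-8. Returns the byte offset at which the n-th code point (0-based)
// starts, or s.size() if s has n or fewer code points.
std::size_t utf8_offset_of_codepoint(const std::string& s, std::size_t n) {
    std::size_t seen = 0;  // number of code points that started before i
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        // Continuation bytes look like 10xxxxxx; every other byte begins a
        // new code point, so the lookup must walk from the start: O(length).
        if ((byte & 0xC0) != 0x80) {
            if (seen == n) return i;
            ++seen;
        }
    }
    return s.size();
}

// Hypothetical helper: with UTF-32 each code point is one char32_t, so the
// same lookup is a plain O(1) index, paid for with more memory per code point.
char32_t utf32_codepoint_at(const std::u32string& s, std::size_t n) {
    assert(n < s.size());  // precondition: n must be a valid index
    return s[n];
}
```

For example, "naïve" is the 6 bytes "na\xC3\xAFve" in UTF-8, and utf8_offset_of_codepoint on it with n = 3 returns 4 (the 'v'). The UTF-32 copy U"na\u00EFve" holds 5 code units, i.e. 20 bytes with a 4-byte char32_t.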
IIUC, you can't assume a fixed size per character even with UTF-32. In UTF-32 only _code points_ have a fixed size, yet a single
character may be composed of several code points, e.g. a Latin letter followed by a combining diacritical mark
(http://en.wikipedia.org/wiki/Combining_character).
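To illustrate, assuming C++11 char32_t string literals: "é" can be stored either as the single precomposed code point U+00E9 or as "e" followed by U+0301 COMBINING ACUTE ACCENT, and the second form occupies two UTF-32 code units for what a reader sees as one character. The counting function below, approx_char_count, is hypothetical and deliberately crude: it only treats the U+0300..U+036F block as combining. Proper character (grapheme cluster) boundaries are defined by Unicode UAX #29 and need full Unicode property data, e.g. via ICU or the boundary analysis in Boost.Locale.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, crude approximation: only the Combining Diacritical Marks
// block (U+0300..U+036F) is treated as combining. Real segmentation (UAX #29)
// covers many more cases (other mark blocks, Hangul, emoji sequences, ...).
bool is_basic_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Counts "characters" by attaching each basic combining mark to the code
// point before it. A mark at the very start still counts as one character.
std::size_t approx_char_count(const std::u32string& s) {
    std::size_t count = 0;
    for (char32_t cp : s) {
        if (!is_basic_combining_mark(cp) || count == 0) ++count;
    }
    return count;
}
```

So U"e\u0301" has size() == 2 code points but approx_char_count returns 1, the same as for the precomposed U"\u00E9". Indexing a std::u32string by position therefore gives the Nth code point, not the Nth character.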

Best regards,
Robert

Robert Kawulak