New subject: UTF-8 conversion etc. (Cory Nelson)

10 Mar 2008

>> Sebastian,
>>
>> As Unicode characters that are not in page zero can require more than
>> 32 bits to encode them [yes really], this means that one 'character'
>> can be very long.
>
> Unicode defines codepoints from 0 to 10FFFF - this can be encoded with
> 32 bits in UTF-8 and UTF-16.

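The 32-bit bound in the quoted reply can be checked with a small sketch. It uses hand-written encoders that follow the standard UTF-8 and UTF-16 bit layouts. Python is used only for brevity, and the names `utf8_encode` and `utf16_encode` are illustrative; they are not part of any Boost library. A code point above U+FFFF needs four UTF-8 bytes or one UTF-16 surrogate pair, which is 32 bits in both cases.

```python
def _check_scalar(cp: int) -> None:
    # Valid inputs are 0..0x10FFFF, excluding the surrogate range D800..DFFF.
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")


def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (1 to 4 bytes)."""
    _check_scalar(cp)
    if cp < 0x80:            # 7 bits: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:           # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | cp >> 6, 0x80 | cp & 0x3F])
    if cp < 0x10000:         # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | cp >> 12, 0x80 | (cp >> 6) & 0x3F, 0x80 | cp & 0x3F])
    # 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | cp >> 18, 0x80 | (cp >> 12) & 0x3F,
                  0x80 | (cp >> 6) & 0x3F, 0x80 | cp & 0x3F])


def utf16_encode(cp: int) -> list[int]:
    """Encode one scalar value as UTF-16 code units: one unit, or a surrogate pair."""
    _check_scalar(cp)
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000                          # 20 bits left to split
    return [0xD800 | v >> 10, 0xDC00 | v & 0x3FF]  # high, low surrogate


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600, 0x10FFFF):
        print(f"U+{cp:04X}: UTF-8 {len(utf8_encode(cp)) * 8} bits, "
              f"UTF-16 {len(utf16_encode(cp)) * 16} bits")
```

For U+1F600 this yields the surrogate pair D83D DE00. The largest code point, U+10FFFF, comes out at 32 bits in both encodings.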
Cory,

This is true for simple characters, except that current Unicode specs
require support for surrogates, which require twice that, and that's
even before you start to discuss the logical grouping of characters into
graphemes, which can themselves be two or three characters long.

I am glad you recognise that normalisation support is difficult; that's
why the character support library is the hard part to develop. I
guess we just ran out of steam after that.
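The grapheme and normalisation points can be illustrated with Python's standard `unicodedata` module. This is a sketch for illustration only and says nothing about how a Boost character library would do it. A single visible "é" can be stored as one code point (U+00E9) or as two ("e" followed by U+0301 COMBINING ACUTE ACCENT). The two forms compare unequal until both are normalised. The helper `canonically_equal` is a hypothetical name introduced here.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT: same visible character


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalising both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(len(precomposed), len(decomposed))              # code points: 1 2
    print(len(precomposed.encode("utf-8")),
          len(decomposed.encode("utf-8")))                # UTF-8 bytes: 2 3
    print(precomposed == decomposed)                      # False: raw code points differ
    print(canonically_equal(precomposed, decomposed))     # True after NFC
```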

Yours,

Graham

Re: [boost] UTF-8 conversion etc. (Cory Nelson)
