Re: [boost] RFC: interest in Unicode codecs?

14 Feb 2009


On Saturday 14 February 2009 11:53:20 Graham wrote:
> ...
> Using UTF-8 can work well if you are only targeting American and Western
> Europe for non-literary use.
> If you need to support the rest of the world you really need to move to
> UTF-32 due to the large number of characters and the grapheme and glyph
> handling [e.g. in Urdu you can type 3 characters and they are displayed
> as a single combined glyph, and the cursor should never be placed
> between them].

I think you have gotten something mixed up. UTF-8 and UTF-32 (aka UCS-4) are 
just two encodings of the same character set, including the combining 
characters you mentioned (which are really not that uncommon, e.g. mêlée 
contains 2 characters which could each be written as a base letter plus a 
combining mark). Grapheme and cursor handling is therefore the same problem 
in either encoding. In practical terms, UTF-32 is somewhat useless. (A case 
might be made for UTF-16, though.)
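
To make the point concrete, here is a minimal Python sketch (my own illustration, not something from the proposed codecs; the helper names describe and round_trips are made up). It assumes only the standard unicodedata module and shows that the precomposed and the combining spellings of mêlée have the same number of code points whether you store them as UTF-8 or as UTF-32; only the byte count differs.

```python
import unicodedata

# "mêlée" written with escapes so the source file's own encoding is irrelevant.
WORD = "m\u00eal\u00e9e"

# NFC keeps ê and é as single precomposed code points;
# NFD splits each into a base letter plus a combining mark.
word_nfc = unicodedata.normalize("NFC", WORD)
word_nfd = unicodedata.normalize("NFD", WORD)


def describe(s):
    """Count code points and encoded bytes for one string (hypothetical helper)."""
    return {
        "code_points": len(s),
        "utf8_bytes": len(s.encode("utf-8")),
        "utf32_bytes": len(s.encode("utf-32-le")),  # -le: no byte order mark
    }


def round_trips(s):
    """True if both encodings decode back to the identical code point sequence."""
    via_utf8 = s.encode("utf-8").decode("utf-8")
    via_utf32 = s.encode("utf-32-le").decode("utf-32-le")
    return via_utf8 == s == via_utf32


if __name__ == "__main__":
    print(describe(word_nfc))  # 5 code points, 7 UTF-8 bytes, 20 UTF-32 bytes
    print(describe(word_nfd))  # 7 code points, 9 UTF-8 bytes, 28 UTF-32 bytes
    print(round_trips(word_nfc), round_trips(word_nfd))
```

In the decomposed form the combining marks are still separate code points in UTF-32, so a fixed-width encoding does not tell you where the cursor may go; that needs grapheme cluster segmentation on top of whichever encoding you pick.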

-- 
Kind regards, Esben

Esben Mose Hansen