
On Saturday 14 February 2009 11:53:20 Graham wrote:
> Using UTF-8 can work well if you are only targeting America and Western Europe for non-literary use.
> If you need to support the rest of the world you really need to move to UTF-32 due to the large number of characters and the grapheme and glyph handling [e.g. in Urdu you can type 3 characters and they are displayed as a single combined glyph, and the cursor should never be placed between them].
I think you have gotten something mixed up. UTF-8 and UTF-32 (aka UCS-4) are just two encodings of the same character set, including the combining characters you mentioned (which are really not that uncommon, e.g. mêlée contains 2 characters which could each be written as a base letter plus a combining accent). In practical terms, UTF-32 is somewhat useless. (A case might be made for UTF-16, though.)

--
Kind regards,
Esben
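
To illustrate the point about encodings, here is a minimal sketch, assuming Python 3 and its standard unicodedata module. The variable names are only for illustration. It shows that UTF-8 and UTF-32 round-trip the same code points, and that combining characters are a property of the text rather than of the encoding.

```python
import unicodedata

# "mêlée" written with precomposed ê (U+00EA) and é (U+00E9).
word = "m\u00eal\u00e9e"

# The same code points in two encodings.
utf8 = word.encode("utf-8")
utf32 = word.encode("utf-32-le")  # explicit byte order, so no BOM is added

# Both decode back to the identical string.
assert utf8.decode("utf-8") == utf32.decode("utf-32-le") == word

print(len(word), len(utf8), len(utf32))  # 5 code points, 7 bytes, 20 bytes

# Canonical decomposition (NFD) splits each accented letter into a base
# letter followed by a combining mark.
decomposed = unicodedata.normalize("NFD", word)
print(len(decomposed))  # 7 code points
print([unicodedata.name(c) for c in decomposed])
```

Both strings display the same way, and the decomposed form still contains combining marks whether it is stored as UTF-8 or UTF-32, so keeping the cursor out of the middle of a combined sequence needs grapheme handling in either case.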