
Esben,
I think you have gotten something mixed up. UTF-8 and UTF-32 (aka UCS-4) are just two encodings of the same character set, including the combining characters you mentioned (which are really not that uncommon; e.g. "mêlée" contains 2 characters that could be written by combining a base letter with an accent). In practical terms, UTF-32 is somewhat useless. (A case might be made for UTF-16, though.) Kind regards, Esben
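As a small illustration of the point above, here is a sketch in Python (a language chosen only for illustration; the post itself contains no code). The word "mêlée" can be stored with precomposed ê/é or as base letters followed by combining accents. Both forms are the same text in every encoding; only the code-point count and the number of bytes differ.

```python
import unicodedata

# Precomposed form (NFC): ê = U+00EA and é = U+00E9 are single code points.
precomposed = unicodedata.normalize("NFC", "m\u00eal\u00e9e")
# Decomposed form (NFD): e + U+0302 (combining circumflex), e + U+0301 (combining acute).
decomposed = unicodedata.normalize("NFD", precomposed)

for label, s in (("NFC", precomposed), ("NFD", decomposed)):
    # The "-le" codecs are used so no byte-order mark is counted.
    print(label,
          "code points:", len(s),
          "UTF-8:", len(s.encode("utf-8")),
          "UTF-16:", len(s.encode("utf-16-le")),
          "UTF-32:", len(s.encode("utf-32-le")))

# Every encoding round-trips the same code points.
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    assert decomposed.encode(enc).decode(enc) == decomposed
```

Expected output: the NFC form is 5 code points (7 bytes in UTF-8, 10 in UTF-16, 20 in UTF-32), and the NFD form is 7 code points (9, 14 and 28 bytes respectively).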
Having written both basic text editors and Unicode text editors, I can say that if you are targeting the Western Hemisphere, it may be more efficient to use UTF-8. If you stick to Unicode Plane 0 (the Basic Multilingual Plane), UTF-16 might be appropriate if you need no formatting bits, but by the time you want a full Unicode text editor you end up using 21 bits of the UTF-32 encoding (code points only go up to U+10FFFF), leaving the remaining 11 bits for your own formatting info if you need it [font/colour etc.]. Compared with handling UTF-16 surrogate pairs, storing every character in a fixed 32-bit width wastes a little space, but this is a very acceptable trade-off for simplicity. In that sense UTF-32 is a misnomer, as it does not need the full 32 bits, but it is still an encoding! Yours, Graham