
Hi Mathias,

Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
Phil Endecott wrote:
UTF-16 .... This is the recommended encoding for dealing with Unicode.
Recommended by who? It's not the encoding that I would normally recommend.
The Unicode standard, in some technical notes: http://www.unicode.org/notes/tn12/
It recommends the use of UTF-16 for general-purpose text processing.
It also states that UTF-8 is good for compatibility and data exchange, and that UTF-32 uses too much memory and is therefore wasteful.
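To make the memory trade-off concrete, here is a minimal sketch that counts the bytes a string of code points occupies in each encoding form. The names `utf8_bytes`, `utf16_bytes`, `utf32_bytes`, `EncodedSizes` and `encoded_sizes` are hypothetical helpers written for this illustration; they are not part of any library discussed in this thread. The sketch assumes its input holds valid Unicode scalar values and does no validation.

```cpp
#include <cstddef>
#include <string>

// Bytes needed for one code point in each encoding form.
// Assumes cp is a valid Unicode scalar value (no validation is done).
std::size_t utf8_bytes(char32_t cp) {
    if (cp < 0x80) return 1;      // ASCII
    if (cp < 0x800) return 2;     // e.g. Latin-1 supplement, Greek, Cyrillic
    if (cp < 0x10000) return 3;   // rest of the BMP
    return 4;                     // supplementary planes
}

std::size_t utf16_bytes(char32_t cp) {
    return cp < 0x10000 ? 2 : 4;  // one code unit on the BMP, else a surrogate pair
}

std::size_t utf32_bytes(char32_t) {
    return 4;                     // always one 32-bit code unit
}

// Hypothetical result type: total byte counts per encoding form.
struct EncodedSizes {
    std::size_t utf8;
    std::size_t utf16;
    std::size_t utf32;
};

// Sum the per-code-point sizes over the whole string.
EncodedSizes encoded_sizes(const std::u32string& text) {
    EncodedSizes sizes{0, 0, 0};
    for (char32_t cp : text) {
        sizes.utf8 += utf8_bytes(cp);
        sizes.utf16 += utf16_bytes(cp);
        sizes.utf32 += utf32_bytes(cp);
    }
    return sizes;
}
```

Under these assumptions, ASCII text is smallest in UTF-8, BMP text outside the first 0x800 code points is smallest in UTF-16, and characters outside the BMP take four bytes in all three forms.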
From that document:

Status

This document is a Unicode Technical Note. It is supplied purely for informational purposes and publication does not imply any endorsement by the Unicode Consortium.

....

Conclusion

Unicode is the best way to process and store text. While there are several forms of Unicode that are suitable for processing, it is best to use the same form everywhere in a system, and to use UTF-16 in particular for two reasons:

1. The vast majority of characters (by frequency of use) are on the BMP.
2. For seamless integration with the majority of existing software with good Unicode support.

I don't find either of those claims very convincing. I hope that your library will not try to make UTF-16 some sort of default encoding, or otherwise give it special treatment.

Regards,
Phil.