
Mathias Gaunard wrote:
Jeremy Maitin-Shepard wrote:
It occurs to me that perhaps it is not unreasonable after all to restrict the library to supporting Unicode encodings for in-memory character representation.
I personally believe Unicode (not only the character set, but also its collations and algorithms) is the only viable way to represent characters, and should therefore be what strings work with. (Get out, evil locales and other stuff!) Of course, various encodings can still be used for serialization.
I'd like to note that Unicode encodings consume more memory than narrow single-byte encodings (UTF-16 needs at least two bytes per character, and UTF-32 four). This may not be desirable in all cases, especially when an application is not intended to support multiple languages in the majority of its strings, which is in fact quite a common case.
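
For concreteness, here is a minimal sketch (added for illustration, not part of the original thread) that compares the in-memory footprint of the same ten-character ASCII text under a narrow single-byte encoding, UTF-16, and UTF-32:

    #include <iostream>
    #include <string>

    int main() {
        // The same 10-character ASCII text in three in-memory encodings.
        std::string    narrow = "libs/boost";   // 1 byte per code unit
        std::u16string utf16  = u"libs/boost";  // 2 bytes per code unit
        std::u32string utf32  = U"libs/boost";  // 4 bytes per code unit

        std::cout << "narrow: " << narrow.size() * sizeof(char)     << " bytes\n"; // 10
        std::cout << "UTF-16: " << utf16.size()  * sizeof(char16_t) << " bytes\n"; // 20
        std::cout << "UTF-32: " << utf32.size()  * sizeof(char32_t) << " bytes\n"; // 40
        return 0;
    }

For purely ASCII text the narrow string uses a quarter of the memory of the UTF-32 one. UTF-8 would match the narrow footprint in this example, at the cost of multi-byte sequences for characters outside the ASCII range.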