
Mathias Gaunard <mathias.gaunard@etu.u-bordeaux1.fr> writes:
Jeremy Maitin-Shepard wrote:
It occurs to me that perhaps it is not unreasonable after all to restrict the library to supporting Unicode encodings for in-memory character representation.
I personally believe Unicode (not only the character set, but also its collations and algorithms) is the only viable way to represent characters, and thus should be what strings work with. (Get out, evil locales and other stuff!) Of course, various encodings can still be used for serialization.
I agree that I personally would always want to use a Unicode encoding for handling text in my software. The question, though, is whether the new I/O library should actually force users to use a Unicode encoding for internal text representation. Even if other internal encodings are supported, Boost might still only provide actual text formatting facilities and other high-level text facilities for all Unicode encodings (UTF-8, UTF-16, and UTF-32) or even only a single Unicode encoding.
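To make the distinction between internal representation and serialization concrete, here is a minimal sketch, assuming text is held in memory as UTF-32 (one `char32_t` per code point) and converted to UTF-8 or UTF-16 only at the I/O boundary. The helper names `encode_utf8`, `encode_utf16`, and `to_utf8` are hypothetical and are not part of Boost, ICU, or any proposed interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if cp is a Unicode scalar value
// (within range and not a surrogate code point).
bool is_scalar_value(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Hypothetical helper: encode one scalar value as 1 to 4 UTF-8 bytes.
std::string encode_utf8(char32_t cp) {
    assert(is_scalar_value(cp));
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: encode one scalar value as 1 or 2 UTF-16 code units.
std::u16string encode_utf16(char32_t cp) {
    assert(is_scalar_value(cp));
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        // Code points above the BMP become a surrogate pair.
        cp -= 0x10000;
        out += static_cast<char16_t>(0xD800 | (cp >> 10));
        out += static_cast<char16_t>(0xDC00 | (cp & 0x3FF));
    }
    return out;
}

// Serialize an in-memory UTF-32 string to UTF-8 bytes.
std::string to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        out += encode_utf8(cp);
    }
    return out;
}
```

The same code point takes a different number of code units in each encoding, which is one reason the choice of internal encoding matters for any high-level text facility built on top. This sketch covers only encoding; collation, normalization, and the other Unicode algorithms mentioned above are a much larger job.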
Unfortunately, C++ is quite far from having good Unicode tools (not that other programming languages are really better -- Unicode is simply quite complicated, because human languages just are).
ICU has most of the stuff, but not with the right interfaces.
A better I/O system might provide a very solid base on top of which proper higher-level text facilities can be provided, quite possibly by incorporating pieces of ICU.

-- Jeremy Maitin-Shepard