
Robert Ramey wrote:
Basically my reservations about the utility of a unicode library stem from the following:
a) the standard library has std::basic_string<T>, where T is any type: char, wchar_t or whatever.
Yes. The problem with Unicode is that it is not really possible to represent a character as an atomic value. A single glyph could in extreme cases be made up of 3 (or even more) 32-bit code units (UTF-32), so defining a good T is nigh on impossible.
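To make that concrete, here is a minimal sketch (assuming a C++11 compiler with char32_t; the function names are only illustrative) of one visible character that takes one or two UTF-32 code units depending on how it is composed:

```cpp
#include <string>

// "e" followed by COMBINING ACUTE ACCENT (U+0301): one character on screen,
// but two UTF-32 code units in memory.
inline std::u32string decomposed_e_acute() {
    return U"e\u0301";
}

// The precomposed form, LATIN SMALL LETTER E WITH ACUTE (U+00E9):
// the same character on screen, a single code unit.
inline std::u32string precomposed_e_acute() {
    return U"\u00E9";
}
```

Both strings display identically, yet size() reports 2 for the first and 1 for the second, and operator== says they differ.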
b) all algorithms that use std::string are (or should be) applicable to std::basic_string<T> regardless of the actual type of T (more or less)
c) character encodings can be classified into two types - single element types like Unicode (UCS-2, UCS-4) and ASCII, and multi element types like JIS, and others.
As I said, Unicode is not fixed width, in any encoding scheme: even in UTF-32 a single character as the user sees it may span several code units. It is therefore very difficult to teach the basic_string class to handle Unicode strings correctly.
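The same mismatch shows up one level down, between code units and code points. A rough sketch, assuming well-formed input and using helper names of my own invention, of how basic_string::size() drifts away from the code point count in UTF-8 and UTF-16:

```cpp
#include <cstddef>
#include <string>

// Count code points in UTF-8 by skipping continuation bytes (10xxxxxx).
// Assumes well-formed UTF-8; no validation is attempted.
inline std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Count code points in UTF-16 by skipping low (trailing) surrogates,
// so that each surrogate pair is counted once.
// Assumes well-formed UTF-16; no validation is attempted.
inline std::size_t utf16_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t c : s) {
        if (c < 0xDC00 || c > 0xDFFF) ++n;
    }
    return n;
}
```

For the UTF-8 bytes of "naïve", size() is 6 while the code point count is 5. For U+1D11E (a musical G clef) in UTF-16, size() is 2 while the code point count is 1. Neither count says anything about combining sequences, which need yet another layer.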
d) there exist ANSI functions which translate strings from one type to another based on information in the current locale. This information is dependent on the particular encoding.
e) There is nothing particularly special about Unicode in this scheme. It's just one more encoding scheme among many. Therefore making a special Unicode library would be unnecessarily specific. Any efforts so spent would be better invested in generic encoding/decoding algorithms and/or setting up locale facets for specific encodings UTF-8, UTF-16, etc.
The reason for focusing on Unicode is that it has become the de facto standard for character representation. It is supported by most OSes and many programming languages, and this is not likely to change. As for other encoding schemes, I actually had support for other encodings (like UCS, Shift JIS etc.) in the back of my mind when I wrote the implementation I described earlier. That is why the string class is called encoded_string, and not unicode_string. If the interface of the encoding_traits class is made general enough, it should be a piece of cake to add support for additional encoding schemes at a later date.
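To illustrate the idea only, here is a rough sketch of how a traits class could keep the string class independent of the encoding. Only the names encoded_string and encoding_traits come from the design above; the tag types, the sequence_length member and everything else are assumptions made up for this example, not the actual interface.

```cpp
#include <cstddef>
#include <string>

// Hypothetical encoding tags (assumed for this sketch).
struct utf8_tag {};
struct utf32_tag {};

// Primary template left undefined: each encoding supplies a specialization.
template <class Encoding> struct encoding_traits;

template <> struct encoding_traits<utf32_tag> {
    typedef char32_t code_unit;
    // In UTF-32 every code point occupies exactly one code unit.
    static std::size_t sequence_length(code_unit) { return 1; }
};

template <> struct encoding_traits<utf8_tag> {
    typedef char code_unit;
    // Length of the sequence introduced by a lead byte.
    // Assumes well-formed UTF-8; no validation is attempted.
    static std::size_t sequence_length(code_unit lead) {
        unsigned char c = static_cast<unsigned char>(lead);
        if (c < 0x80) return 1;
        if ((c >> 5) == 0x06) return 2;
        if ((c >> 4) == 0x0E) return 3;
        return 4;
    }
};

template <class Encoding>
class encoded_string {
public:
    typedef encoding_traits<Encoding> traits;
    typedef typename traits::code_unit code_unit;

    explicit encoded_string(const std::basic_string<code_unit>& units)
        : units_(units) {}

    // Number of code points, found by stepping one encoded sequence at a time.
    std::size_t length() const {
        std::size_t n = 0;
        for (std::size_t i = 0; i < units_.size();
             i += traits::sequence_length(units_[i])) {
            ++n;
        }
        return n;
    }

    // Number of underlying code units, as basic_string would report it.
    std::size_t code_units() const { return units_.size(); }

private:
    std::basic_string<code_unit> units_;
};
```

Under these assumptions, supporting another encoding such as Shift JIS would mean adding one more tag and one more encoding_traits specialization, with encoded_string itself left untouched.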