[boost] Re: Any interest in adding unicode support to boost?

20 Oct 2004

      Robert Ramey wrote:
...
Basically my reservations about the utility of a unicode library stem from
the following:
a) the standard library has std::basic_string<T> where T is any type: char,
wchar_t or whatever.
Yes. The problem with Unicode is that it is not really possible to represent 
a character as an atomic value. A single glyph could in extreme cases be 
made up of 3 (or even more) 32-bit code units (UTF-32), so defining a good T 
is nigh on impossible.
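
To make the point concrete, here is a minimal sketch, assuming a C++11 compiler (char32_t and std::u32string were not available when this thread was written). The helper names are hypothetical, and the combining-mark test is a deliberately crude stand-in for real grapheme segmentation: it only knows the U+0300..U+036F block.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds "q" + U+0307 COMBINING DOT ABOVE + U+0323 COMBINING DOT BELOW.
// There is no precomposed code point for this combination.
inline std::u32string q_with_two_dots() {
    std::u32string s;
    s.push_back(U'q');
    s.push_back(static_cast<char32_t>(0x0307));
    s.push_back(static_cast<char32_t>(0x0323));
    assert(s.size() == 3);  // three UTF-32 code units, one visible character
    return s;
}

// Crude approximation: treats only U+0300..U+036F as combining marks.
inline bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Counts code points that are not combining marks (per the test above).
inline std::size_t count_base_characters(const std::u32string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (!is_combining_mark(s[i])) ++n;
    }
    return n;
}
```

Here size() reports 3 while a reader sees one character, so even basic_string<char32_t> does not give one element per character.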
...
b) all algorithms that use std::string are (or should be) applicable to
std::basic_string<T> regardless of the actual type of T (more or less)
c) character encodings can be classified into two types - single element
types like unicode (UCS-2, UCS-4) and ascii, and multi element types like
JIS, and others.
As I said, Unicode is not fixed width. Not in any encoding scheme. Therefore 
it is very difficult to teach the basic_string class to correctly handle 
Unicode strings.
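
As an illustration for UTF-16 specifically, the sketch below encodes a single code point and shows that anything above U+FFFF needs two 16-bit code units (a surrogate pair). The function name encode_utf16 is made up for this example, and it assumes a C++11 compiler for char16_t and char32_t.

```cpp
#include <cassert>
#include <string>

// Encodes one Unicode code point as UTF-16 (one or two 16-bit code units).
inline std::u16string encode_utf16(char32_t cp) {
    // Precondition: a valid scalar value (in range and not a surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: the code point is its own code unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split the 20-bit offset into a surrogate pair.
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}
```

For example, U+0041 encodes to one unit, while U+1D11E (MUSICAL SYMBOL G CLEF) encodes to the two units 0xD834 0xDD1E, so indexing or measuring a basic_string of 16-bit units does not operate on whole characters.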
...
d) there exist ANSI functions which translate strings from one type to
another based on information in the current locale.  This information is
dependent on the particular encoding.
e) There is nothing particularly special about unicode in this scheme. It's
just one more encoding scheme among many.  Therefore making a special
unicode library would be unnecessarily specific.  Any efforts so spent would
be better invested in generic encoding/decoding algorithms and/or setting up
locale facets for specific encodings UTF-8, UTF-16, etc.
The reason for focusing on Unicode is that it has become the de facto 
standard for character representation. It is supported by most OSes and many 
programming languages. This is not likely to change.

As for other encoding schemes, I actually had support for other encodings 
(like UCS, Shift JIS etc.) in the back of my mind when I wrote the 
implementation I described earlier. That is why the string class is called 
encoded_string, and not unicode_string. If the interface of the 
encoding_traits class is made general enough, it should be a piece of cake 
to add support for additional encoding schemes at a later date.
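
The sketch below illustrates the general idea of a string class parameterised on an encoding traits class. It is not the implementation referred to above: only the names encoded_string and encoding_traits come from this message, while latin1_traits, utf16_traits, the decode() signature and the code_points() member are assumptions invented for the example. It assumes C++11 and well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical traits: each encoding names its code unit type and knows how
// to decode one code point starting at index i, advancing i past it.
struct latin1_traits {
    using code_unit = unsigned char;
    // Latin-1 maps each byte directly to the code point of the same value.
    static char32_t decode(const std::vector<code_unit>& s, std::size_t& i) {
        assert(i < s.size());
        return s[i++];
    }
};

struct utf16_traits {
    using code_unit = char16_t;
    // Combines a surrogate pair into one code point when one is present.
    static char32_t decode(const std::vector<code_unit>& s, std::size_t& i) {
        assert(i < s.size());
        const char32_t u = s[i++];
        if (u >= 0xD800 && u <= 0xDBFF && i < s.size()) {
            const char32_t lo = s[i++];
            return 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
        }
        return u;
    }
};

// The string stores raw code units; all encoding knowledge lives in Traits.
template <typename Traits>
class encoded_string {
public:
    using code_unit = typename Traits::code_unit;

    explicit encoded_string(const std::vector<code_unit>& units)
        : units_(units) {}

    // Decodes the whole string into code points via the traits class.
    std::u32string code_points() const {
        std::u32string out;
        std::size_t i = 0;
        while (i < units_.size()) out.push_back(Traits::decode(units_, i));
        return out;
    }

private:
    std::vector<code_unit> units_;
};
```

Under this design, supporting another scheme such as Shift JIS would mean writing one more traits struct with its own decode(), leaving encoded_string itself unchanged.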

Erik Wien