Re: [boost] Idea to support unicode strings

30 Sep 2006


      The central difference between ANSI strings and UTF-8 strings is that
character access by index is simple for ANSI strings but difficult for UTF-8
encoded strings. std::basic_string can store UTF-8, UTF-16 and UTF-32 encoded
strings, but it offers no access to the decoded string, i.e. to the Unicode
values of the characters.
...
However, it isn't a basic_string, which means it is isolated from the rest of
the standard library. In a perfect world I would expect to read/write
utf_strings from std::streams in the same way as is provided for std::string,
i.e. all the operations like operator>>, getline and so on should be usable on
utf_strings.
It is always possible to access the basic_string<> data by calling raw()! The
standard requires indexed character access (operator[] and at()), which can't
be implemented efficiently for UTF-8 and UTF-16 encoded strings because their
characters have variable length.
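
To illustrate the cost of indexed access, here is a minimal sketch that finds the n-th code point in a UTF-8 encoded std::string. The function name code_point_at is hypothetical and is not part of the wrapper discussed in this thread. It assumes a compiler with char32_t and checks only lead and continuation bytes, not overlong forms or surrogates. The point is that every lookup has to walk the string from the start.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the proposed wrapper): returns the n-th
// code point of a UTF-8 encoded std::string. The cost is O(n), not O(1),
// because every preceding sequence has to be skipped first.
char32_t code_point_at(const std::string& s, std::size_t n) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);

        // The lead byte determines the length of the sequence.
        std::size_t len = 0;
        if (lead < 0x80) len = 1;
        else if ((lead >> 5) == 0x06) len = 2;
        else if ((lead >> 4) == 0x0E) len = 3;
        else if ((lead >> 3) == 0x1E) len = 4;
        if (len == 0 || i + len > s.size())
            throw std::invalid_argument("malformed UTF-8");

        if (n == 0) {
            // Keep the payload bits of the lead byte, then append six bits
            // from each continuation byte.
            char32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
            for (std::size_t k = 1; k < len; ++k) {
                const unsigned char c = static_cast<unsigned char>(s[i + k]);
                if ((c & 0xC0) != 0x80)
                    throw std::invalid_argument("malformed UTF-8");
                cp = (cp << 6) | (c & 0x3F);
            }
            return cp;
        }
        --n;
        i += len;
    }
    throw std::out_of_range("code point index out of range");
}
```

With a one-byte encoding the same lookup is simply s[n], which is why a basic_string-style operator[] fits ANSI strings but not UTF-8 ones.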
...
So in this area I basically identify with Matt Austern's proposal for
C++0x ( http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2035.pdf ).
I see my approach as an addition to Matt Austern's proposal. While Matt is
handling encoded strings, my approach deals with decoded strings. The encoded
string types are std::string, std::ustring, and std::u32string. These strings
allow access to the raw values of the encoded words. My wrapper allows access
to the strings at a symbolic level. It allows conversion between the different
encodings, and it also gives access to the Unicode values of the characters as
char32_t values.
8bit word array -> std::string
16bit word array -> std::ustring
32bit word array -> std::u32string

utf8 encoded strings  ->  utf8_string (based on std::string)
utf16 encoded strings  ->  utf16_string (based on std::ustring)
utf32 encoded strings  ->  utf32_string (based on std::u32string)

but the approach also allows:
latin-1 encoded strings  ->  latin1_string (based on std::string)
windows-1252 encoded strings  ->  windows_1252_string (based on std::string)
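
The mapping above could be sketched roughly as follows. This is not the actual wrapper from this thread: the names encoded_string, utf8_encoding, latin1_encoding and decode() are assumptions made for illustration, and only raw() is taken from the discussion. The sketch assumes well-formed input and a compiler that provides char32_t and std::u32string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical encoding policies. Each one decodes a single character
// starting at pos and advances pos past it. Input is assumed well-formed.
struct utf8_encoding {
    typedef std::string storage_type;
    static char32_t decode(const storage_type& s, std::size_t& pos) {
        const unsigned char lead = static_cast<unsigned char>(s[pos++]);
        if (lead < 0x80) return lead;
        int extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;
        char32_t cp = lead & (0x3F >> extra);  // payload bits of the lead byte
        while (extra-- > 0 && pos < s.size())
            cp = (cp << 6) | (static_cast<unsigned char>(s[pos++]) & 0x3F);
        return cp;
    }
};

struct latin1_encoding {
    typedef std::string storage_type;
    static char32_t decode(const storage_type& s, std::size_t& pos) {
        // Latin-1 byte values coincide with the first 256 Unicode values.
        return static_cast<unsigned char>(s[pos++]);
    }
};

// Hypothetical wrapper: stores the encoded words unchanged and exposes the
// decoded Unicode values on request.
template <class Encoding>
class encoded_string {
public:
    typedef typename Encoding::storage_type storage_type;

    explicit encoded_string(const storage_type& raw) : raw_(raw) {}

    // Access to the underlying encoded words.
    const storage_type& raw() const { return raw_; }

    // Access at the symbolic level: one char32_t per character.
    std::u32string decode() const {
        std::u32string out;
        for (std::size_t pos = 0; pos < raw_.size();)
            out.push_back(Encoding::decode(raw_, pos));
        return out;
    }

private:
    storage_type raw_;
};

typedef encoded_string<utf8_encoding> utf8_string;
typedef encoded_string<latin1_encoding> latin1_string;
```

In this sketch utf8_string(std::string("\xC3\xA9")) and latin1_string(std::string("\xE9")) hold different raw words but decode to the same single Unicode value. Conversion between encodings would be built on that shared decoded form.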

Regards,
Nils

Nils Springob