
Zach Laine wrote:
> I would love to see a Unicode support library added to Boost. However, I question the usefulness of another string class, or in this case another hierarchy of string classes. Interoperability with std::string (and QString, and CString, and a thousand other API-specific string classes) is always thorny. I'd much rather see an iterators- and algorithms-based approach, along the lines of your ct_string::iterator.
It might get equally thorny just trying to get the algorithms to recognize all the strange varieties of strings out there without writing iterator facades for the lot of them! It's probably possible, but I'm not sure I'd want it to be the primary interface for encoding. Most custom string types (both QString and CString, for instance) are designed to work with only one encoding (UTF-16 seems popular), so if you had some reason to store your strings in UTF-8, or, God forbid, Shift-JIS, you'd be out of luck. This is especially important when you're reading in arbitrary data whose encoding you don't know at compile time. If someone sends me a message encoded in Shift-JIS and I want to forward it on, I don't want to have to decode it into UTF-8 and then re-encode it into Shift-JIS before I send it; I just want to store it in Shift-JIS.
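A minimal sketch of what runtime tagging could look like, assuming a hypothetical `encoding` enum and a hypothetical `tagged_string` type (neither is an existing Boost or standard API). The bytes are kept exactly as received alongside a tag naming their encoding, so forwarding a Shift-JIS message is a plain copy rather than a decode to UTF-8 followed by a re-encode:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical runtime encoding tag; this list of encodings is illustrative only.
enum class encoding { utf8, utf16le, shift_jis };

// Hypothetical string type whose encoding is known only at run time:
// the raw bytes exactly as received, plus a tag saying how to interpret them.
struct tagged_string {
    encoding enc;
    std::vector<unsigned char> bytes;
};

// Wrap incoming data with the encoding declared for it (e.g. by a protocol header).
// Nothing is decoded or validated here.
tagged_string from_wire(std::vector<unsigned char> data, encoding declared) {
    return tagged_string{declared, std::move(data)};
}

// Produce the outgoing payload. The bytes go out unchanged, so a Shift-JIS
// message is forwarded as Shift-JIS without a detour through UTF-8.
const std::vector<unsigned char>& to_wire(const tagged_string& s) {
    return s.bytes;
}
```

Decoding to code points would then happen only when some code actually needs to inspect the text, dispatching on `enc` at that point.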
> Instead of doing this:
>
>     baz.encode(bar, rt::utf8);
>
> I'd rather be able to do something like this:
>
>     typedef std::basic_string<some_32bit_char_type> unicode_string;
>     unicode_string u_string = /*...*/;
>     std::string std_string = /*...*/;
>
>     typedef boost::recoding_iterator<boost::ucs4, boost::utf8> ucs4_to_utf8_iter;
>     std::copy(ucs4_to_utf8_iter(u_string.begin()),
>               ucs4_to_utf8_iter(u_string.end()),
>               std::back_inserter(std_string));
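The quoted `boost::recoding_iterator`, `boost::ucs4` and `boost::utf8` are names from a proposed interface, not components this sketch relies on. The UCS-4 to UTF-8 step such an iterator would perform can be sketched as a plain algorithm that writes through any output iterator; the names `encode_utf8` and `ucs4_to_utf8` are hypothetical. It writes through an output iterator because one code point expands to between one and four bytes, and it rejects surrogates and values above U+10FFFF as RFC 3629 requires:

```cpp
#include <cassert>
#include <iterator>
#include <stdexcept>
#include <string>

// Encode one Unicode scalar value as UTF-8 (RFC 3629), writing 1 to 4 bytes to out.
template <class OutIt>
OutIt encode_utf8(char32_t cp, OutIt out) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        *out++ = static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        *out++ = static_cast<char>(0xC0 | (cp >> 6));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        *out++ = static_cast<char>(0xE0 | (cp >> 12));
        *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        *out++ = static_cast<char>(0xF0 | (cp >> 18));
        *out++ = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Recode a UCS-4 sequence [first, last) to UTF-8, in the spirit of the
// std::copy call above but without a dedicated iterator type.
template <class InIt, class OutIt>
OutIt ucs4_to_utf8(InIt first, InIt last, OutIt out) {
    for (; first != last; ++first)
        out = encode_utf8(static_cast<char32_t>(*first), out);
    return out;
}
```

Used the same way as the quoted example: `ucs4_to_utf8(u_string.begin(), u_string.end(), std::back_inserter(std_string));` with `u_string` as a `std::u32string`. Note that the caller still has to know, at compile time, that the source is UCS-4 and the target is UTF-8.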
std::strings aren't really appropriate for storing encoded Unicode text, at least not without a lot of changes to their interface, since they're designed for compile-time-tagged, fixed-width-encoding strings. In your examples, you have to remember what the source encoding is. That's easy enough if you know that "all my strings are in UTF-8", but once you start working with runtime-tagged strings (see my Shift-JIS example above), you'd need to keep track of every encoding in use.

- Jim