
Over the past few months, I've been tinkering with a Unicode string library. It's still *far* from finished, but it's far enough along that the overall structure is visible. I've seen a bunch of Unicode proposals for Boost come and go, so hopefully this one will address the most common needs people have.
I would love to see a Unicode support library added to Boost. However, I question the usefulness of another string class, or in this case another hierarchy of string classes. Interoperability with std::string (and QString, and CString, and a thousand other API-specific string classes) is always thorny. I'd much rather see an iterators- and algorithms-based approach, along the lines of your ct_string::iterator. Instead of doing this:
    baz.encode(bar, rt::utf8);

I'd rather be able to do something like this:

    typedef std::basic_string<some_32bit_char_type> unicode_string;
    unicode_string u_string = /*...*/;
    std::string std_string = /*...*/;

    typedef boost::recoding_iterator<boost::ucs4, boost::utf8> ucs4_to_utf8_iter;
    std::copy(ucs4_to_utf8_iter(u_string.begin()),
              ucs4_to_utf8_iter(u_string.end()),
              std::back_inserter(std_string));

    // or

    typedef boost::recoding_iterator<boost::utf8, boost::ucs4> utf8_to_ucs4_iter;
    std::copy(utf8_to_ucs4_iter(std_string.begin()),
              utf8_to_ucs4_iter(std_string.end()),
              std::back_inserter(u_string));

Having iterators that do the right thing, in terms of stepping over code points or (possibly synthesized) characters as appropriate, in an efficient manner, would provide a toolkit with which anyone could write whatever custom Unicode-aware code they need.

Zach
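
P.S. For concreteness, here is a minimal sketch of what the UCS-4 to UTF-8 direction of such an adaptor might look like. Nothing here is an existing Boost component: the names ucs4_to_utf8_iterator and to_utf8 are hypothetical, char32_t stands in for some_32bit_char_type, and replacing invalid code points with U+FFFD is just one possible policy. The reverse direction would need more care, since it has to cope with malformed input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical adaptor (not an existing Boost component): wraps an iterator
// over UCS-4 code points and yields the equivalent UTF-8 bytes one at a time.
template <typename BaseIt>
class ucs4_to_utf8_iterator {
public:
    typedef std::input_iterator_tag iterator_category;
    typedef char value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const char* pointer;
    typedef char reference;

    explicit ucs4_to_utf8_iterator(BaseIt it) : base_(it), len_(0), pos_(0) {}

    char operator*() const {
        fill();
        assert(pos_ < len_);
        return buf_[pos_];
    }

    ucs4_to_utf8_iterator& operator++() {
        fill();
        if (++pos_ == len_) {  // all bytes of this code point consumed
            ++base_;
            pos_ = 0;
            len_ = 0;
        }
        return *this;
    }

    ucs4_to_utf8_iterator operator++(int) {
        ucs4_to_utf8_iterator tmp(*this);
        ++*this;
        return tmp;
    }

    friend bool operator==(const ucs4_to_utf8_iterator& a,
                           const ucs4_to_utf8_iterator& b) {
        return a.base_ == b.base_ && a.pos_ == b.pos_;
    }
    friend bool operator!=(const ucs4_to_utf8_iterator& a,
                           const ucs4_to_utf8_iterator& b) {
        return !(a == b);
    }

private:
    // Encode the current code point lazily, so that an iterator built from
    // end() never dereferences the underlying iterator.
    void fill() const {
        if (len_ != 0) return;
        unsigned long cp = static_cast<unsigned long>(*base_);
        // Assumed policy: surrogates and out-of-range values become U+FFFD.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
        if (cp < 0x80) {
            buf_[0] = static_cast<char>(cp);
            len_ = 1;
        } else if (cp < 0x800) {
            buf_[0] = static_cast<char>(0xC0 | (cp >> 6));
            buf_[1] = static_cast<char>(0x80 | (cp & 0x3F));
            len_ = 2;
        } else if (cp < 0x10000) {
            buf_[0] = static_cast<char>(0xE0 | (cp >> 12));
            buf_[1] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            buf_[2] = static_cast<char>(0x80 | (cp & 0x3F));
            len_ = 3;
        } else {
            buf_[0] = static_cast<char>(0xF0 | (cp >> 18));
            buf_[1] = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            buf_[2] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            buf_[3] = static_cast<char>(0x80 | (cp & 0x3F));
            len_ = 4;
        }
    }

    BaseIt base_;
    mutable char buf_[4];
    mutable std::size_t len_;  // bytes in buf_ (0 means "not encoded yet")
    std::size_t pos_;          // index of the current byte within buf_
};

// Hypothetical convenience wrapper showing the std::copy usage from above.
inline std::string to_utf8(const std::u32string& in) {
    typedef ucs4_to_utf8_iterator<std::u32string::const_iterator> iter;
    std::string out;
    std::copy(iter(in.begin()), iter(in.end()), std::back_inserter(out));
    return out;
}
```

The adaptor is only an input iterator, which is all std::copy needs. Because it holds nothing but the underlying iterator and a four-byte buffer, it works equally well over std::basic_string, std::vector, or a raw pointer range, with no string class of its own.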