Re: [boost] [unicode] Interest Check / Proof of Concept

19 Nov 2008

      ...
> Over the past few months, I've been tinkering with a Unicode string library.
> It's still *far* from finished, but it's far enough along that the overall
> structure is visible. I've seen a bunch of Unicode proposals for Boost come
> and go, so hopefully this one will address the most common needs people
> have.

I would love to see a Unicode support library added to Boost.
However, I question the usefulness of another string class, or in this
case another hierarchy of string classes.  Interoperability with
std::string (and QString, and CString, and a thousand other
API-specific string classes) is always thorny.  I'd much rather see an
iterators- and algorithms-based approach, along the lines of your
ct_string::iterator.  Instead of doing this:
...
baz.encode(bar,rt::utf8);
I'd rather be able to do something like this:

typedef std::basic_string<some_32bit_char_type> unicode_string;

unicode_string u_string = /*...*/;
std::string std_string = /*...*/;

typedef boost::recoding_iterator<boost::ucs4, boost::utf8> ucs4_to_utf8_iter;
std::copy(ucs4_to_utf8_iter(u_string.begin()),
          ucs4_to_utf8_iter(u_string.end()),
          std::back_inserter(std_string));

// or

typedef boost::recoding_iterator<boost::utf8, boost::ucs4> utf8_to_ucs4_iter;
std::copy(utf8_to_ucs4_iter(std_string.begin()),
          utf8_to_ucs4_iter(std_string.end()),
          std::back_inserter(u_string));

Having iterators that do the right thing, in terms of stepping over
code points or (possibly synthesized) characters as appropriate, in an
efficient manner, would provide a toolkit with which anyone could
write whatever custom Unicode-aware code they need.
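
To make the idea a bit more concrete, here is a minimal sketch of what one
direction of such an adaptor might look like. The names
utf32_to_utf8_iterator and to_utf8 are made up for illustration; this is not
an existing Boost class, and it is not the recoding_iterator interface from
the example above. It assumes the input holds only valid Unicode scalar
values and does no error handling beyond an assert.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical input-iterator adaptor (illustration only, not a Boost API):
// wraps an iterator over UTF-32 code points and yields UTF-8 bytes.
template <typename Iter>
class utf32_to_utf8_iterator {
public:
    typedef std::input_iterator_tag iterator_category;
    typedef char                    value_type;
    typedef std::ptrdiff_t          difference_type;
    typedef const char*             pointer;
    typedef char                    reference;

    explicit utf32_to_utf8_iterator(Iter pos)
        : pos_(pos), buf_(), size_(0), index_(0) {}

    char operator*() const {
        if (size_ == 0) encode();      // encode lazily, so end() is never read
        return buf_[index_];
    }

    utf32_to_utf8_iterator& operator++() {
        if (size_ == 0) encode();
        if (++index_ == size_) {       // all bytes emitted: next code point
            ++pos_;
            index_ = 0;
            size_ = 0;
        }
        return *this;
    }

    utf32_to_utf8_iterator operator++(int) {
        utf32_to_utf8_iterator tmp(*this);
        ++*this;
        return tmp;
    }

    friend bool operator==(const utf32_to_utf8_iterator& a,
                           const utf32_to_utf8_iterator& b) {
        return a.pos_ == b.pos_ && a.index_ == b.index_;
    }
    friend bool operator!=(const utf32_to_utf8_iterator& a,
                           const utf32_to_utf8_iterator& b) {
        return !(a == b);
    }

private:
    // Fill buf_ with the UTF-8 encoding of the current code point.
    void encode() const {
        unsigned long cp = static_cast<unsigned long>(*pos_);
        assert(cp <= 0x10FFFF);        // assumption: input is already valid
        if (cp < 0x80) {
            buf_[0] = static_cast<char>(cp);
            size_ = 1;
        } else if (cp < 0x800) {
            buf_[0] = static_cast<char>(0xC0 | (cp >> 6));
            buf_[1] = static_cast<char>(0x80 | (cp & 0x3F));
            size_ = 2;
        } else if (cp < 0x10000) {
            buf_[0] = static_cast<char>(0xE0 | (cp >> 12));
            buf_[1] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            buf_[2] = static_cast<char>(0x80 | (cp & 0x3F));
            size_ = 3;
        } else {
            buf_[0] = static_cast<char>(0xF0 | (cp >> 18));
            buf_[1] = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            buf_[2] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            buf_[3] = static_cast<char>(0x80 | (cp & 0x3F));
            size_ = 4;
        }
    }

    Iter             pos_;      // current code point in the source range
    mutable char     buf_[4];   // UTF-8 bytes of the current code point
    mutable unsigned size_;     // number of valid bytes in buf_ (0 = not encoded)
    unsigned         index_;    // next byte of buf_ to hand out
};

// Hypothetical convenience wrapper showing the std::copy usage.
std::string to_utf8(const std::u32string& in) {
    typedef utf32_to_utf8_iterator<std::u32string::const_iterator> iter;
    std::string out;
    std::copy(iter(in.begin()), iter(in.end()), std::back_inserter(out));
    return out;
}
```

Because the adaptor is only an input iterator, it works with std::copy and
similar single-pass algorithms and never needs its own string class. The
reverse direction (UTF-8 to UTF-32) would also need to know where the
underlying range ends, so it cannot read past a truncated sequence; that
part is left out of this sketch.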

Zach

Zach Laine