
"John Maddock" <john@johnmaddock.co.uk> wrote in message
Interesting: funnily enough I've just started experimenting with Unicode support for Boost.Regex (based initially on top of ICU, but it could equally sit on top of Boost.Unicode or whatever). The first thing I had to do was write a bunch of iterators for interconverting between encoding forms (I needed Bidirectional Iterators, which code conversion facets don't/can't provide). So I guess we're all on a similar page here. Can your encoding converters provide efficient iterator-based interconversion?
Well... "efficient" is probably not the word I would use ;), but yes, they can. The way it is implemented right now, the value_type of an encoded_string iterator is 32 bits wide (a Unicode code point), whatever the encoding. So when iterating over any encoding, the external interface always looks like a vector of code points. Consequently you can use iterators from one string (UTF-8) to initialize another string (UTF-16), and the conversion between the two encodings happens automatically. I'm guessing this is something similar to what you have.

I also have a rather hackish implementation that can provide non-const (assignable) code point iterators on any encoding. This involves a lot of trickery with iterators changing the size of the container they are iterating over, and proxy classes as the reference type of the iterator (something that is not allowed (yet) in standard C++, but is in Boost). As you can imagine, this implementation is anything but efficient. Kinda neat though! ;)
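
To make the "every encoding looks like a sequence of code points" idea concrete, here is a rough sketch. It is not the actual encoded_string code: the names utf8_code_point_iterator and to_utf16 are made up for illustration, the iterator is read-only, and it assumes the input is well-formed UTF-8 (no validation at all).

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical read-only iterator that walks UTF-8 bytes but yields code points.
// Assumes well-formed UTF-8; a real implementation would have to validate.
class utf8_code_point_iterator {
public:
    // Supports ++ and --, but operator* returns by value, so in pre-C++20
    // terms it is formally only an input iterator.
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;

    utf8_code_point_iterator() = default;
    explicit utf8_code_point_iterator(const char* p) : p_(p) {}

    // Decode the code point that starts at the current byte.
    char32_t operator*() const {
        auto b = [this](int i) { return static_cast<unsigned char>(p_[i]); };
        unsigned char lead = b(0);
        if (lead < 0x80) return lead;
        if (lead < 0xE0) return ((lead & 0x1Fu) << 6) | (b(1) & 0x3Fu);
        if (lead < 0xF0)
            return ((lead & 0x0Fu) << 12) | ((b(1) & 0x3Fu) << 6) | (b(2) & 0x3Fu);
        return ((lead & 0x07u) << 18) | ((b(1) & 0x3Fu) << 12) |
               ((b(2) & 0x3Fu) << 6) | (b(3) & 0x3Fu);
    }

    // Advance by the length implied by the lead byte.
    utf8_code_point_iterator& operator++() {
        unsigned char lead = static_cast<unsigned char>(*p_);
        p_ += lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        return *this;
    }
    utf8_code_point_iterator operator++(int) { auto t = *this; ++*this; return t; }

    // Step back over continuation bytes (10xxxxxx) to the previous lead byte.
    utf8_code_point_iterator& operator--() {
        do { --p_; } while ((static_cast<unsigned char>(*p_) & 0xC0) == 0x80);
        return *this;
    }
    utf8_code_point_iterator operator--(int) { auto t = *this; --*this; return t; }

    friend bool operator==(utf8_code_point_iterator a, utf8_code_point_iterator b) {
        return a.p_ == b.p_;
    }
    friend bool operator!=(utf8_code_point_iterator a, utf8_code_point_iterator b) {
        return a.p_ != b.p_;
    }

private:
    const char* p_ = nullptr;
};

// Build a UTF-16 string from any range whose elements are code points.
template <class It>
std::u16string to_utf16(It first, It last) {
    std::u16string out;
    for (; first != last; ++first) {
        char32_t cp = *first;
        assert(cp <= 0x10FFFF);  // outside the Unicode range means bad input
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With something like this, converting is just a matter of handing a pair of iterators over the UTF-8 bytes to to_utf16; the target never needs to know what the source encoding was. The assignable variant is a different story, since writing a code point through the iterator may change how many bytes it occupies.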