[boost] Re: Re: Any interest in adding unicode support to boost?

20 Oct 2004

      "John Maddock" <john@johnmaddock.co.uk> wrote in message
...
Interesting: funnily enough I've just started experimenting with Unicode 
support for Boost.Regex (based initially on top of ICU, but it could 
equally sit on top of Boost.Unicode or whatever).  The first thing I had 
to do was write a bunch of iterators for interconverting between encoding 
forms (I needed Bidirectional Iterators which code conversion facets 
don't/can't provide).  So I guess we're all on a similar page here.  Can 
your encoding converters provide efficient iterator-based interconversion?

Well... "efficient" is probably not the word I would use ;), at least not 
yet. The way it is implemented right now, the value_type of an 
encoded_string iterator for any encoding is 32 bits wide (a Unicode code 
point). So when iterating over any encoding, the external interface always 
looks like a vector of code points. Consequently, you can use iterators 
from one string (UTF-8) to initialize another string (UTF-16), and the 
conversion between the two encodings happens automatically. I'm guessing 
this is something similar to what you have.
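
To make the idea concrete, here is a minimal sketch of it, not the actual 
encoded_string code. The names utf8_code_point_iterator and to_utf16 are 
made up for illustration, the input is assumed to be well-formed UTF-8, and 
the iterator only moves forward (a bidirectional version would also need 
operator--).

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical iterator (not the real encoded_string iterator): walks a
// UTF-8 byte sequence and yields one 32-bit code point per step.
// Assumes well-formed UTF-8; no validation is done.
class utf8_code_point_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;  // returned by value, not a true reference

    explicit utf8_code_point_iterator(std::string::const_iterator pos)
        : pos_(pos) {}

    // Decode the code point that starts at the current byte.
    char32_t operator*() const {
        unsigned char lead = static_cast<unsigned char>(*pos_);
        std::size_t len = length(lead);
        char32_t cp = (len == 1) ? lead : (lead & (0x7F >> len));
        for (std::size_t i = 1; i < len; ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(pos_[i]) & 0x3F);
        return cp;
    }

    // Skip over all bytes of the current code point.
    utf8_code_point_iterator& operator++() {
        pos_ += length(static_cast<unsigned char>(*pos_));
        return *this;
    }

    bool operator==(const utf8_code_point_iterator& o) const { return pos_ == o.pos_; }
    bool operator!=(const utf8_code_point_iterator& o) const { return pos_ != o.pos_; }

private:
    // Sequence length in bytes, derived from the lead byte.
    static std::size_t length(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead >> 5) == 0x6) return 2;
        if ((lead >> 4) == 0xE) return 3;
        return 4;
    }

    std::string::const_iterator pos_;
};

// Build a UTF-16 string from any range whose value_type is a code point.
// The source encoding never shows up here; only code points do.
template <class CodePointIterator>
std::u16string to_utf16(CodePointIterator first, CodePointIterator last) {
    std::u16string out;
    for (; first != last; ++first) {
        char32_t cp = *first;
        assert(cp <= 0x10FFFF);
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;  // split into a surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

Because to_utf16 only ever sees code points, the same function would work 
unchanged with an iterator over any other source encoding.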

I also have a rather hackish implementation that can provide non-const 
(assignable) code point iterators on any encoding. This involves a lot of 
trickery with iterators changing the size of the container they are 
iterating over, and proxy classes as the reference_type of the iterator 
(something that is not allowed (yet) in standard C++, but is in Boost). As 
you can imagine, this implementation is anything but efficient. Kinda neat 
though! ;)
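
A rough sketch of the proxy trick, again not the actual implementation: 
code_point_proxy, mutable_code_point_iterator and the two helper functions 
are hypothetical names, and the storage is assumed to be well-formed UTF-8 
in a std::string. The iterator keeps a byte offset rather than a raw string 
iterator, so it survives the string being resized by an assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of bytes in the UTF-8 sequence that starts with this lead byte.
inline std::size_t utf8_length(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead >> 5) == 0x6) return 2;
    if ((lead >> 4) == 0xE) return 3;
    return 4;
}

// Encode a single code point as UTF-8.
inline std::string utf8_encode(char32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Stands in for a "reference to a code point" inside a UTF-8 string.
class code_point_proxy {
public:
    code_point_proxy(std::string& bytes, std::size_t offset)
        : bytes_(bytes), offset_(offset) {}

    // Reading: decode the bytes at the current offset.
    operator char32_t() const {
        unsigned char lead = static_cast<unsigned char>(bytes_[offset_]);
        std::size_t len = utf8_length(lead);
        char32_t cp = (len == 1) ? lead : (lead & (0x7F >> len));
        for (std::size_t i = 1; i < len; ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes_[offset_ + i]) & 0x3F);
        return cp;
    }

    // Writing: replace the old byte sequence with the new encoding.
    // This may grow or shrink the underlying string.
    code_point_proxy& operator=(char32_t cp) {
        std::size_t old_len = utf8_length(static_cast<unsigned char>(bytes_[offset_]));
        bytes_.replace(offset_, old_len, utf8_encode(cp));
        return *this;
    }

private:
    std::string& bytes_;
    std::size_t offset_;
};

// Assignable code point iterator; dereferencing yields a proxy, not a T&.
class mutable_code_point_iterator {
public:
    mutable_code_point_iterator(std::string& bytes, std::size_t offset)
        : bytes_(&bytes), offset_(offset) {}

    code_point_proxy operator*() const { return code_point_proxy(*bytes_, offset_); }

    mutable_code_point_iterator& operator++() {
        offset_ += utf8_length(static_cast<unsigned char>((*bytes_)[offset_]));
        return *this;
    }

    // Note: an end iterator built before an assignment goes stale once the
    // string changes size, so the end has to be recomputed after writes.
    bool operator==(const mutable_code_point_iterator& o) const { return offset_ == o.offset_; }
    bool operator!=(const mutable_code_point_iterator& o) const { return offset_ != o.offset_; }

private:
    std::string* bytes_;
    std::size_t offset_;
};
```

Every write goes through std::string::replace, which can shift the whole 
tail of the string, so assigning through such an iterator is linear in the 
string length. That is where the inefficiency comes from.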

Erik Wien