Re: [boost] [regex] How robust are the <boost/regex/pending/unicode_iterator.hpp> adapters?

19 Jul 2011

Boost.Filesystem needs the UTF-32 to UTF-16 and UTF-16 to UTF-32
adapters to implement char16_t and char32_t support. Do they have any
known bugs or other outstanding problems?
Yes, they can read past the end of your input range if it contains
invalid data at the end.
Interesting. Would a fix be difficult?
I was about to say there aren't any known issues, but yes that is a 
problem - and a fix would mean changing the interface - the problem comes 
because the iterators only store the current position in the underlying 
sequence and assume that they can increment or decrement over a complete 
multi-byte sequence.  So if your underlying sequence contains a *truncated* 
multibyte sequence at the start or end of the string then they can read 
past-the-end or even past-the-start :-(

The only real fix is to redesign them to be range-based, so we can add the 
additional checks necessary, but of course this also makes them more 
heavyweight than they are at present.  I guess I was hoping we would have 
had a proper Unicode library for this by now (in Boost that is, not the 
sandbox ;)
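
To make the range-based idea concrete, here is a minimal sketch of a
UTF-16 to UTF-32 decode step that is handed the end of the underlying
sequence and checks it before reading a trailing surrogate. It is not the
code in unicode_iterator.hpp, and it is not a proposed interface: the names
decode_utf16 and utf16_to_utf32 are hypothetical, and throwing on bad input
is just one assumed error policy among several possible ones.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of Boost): decode one code point from
// [it, end) and advance 'it' past the code units that were consumed.
// Because 'end' is known, a truncated surrogate pair is reported
// instead of being read past the end of the sequence.
inline char32_t decode_utf16(const char16_t*& it, const char16_t* end)
{
    if (it == end)
        throw std::out_of_range("no input left to decode");

    const char16_t lead = *it++;

    // Anything outside the surrogate block is a complete code point.
    if (lead < 0xD800 || lead > 0xDFFF)
        return lead;

    // A low surrogate cannot start a pair.
    if (lead >= 0xDC00)
        throw std::invalid_argument("unpaired low surrogate");

    // This is the check a position-only iterator cannot make.
    if (it == end)
        throw std::invalid_argument("truncated surrogate pair");

    const char16_t trail = *it;
    if (trail < 0xDC00 || trail > 0xDFFF)
        throw std::invalid_argument("high surrogate not followed by low surrogate");
    ++it;

    return 0x10000u
         + ((static_cast<char32_t>(lead) - 0xD800u) << 10)
         + (static_cast<char32_t>(trail) - 0xDC00u);
}

// Hypothetical convenience wrapper: decode a whole buffer.
inline std::vector<char32_t> utf16_to_utf32(const std::vector<char16_t>& in)
{
    std::vector<char32_t> out;
    const char16_t* it = in.data();
    const char16_t* end = it + in.size();
    while (it != end)
        out.push_back(decode_utf16(it, end));
    return out;
}
```

The cost mentioned above shows up directly: every step needs the end of
the sequence as well as the current position, and a bidirectional version
would need the start too in order to guard the decrement.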

Oh well, maybe I should just bite the bullet and change/fix this hole.

John.

John Maddock