[boost] Re: Any interest in adding unicode support to boost?

21 Oct 2004


Rogier van Dalen wrote:
...
On Wed, 20 Oct 2004 12:48:31 -0700, Eric Niebler
<eric@boost-consulting.com> wrote:
...
I think the default should be UTF-16 encoding, and that the iterator
should use a scheme like this to be random access. Rationale: there are
string algorithms that benefit from random access (Boyer-Moore comes to
mind).

Correct me if I'm wrong. From what I gather from a Google search,
Boyer-Moore is a fast string search algorithm. Why not use the
algorithm on the code units rather than the code points? Neither UTF-8
nor UTF-16 is stateful, specifically to allow optimisations such as
this (as well as error recovery).
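
To make the code-unit argument concrete, here is a minimal sketch. It
assumes both strings are well-formed UTF-8, and it uses
Boyer-Moore-Horspool (a simplified Boyer-Moore that keeps only the
bad-character table) in place of the full algorithm. The helpers
horspool_find_all and byte_to_codepoint_offset are hypothetical, not
part of Boost or of any proposal in this thread. Because UTF-8 lead
bytes and continuation bytes occupy disjoint ranges, every byte-level
match of a valid needle starts on a code point boundary, so converting
the byte offsets should give the same positions as a search over code
points.

```python
from typing import List


def horspool_find_all(haystack: bytes, needle: bytes) -> List[int]:
    """Return every byte offset where needle occurs in haystack.

    Boyer-Moore-Horspool: a simplified Boyer-Moore that uses only the
    bad-character shift table.
    """
    m, n = len(needle), len(haystack)
    if m == 0 or m > n:
        return []
    # For each byte in needle[:-1], record its distance from the end of the
    # needle; later occurrences overwrite earlier ones (smaller shift).
    shift = {b: m - 1 - i for i, b in enumerate(needle[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            hits.append(pos)
        # Shift by the haystack byte under the needle's last position;
        # bytes absent from needle[:-1] allow a full-length jump.
        pos += shift.get(haystack[pos + m - 1], m)
    return hits


def byte_to_codepoint_offset(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a code point offset.

    Decoding the prefix raises UnicodeDecodeError if byte_offset falls
    inside a multi-byte sequence, so this doubles as a boundary check.
    """
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))


if __name__ == "__main__":
    text = "naïve café, 日本語 café"
    needle = "café"

    # The search runs on raw UTF-8 bytes; nothing is decoded.
    byte_hits = horspool_find_all(text.encode("utf-8"), needle.encode("utf-8"))
    print(byte_hits)                                                # [7, 24]
    print([byte_to_codepoint_offset(text, b) for b in byte_hits])  # [6, 16]
```

The search itself never decodes anything; decoding is only needed if a
caller wants the results as code point offsets.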
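
Searching a Unicode string for a particular bit pattern is not
particularly meaningful, because the same string can be represented by
different bit patterns. For example, "é" can be stored as the single
code point U+00E9 or as "e" followed by U+0301 COMBINING ACUTE ACCENT.
Have I misinterpreted what you are suggesting?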
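
The representation problem shows up in a few lines of Python. This is a
hedged sketch: normalized_find is a hypothetical helper, and
normalizing both strings to a common form (here NFC, via the standard
unicodedata module) is one possible remedy, not something proposed in
this thread.

```python
import unicodedata

# The same abstract text "café" spelled two ways.
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT


def normalized_find(haystack: str, needle: str, form: str = "NFC") -> int:
    """Find needle in haystack after normalizing both to the same form.

    Returns an offset into the *normalized* haystack, or -1 if absent.
    """
    return unicodedata.normalize(form, haystack).find(
        unicodedata.normalize(form, needle)
    )


if __name__ == "__main__":
    print(precomposed == decomposed)    # False
    print(precomposed.encode("utf-8"))  # b'caf\xc3\xa9'
    print(decomposed.encode("utf-8"))   # b'cafe\xcc\x81'

    haystack = "Un " + decomposed + " au lait"
    print(haystack.find(precomposed))              # -1: raw search misses it
    print(normalized_find(haystack, precomposed))  # 3
```

The returned offset refers to the normalized haystack, not the
original, so a real implementation would have to map it back.
Normalization alone is also not the whole story: under NFD, the needle
"cafe" matches inside "cafe\u0301", splitting the accented character.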


-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com