[boost] Re: Any interest in adding unicode support to boost?

19 Oct 2004

Hi. Thanks for the feedback!

"Miro Jurisic" <macdev@meeroh.org> wrote in message 
news:macdev-BACD3C.13585519102004@sea.gmane.org...
...
I generally agree with this design approach, but I don't think that code point
iterators alone are sufficient.
Neither do I, as a matter of fact, but this is as far as I have come right 
now. :) There would probably be different types of iterators (or iterator 
wrappers) made available to enable iteration over everything from code 
units to code points and abstract characters.
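
To make the different levels concrete, here is a minimal sketch (not the proposed library interface) that counts code units and code points in a UTF-8 string. The helper names count_code_units and count_code_points are made up for illustration, and the code assumes well-formed UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of code units in a UTF-8 string (one per byte).
inline std::size_t count_code_units(const std::string& utf8) {
    return utf8.size();
}

// Hypothetical helper: number of code points in well-formed UTF-8.
// Every byte that is not a continuation byte (10xxxxxx) starts a code point.
inline std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char unit : utf8) {
        if ((unit & 0xC0) != 0x80) ++count;
    }
    return count;
}

// Small self-check: the same word spelled two canonically equivalent ways.
inline void check_counts() {
    const std::string precomposed = "f\xC3\xBC" "r";   // f, U+00FC, r
    const std::string decomposed  = "fu\xCC\x88" "r";  // f, u, U+0308, r
    assert(count_code_units(precomposed) == 4 && count_code_points(precomposed) == 3);
    assert(count_code_units(decomposed) == 5 && count_code_points(decomposed) == 4);
}
```

A reader would see three abstract characters in both strings, yet neither the code unit count nor the code point count agrees between the two spellings. That is why iteration at more than one level seems necessary.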
...
Iteration over encoded characters and abstract
characters would be needed for some algorithms to function sensibly. For
example, the simple task of:
find(begin, end, "ü")
needs to use abstract characters in order to be able to find precomposed and
decomposed versions of ü.
True... And this is a point where the implementation would be less than 
trivial. Comparing strings in Unicode is anything BUT trivial, and it's 
imperative to find a good way to implement this functionality through the 
standard algorithms.
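
As a rough illustration of the problem (a sketch only, using std::u32string as a stand-in for a sequence of code points), std::find over code points locates the precomposed ü but misses the decomposed spelling. The helper names below are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Code point values taken from the Unicode standard.
constexpr char32_t U_WITH_DIAERESIS    = 0x00FC; // precomposed ü
constexpr char32_t LATIN_SMALL_U       = 0x0075; // u
constexpr char32_t COMBINING_DIAERESIS = 0x0308; // combining diaeresis

// Hypothetical helper: true if `needle` occurs in `text` as a single code point.
inline bool contains_code_point(const std::u32string& text, char32_t needle) {
    return std::find(text.begin(), text.end(), needle) != text.end();
}

// "für" spelled in two canonically equivalent ways.
inline std::u32string fuer_precomposed() {
    return {U'f', U_WITH_DIAERESIS, U'r'};
}
inline std::u32string fuer_decomposed() {
    return {U'f', LATIN_SMALL_U, COMBINING_DIAERESIS, U'r'};
}

// Small self-check: a code point search only finds one of the two spellings.
inline void check_find() {
    assert(contains_code_point(fuer_precomposed(), U_WITH_DIAERESIS));
    assert(!contains_code_point(fuer_decomposed(), U_WITH_DIAERESIS));
}
```

Both strings display as the same text, so a search that works on abstract characters would have to treat them as equal, for instance by normalizing both sides first.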
...
Again, taking this example, let's say that do_some_operation performs
canonicalization to some Unicode canonical form; you can't do this by iterating
over code points.
Nope. A code unit iterator would be needed for things like that.
...
...
I am aware that this implementation will be less than ideal for integration
with the current C++ standard, but it's issues like that I would like to get
deeper into during the development.
You should explain what problems with integration you foresee.
I think I was getting a little ahead of myself when I wrote that. :) The 
implementation described here would not pose too much of a problem; I was 
thinking more of the problems that arise when you take things like collation 
and locales into consideration. From what I understand, there is a real issue 
in enabling proper Unicode support in the standard classes like locale, 
ctype and collate, as they assume things that do not necessarily apply to a 
Unicode representation of text. A failure to enable good support in those 
classes (at least locale and ctype) would also make the iostream support 
break, and things start to snowball. I could very well be wrong on this 
(Actually, I hope I am! :) ), as I haven't had the time to read up on all 
the issues concerning this. But again, this is one of many problems I hope 
running this project will help reveal.
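
One example of such an assumption, offered as an illustration rather than a full analysis: std::ctype<charT>::toupper maps a single character to a single character, while Unicode's full case mapping turns ß into the two letters "SS". The sketch below uses a hypothetical helper, to_upper_ctype, and the classic "C" locale.

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper: uppercases a wide string through std::ctype<wchar_t>,
// one element at a time, so the result always has the same length as the input.
inline std::wstring to_upper_ctype(std::wstring text,
                                   const std::locale& loc = std::locale::classic()) {
    const auto& facet = std::use_facet<std::ctype<wchar_t>>(loc);
    std::transform(text.begin(), text.end(), text.begin(),
                   [&facet](wchar_t c) { return facet.toupper(c); });
    return text;
}

// Small self-check: ASCII works, but the interface cannot grow the string.
inline void check_toupper() {
    assert(to_upper_ctype(L"abc") == L"ABC");
    // "straße" has six characters; a full Unicode uppercase ("STRASSE") has seven.
    assert(to_upper_ctype(L"stra\u00DF" L"e").size() == 6);
}
```

Whatever a given locale does with ß, the one-to-one signature means the result can never be the seven-character "STRASSE".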

Erik Wien