
If it is always stored in a decomposed form, an XML library probably wouldn't want to use it, because XML requires a composed form. And making the encoding an implementation detail makes it inefficient to use in situations where binary compatibility matters (serialization, for example).
I think the best solution is to store the string in the form it was originally received (decomposed or not), and instead provide composition functions or even iterator wrappers that compose on the fly. That would allow composed strings to be used where needed (as in an XML library) without imposing that requirement on all other users.
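A minimal sketch of that idea in Python, using the standard `unicodedata` module. The `UString` class and its method names are purely illustrative, not a real API: the string is kept exactly as received, and composition (NFC) happens only when a consumer asks for it.

```python
import unicodedata

class UString:
    """Hypothetical wrapper: stores text as received, composes on demand."""

    def __init__(self, text):
        self._raw = text  # original form, decomposed or not

    def raw(self):
        # The string exactly as it arrived.
        return self._raw

    def composed(self):
        # Compose on the fly (NFC) for consumers that need it,
        # e.g. an XML library.
        return unicodedata.normalize("NFC", self._raw)

# "e" + combining acute accent, as it might arrive decomposed
s = UString("e\u0301")
print(s.raw())        # two code points, unchanged
print(s.composed())   # single precomposed "\u00e9"
```

An iterator wrapper could do the same lazily, yielding composed characters one combining sequence at a time instead of normalizing the whole string up front.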
I don't think I can agree with that. If you do a lot of input/output, it might yield better performance, but even when reading XML you probably need to compare strings a lot, and if they are not normalized, that really takes a lot of processing. Correct me if I'm wrong, but a simple comparison of two non-normalized Unicode strings would require looking up each character in the Unicode Character Database, decomposing it, gathering base characters and combining marks, putting the marks in canonical order, and only then comparing. And this must be done for every character. I don't have any numbers, of course, but I have a feeling it is going to be really, really slow.

Regards, Rogier
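The comparison problem described above can be seen in a few lines of Python: a plain code-point comparison of canonically equivalent strings fails, so an unnormalized store has to normalize (at least one side of) every comparison.

```python
import unicodedata

a = "\u00e9"    # 'e' with acute accent, precomposed (one code point)
b = "e\u0301"   # 'e' + combining acute accent, decomposed (two code points)

# A plain code-point comparison says the strings differ...
print(a == b)   # False

# ...even though they are canonically equivalent. Comparing them
# correctly means decomposing and reordering first (here via NFD):
print(unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b))  # True
```

If strings are stored pre-normalized instead, equality checks reduce to a plain code-point comparison, which is the trade-off being argued for here.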