[boost] Re: Any interest in adding unicode support to boost?

20 Oct 2004


      "Eric Niebler" <eric@boost-consulting.com> wrote in message
...
> Such a one-size-fits-all unicode_string is guaranteed to be inefficient
> for some applications.
Yes. That's why I would like the encoding to be a template parameter,
allowing the programmer to choose the encoding best suited to his or her needs.
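
As a rough sketch of what a templated encoding could look like (assuming
C++11; the names utf8_encoding, utf16_encoding, append, code_unit_count and
units are hypothetical illustrations, not a proposed Boost interface), the
string class only stores code units and leaves all encoding decisions to the
policy type:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding policy: UTF-8, stored as 8-bit code units.
struct utf8_encoding {
    using code_unit = unsigned char;
    static void append(std::uint32_t cp, std::vector<code_unit>& out) {
        if (cp < 0x80) {
            out.push_back(static_cast<code_unit>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<code_unit>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<code_unit>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<code_unit>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<code_unit>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<code_unit>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<code_unit>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<code_unit>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<code_unit>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<code_unit>(0x80 | (cp & 0x3F)));
        }
    }
};

// Hypothetical encoding policy: UTF-16, stored as 16-bit code units.
struct utf16_encoding {
    using code_unit = char16_t;
    static void append(std::uint32_t cp, std::vector<code_unit>& out) {
        if (cp < 0x10000) {
            out.push_back(static_cast<code_unit>(cp));
        } else {
            cp -= 0x10000;  // split into a high/low surrogate pair
            out.push_back(static_cast<code_unit>(0xD800 | (cp >> 10)));
            out.push_back(static_cast<code_unit>(0xDC00 | (cp & 0x3FF)));
        }
    }
};

// The string is parameterized on the encoding; the caller picks the one
// that suits the application.
template <typename Encoding>
class unicode_string {
public:
    using code_unit = typename Encoding::code_unit;

    // Appends one Unicode scalar value (surrogates are rejected).
    void push_back(std::uint32_t cp) {
        assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
        Encoding::append(cp, units_);
    }

    std::size_t code_unit_count() const { return units_.size(); }
    const std::vector<code_unit>& units() const { return units_; }

private:
    std::vector<code_unit> units_;
};
```

With this sketch, pushing U+00E9 and U+1F600 into a
unicode_string<utf8_encoding> stores six code units, while the same two code
points in a unicode_string<utf16_encoding> take three.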
...
> If it is always stored in a decomposed form, an XML library probably
> wouldn't want to use it, because it requires a composed form. And making
> the encoding an implementation detail makes it inefficient to use in
> situations where binary compatibility matters (serialization, for
> example).
I think the best solution is to store the string in the form in which it was
originally received (decomposed or not), and instead provide composition
functions or even iterator wrappers that compose on the fly. That would
allow composed strings to be used where needed (as in an XML library)
without imposing that requirement on all other users.
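
To illustrate the iterator-wrapper idea, here is a minimal sketch, assuming
C++11 and a sequence of 32-bit code points. Everything in it is hypothetical:
compose_pair is a three-entry toy table standing in for the real Unicode
composition data, and composing_iterator only folds a base character with a
single following mark. Real canonical composition also has to handle
combining classes, reordering, and multiple marks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for the Unicode composition data: returns the precomposed
// code point for (base, mark), or 0 if this sketch does not know the pair.
inline std::uint32_t compose_pair(std::uint32_t base, std::uint32_t mark) {
    if (base == 0x0065 && mark == 0x0301) return 0x00E9;  // e + acute
    if (base == 0x0061 && mark == 0x0308) return 0x00E4;  // a + diaeresis
    if (base == 0x006F && mark == 0x0308) return 0x00F6;  // o + diaeresis
    return 0;
}

// Wraps a code point iterator and yields composed code points on the fly.
// The underlying storage is never modified.
template <typename Iter>
class composing_iterator {
public:
    composing_iterator(Iter pos, Iter end)
        : pos_(pos), end_(end), next_(pos) { load(); }

    std::uint32_t operator*() const {
        assert(pos_ != end_);
        return current_;
    }

    composing_iterator& operator++() {
        assert(pos_ != end_);
        pos_ = next_;
        load();
        return *this;
    }

    bool operator==(const composing_iterator& o) const { return pos_ == o.pos_; }
    bool operator!=(const composing_iterator& o) const { return pos_ != o.pos_; }

private:
    // Reads the code point at pos_ and, if the following one combines with
    // it according to compose_pair, folds the two into one value.
    void load() {
        next_ = pos_;
        if (pos_ == end_) return;
        current_ = *next_;
        ++next_;
        if (next_ != end_) {
            std::uint32_t composed = compose_pair(current_, *next_);
            if (composed != 0) {
                current_ = composed;
                ++next_;
            }
        }
    }

    Iter pos_, end_, next_;
    std::uint32_t current_ = 0;
};

// Convenience helper: copies a container through the composing wrapper.
template <typename Container>
std::vector<std::uint32_t> composed_copy(const Container& c) {
    using It = typename Container::const_iterator;
    std::vector<std::uint32_t> out;
    composing_iterator<It> it(c.begin(), c.end());
    composing_iterator<It> last(c.end(), c.end());
    for (; it != last; ++it) out.push_back(*it);
    return out;
}
```

A string stored as 'c', 'a', 'f', 'e', U+0301 stays five code points long in
storage; only a consumer that asks for the composed view (an XML library,
say) sees the four code points ending in U+00E9.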
...
> Also, it is impossible to store an abstract unicode character in char32_t
> because there may be N zero-width combining characters associated with it.
Quite true. Storing abstract characters would require some variable-width
storage facility.
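
A small sketch of why the storage has to be variable width, assuming C++11.
The names abstract_character, is_combining_mark and
split_abstract_characters are hypothetical, and the combining-mark test is
deliberately crude: it only knows the Combining Diacritical Marks block
(U+0300..U+036F), whereas a real implementation would consult the Unicode
character properties.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Crude assumption for this sketch: only U+0300..U+036F count as
// combining marks.
inline bool is_combining_mark(std::uint32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// One abstract character: a base code point plus any number of combining
// marks, so a single fixed-width 32-bit value cannot hold it.
struct abstract_character {
    std::uint32_t base;
    std::vector<std::uint32_t> marks;
};

// Groups a code point sequence into abstract characters. A combining mark
// with no preceding base is kept as its own entry.
inline std::vector<abstract_character>
split_abstract_characters(const std::vector<std::uint32_t>& cps) {
    std::vector<abstract_character> out;
    for (std::uint32_t cp : cps) {
        assert(cp <= 0x10FFFF);  // outside this range is not a code point
        if (is_combining_mark(cp) && !out.empty()) {
            out.back().marks.push_back(cp);
        } else {
            out.push_back(abstract_character{cp, {}});
        }
    }
    return out;
}
```

For the input 'a', U+0301, U+0323, 'b' this yields two abstract characters:
the first carries two marks and the second none, so their storage sizes
differ.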
...
> Perhaps having a one-size-fits-all unicode_string might be a nice default,
> as long as users who care about encoding and canonical form have other
> types (template + policies?) with knobs they can twiddle.
I would really like to provide enough knobs to keep everyone happy! ;)

Erik Wien