[boost] Re: Unicode string

7 Apr 2004


Hi Miro,
...
...
You almost caught me ;-) I've changed the message subject on purpose --
to indicate that I'm no longer talking about program_options.
I'm interested in how a 'right' Unicode string can be implemented, but I'm
not sure it's possible to design such a string now, so program_options
will still have to use a much simpler approach.
I am somewhat reluctant to discuss this in detail at this time, not
because I have something to hide, but because I have something to learn: I
need to investigate some aspects of Unicode, the ICU library, and locales
and facets in the C++ standard before I can form a more complete picture
of the design of a Unicode string. However, I don't have the time to do
all the research right now, because there are other things I need to do
that I am getting paid to do, and full Unicode support is not on my work
to-do list.
That's what I think too. There are too many Unicode issues and too little time.
...
Basically, I know enough to know how _not_ to do it, but I am
not sure that I know enough to know how to do it right :-)
However, I currently think that there are legitimate reasons why one would
want to view a Unicode string as (in increasing order of complexity):
 - a sequence of code points (this is useful for serialization)
 - a sequence of encoded characters (this is useful for transcoding)
 - a sequence of abstract characters (this is useful for most high-level
   string transformations, such as substrings, find, etc.)
Therefore I think that a Unicode string should probably not be represented
as a container of any one of those three, but instead should have an
interface that lets you treat it in different ways depending on your
needs. (One way to do this is to have three kinds of iterators for Unicode
strings).
This seems reasonable. Hopefully one day someone will take the time to
really think through all the issues and implement something.
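
To make the "several views over one string" idea above a bit more concrete,
here is a minimal sketch. Everything in it is hypothetical: the class name
unicode_string and the members code_units(), code_points() and characters()
are made up for illustration and are not a Boost or ICU interface. It assumes
the input is valid UTF-8, returns vectors rather than real iterators to stay
short, and only approximates "abstract characters" by attaching combining
marks in the range U+0300..U+036F to the preceding code point. The three
levels shown only loosely correspond to the three views listed above, and
proper segmentation would need the full Unicode rules.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: one stored string, three ways of looking at it.
// Assumes the constructor argument is valid UTF-8 (no validation is done).
class unicode_string {
public:
    explicit unicode_string(std::string utf8) : bytes_(std::move(utf8)) {}

    // View 1: the stored UTF-8 bytes, as they would be written to a file.
    const std::string& code_units() const { return bytes_; }

    // View 2: the decoded code points.
    std::vector<char32_t> code_points() const {
        std::vector<char32_t> out;
        std::size_t i = 0;
        while (i < bytes_.size()) {
            unsigned char lead = static_cast<unsigned char>(bytes_[i]);
            // Sequence length follows from the lead byte (valid input assumed).
            std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
            char32_t cp = (len == 1) ? lead : (lead & (0xFFu >> (len + 1)));
            for (std::size_t k = 1; k < len && i + k < bytes_.size(); ++k)
                cp = (cp << 6) | (static_cast<unsigned char>(bytes_[i + k]) & 0x3F);
            out.push_back(cp);
            i += len;
        }
        return out;
    }

    // View 3: a rough stand-in for abstract characters. A combining mark in
    // U+0300..U+036F is glued to the code point before it. This is a
    // simplification, not real Unicode text segmentation.
    std::vector<std::u32string> characters() const {
        std::vector<std::u32string> out;
        for (char32_t cp : code_points()) {
            bool combining = cp >= 0x0300 && cp <= 0x036F;
            if (combining && !out.empty())
                out.back().push_back(cp);
            else
                out.push_back(std::u32string(1, cp));
        }
        return out;
    }

private:
    std::string bytes_;
};

// Usage: 'e' followed by U+0301 (combining acute accent), then 'a'.
void example() {
    unicode_string s("e\xCC\x81" "a");
    assert(s.code_units().size() == 4);   // four UTF-8 bytes
    assert(s.code_points().size() == 3);  // e, U+0301, a
    assert(s.characters().size() == 2);   // accented e, then a
}
```

The same stored bytes give lengths of 4, 3 and 2 depending on the view, which
is one reason a single container-style interface is hard to settle on.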

- Volodya

Vladimir Prus