[boost] Re: Unicode string

7 Apr 2004


      Miro Jurisic wrote:
...
...
So the point is that when using string-as-code-point-container, even
searching for and removing a character or substring might produce an
invalid string? E.g., even when looking for the string 'foo', you can
theoretically find 'foo' followed by a composing character, and removing
just 'foo' will leave an invalid string?
Yes, and this is true of all Unicode encodings. Essentially,
transformations that select or remove portions of a string require you to
be aware of character boundaries. Searching, substrings, and character
removal are such transformations, whereas concatenation isn't, so if you
have two strings in the same encoding, you can concatenate them without
dealing with character boundaries, and that's about it.
Okay.
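
A minimal sketch of the 'foo' case discussed above, assuming the string is
held as UTF-32 code points in a std::u32string. The names is_combining and
find_whole are hypothetical, and the combining-mark test is deliberately
simplified to a single Unicode block; a real implementation would consult
the Unicode Character Database.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplification (assumption): only the Combining Diacritical Marks block
// U+0300..U+036F is treated as combining.
bool is_combining(char32_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Hypothetical helper: finds `needle` in `text`, skipping any match that is
// immediately followed by a combining mark, because such a match ends in the
// middle of an abstract character.
std::size_t find_whole(const std::u32string& text,
                       const std::u32string& needle,
                       std::size_t from = 0) {
    assert(!needle.empty());
    std::size_t pos = text.find(needle, from);
    while (pos != std::u32string::npos) {
        const std::size_t end = pos + needle.size();
        if (end == text.size() || !is_combining(text[end])) {
            return pos;  // the match ends on a character boundary
        }
        pos = text.find(needle, pos + 1);  // keep looking past this match
    }
    return std::u32string::npos;
}
```

With text U"foo\u0301 foo", a plain find reports a match at index 0, even
though the last 'o' there carries a combining acute accent. find_whole skips
that match and reports the second 'foo' instead.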
...
...
...
basic_string is not the abstraction you are looking for, but it's also
the only one that is readily available in STL/boost today. It may serve
as a good starting point (questionable, IMNSHO), but it should most
definitely not be treated as the right thing to use for Unicode in the
long term.
I wonder what's the right abstraction then? Is it necessary to have a
class to represent abstract character, with all composing characters?
That's one way to go, yes; note that the moment you utter those words, you
put yourself into the position of designing a Unicode API :-) which you
said you don't want to do at this time.
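
As a rough illustration of the "abstract character" idea raised above, the
following sketch groups each base code point with the combining marks that
follow it. The names abstract_char, is_combining_mark and
split_abstract_chars are hypothetical, and the combining-mark test is again
limited to one Unicode block as a simplifying assumption. It is not a
proposal for an actual Unicode API.

```cpp
#include <string>
#include <vector>

// Simplification (assumption): only U+0300..U+036F count as combining marks.
bool is_combining_mark(char32_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Hypothetical type: one base code point plus the combining marks after it.
using abstract_char = std::u32string;

// Splits a sequence of code points into abstract characters.
std::vector<abstract_char> split_abstract_chars(const std::u32string& text) {
    std::vector<abstract_char> out;
    for (char32_t c : text) {
        if (out.empty() || !is_combining_mark(c)) {
            out.push_back(abstract_char(1, c));  // start a new character
        } else {
            out.back().push_back(c);  // attach the mark to the current one
        }
    }
    return out;
}
```

Searching and removal could then operate on whole elements of the returned
vector, so a base character is never separated from its marks.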
You almost caught me ;-) I've changed the message subject on purpose, to
indicate that I'm no longer talking about program_options.
I'm interested in how the 'right' Unicode string can be implemented, but I
don't think it's possible to design such a string now, so program_options
will still have to use a much simpler approach.

- Volodya

Vladimir Prus