[boost] Re: [Unicode strings] We're off

21 Mar 2005

Erik Wien wrote:
> ...
> Daniel James wrote:
>> ...
>> Why should such a string class stop at Unicode? Wouldn't it be a good
>> idea to support other encodings? It might be better to have such a class
>> as part of a separate library, probably with 'pluggable' encodings,
>> which would include Unicode.
> That was the idea behind the "character_set_traits" class in the current
> prototype. You could just implement the traits for some other encoding,
> and you'd be set. The problem, though (and in my opinion it's a big one),
> is that for the encoded_string class (and any iostream implementation
> based on the same concepts) to be usable at all as a Unicode string
> class, we would have to include a lot of functionality that is Unicode
> specific (normalization is one example). What would we do with this
> functionality for Shift-JIS?
I have no idea ;) I know this is a complicated subject, and I'm far from
an expert.
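To illustrate the 'pluggable encodings' idea, here is a minimal C++ sketch of a string parameterised on an encoding traits class. The names `utf8_traits`, `utf32_traits`, `encoded_string`, `encode` and `code_units` are hypothetical; the prototype's actual character_set_traits interface may look quite different. Note that the traits only describe how code points are encoded. Unicode-specific operations such as normalization have no natural place in them, which is exactly the problem raised above for encodings like Shift-JIS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical traits for UTF-8 (not the prototype's real interface).
struct utf8_traits {
    using code_unit = std::uint8_t;

    // Append the UTF-8 encoding of one code point to `out`.
    static void encode(char32_t cp, std::vector<code_unit>& out) {
        // Only Unicode scalar values are accepted (no surrogates).
        assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
        if (cp < 0x80) {
            out.push_back(static_cast<code_unit>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<code_unit>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<code_unit>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<code_unit>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<code_unit>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<code_unit>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<code_unit>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<code_unit>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<code_unit>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<code_unit>(0x80 | (cp & 0x3F)));
        }
    }
};

// Hypothetical traits for UTF-32: one code unit per code point.
struct utf32_traits {
    using code_unit = char32_t;

    static void encode(char32_t cp, std::vector<code_unit>& out) {
        out.push_back(cp);
    }
};

// The string is parameterised on the encoding; supporting another
// encoding means writing another traits class with the same members.
template <class Traits>
class encoded_string {
public:
    using code_unit = typename Traits::code_unit;

    void push_back(char32_t cp) { Traits::encode(cp, units_); }
    std::size_t code_units() const { return units_.size(); }
    const std::vector<code_unit>& data() const { return units_; }

private:
    std::vector<code_unit> units_;
};
```

For example, pushing U+0041, U+00E9, U+20AC and U+1F600 gives 10 code units in an `encoded_string<utf8_traits>` and 4 in an `encoded_string<utf32_traits>`.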

I was writing about the suggested dynamic string, 'utf_string', possibly
better called 'any_string' or 'encoded_string'. IMO your library should
concentrate on Unicode (and perhaps encodings that are close enough to
Unicode), and leave other encodings to other libraries. A dynamically
encoded string class would probably require a different interface,
partly for efficiency's sake and partly because of the differences
between encodings. Also, it will be more important for such a class to
interact well with other string implementations.
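A dynamically encoded string only knows its encoding at run time. The following is a minimal sketch under assumed names (`any_string` borrows the name suggested above; `encoding` and `to_u32string` are hypothetical) that supports only Latin-1 and UTF-8 and assumes well-formed input. Every operation has to dispatch on the stored encoding, and comparing strings in different encodings means decoding both sides. This hints at why such a class might want a different interface and an explicit conversion to other string types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical run-time encoding tag; only two encodings for brevity.
enum class encoding { latin1, utf8 };

class any_string {
public:
    any_string(encoding enc, std::vector<std::uint8_t> bytes)
        : enc_(enc), bytes_(std::move(bytes)) {}

    encoding enc() const { return enc_; }

    // Convert to a standard UTF-32 string so the data can be handed to
    // other string implementations; dispatches on the run-time encoding.
    std::u32string to_u32string() const {
        switch (enc_) {
        case encoding::latin1:
            // Latin-1 bytes map directly onto code points U+0000..U+00FF.
            return std::u32string(bytes_.begin(), bytes_.end());
        case encoding::utf8:
            return decode_utf8();
        }
        return {};
    }

private:
    // Decode UTF-8, assuming well-formed input (no full validation).
    std::u32string decode_utf8() const {
        std::u32string out;
        for (std::size_t i = 0; i < bytes_.size();) {
            const std::uint8_t lead = bytes_[i];
            const std::size_t len =
                lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
            assert(i + len <= bytes_.size() && "truncated UTF-8 sequence");
            // Payload bits of the lead byte, then 6 bits per continuation byte.
            char32_t cp = len == 1 ? lead
                        : len == 2 ? (lead & 0x1F)
                        : len == 3 ? (lead & 0x0F)
                                   : (lead & 0x07);
            for (std::size_t k = 1; k < len; ++k) {
                cp = (cp << 6) | (bytes_[i + k] & 0x3F);
            }
            out.push_back(cp);
            i += len;
        }
        return out;
    }

    encoding enc_;
    std::vector<std::uint8_t> bytes_;
};

// Strings compare equal if they hold the same code points, whatever
// their encodings; this requires decoding both sides.
inline bool operator==(const any_string& a, const any_string& b) {
    return a.to_u32string() == b.to_u32string();
}
```

Here "café" stored as Latin-1 bytes and as UTF-8 bytes compares equal, although the byte sequences differ in length.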

Daniel