[boost] Re: Any interest in adding unicode support to boost?

19 Oct 2004


      Robert Ramey wrote:
...
"Vladimir Prus" <ghost@cs.msu.su> wrote in message
news:cl2d2p$7a3$1@sea.gmane.org...
...
This was discussed extensively before. For example, Miro has pointed
out that even plain "find" is not suitable for Unicode strings
because some characters can be represented with several wchar_t
values.
Then, there's the issue of proper collation. Given that Unicode can
contain accents and various other "marks", it is not obvious that
string::operator<
will always do the right thing.
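
To make the "find" concern concrete, here is a minimal sketch. It assumes the "several wchar_t values" case is a combining sequence (a base letter plus a combining mark); the helper name contains and the sample strings are made up for illustration and are not from the earlier discussion.

```cpp
#include <string>

// True when `needle` occurs in `haystack` under plain code-unit matching,
// which is all that std::wstring::find does.
bool contains(const std::wstring& haystack, const std::wstring& needle) {
    return haystack.find(needle) != std::wstring::npos;
}

// Two spellings of the same user-visible word (illustrative data):
// precomposed U+00E9 versus 'e' followed by combining acute U+0301.
const std::wstring precomposed = L"caf\u00e9";
const std::wstring decomposed  = L"cafe\u0301";
```

Searching decomposed for L"\u00e9" finds nothing, while searching it for a plain L"e" matches the base letter of the accented character. Neither result is what a user would expect, and avoiding both would need normalization or grapheme-aware matching, which basic_string does not attempt.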
My reference (Stroustrup, The C++ Programming language) shows the
locale class containing a function
    template<class Ch, class Tr, class A> // compare strings using this locale
    bool operator()(const basic_string<Ch, Tr, A>&, const basic_string<Ch, Tr,
...
    &) const;
So I always presumed that there was a "unicode" locale that
implemented this as well as all other required information.  Now that I
think about it I realize that it was only a presumption that I never
really checked.  Now I wonder what facilities most libraries actually
provide for unicode facets.  I know there are ANSI functions for
translating between multi-byte and wide character strings.  I've used
these functions and they did what I expected them to do.  I presumed
they worked in accordance with the currently selected locale and its
related facets.  If the
basic_string<wchar_t>::operator<(...) isn't doing "the right thing"
wouldn't it be just a bug in the implementation of the standard
library rather than a candidate for a boost library?
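
For reference, a rough sketch of how the locale's operator() can serve as a comparator. The helpers locale_or_classic and sorted_with are hypothetical names, and whether any named locale exists on a given platform is an assumption, so the sketch falls back to the classic "C" locale.

```cpp
#include <algorithm>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: return the named locale, or the classic "C" locale
// when the platform does not provide that name.
std::locale locale_or_classic(const char* name) {
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Sort with the locale itself as the comparator. std::locale::operator()
// forwards to the locale's std::collate<wchar_t> facet.
std::vector<std::wstring> sorted_with(const std::locale& loc,
                                      std::vector<std::wstring> words) {
    std::sort(words.begin(), words.end(), loc);
    return words;
}
```

Note that basic_string's operator< never consults a locale; it compares through the character traits. Locale-aware ordering therefore has to be requested explicitly, as above, and its quality depends on the collate facet the implementation ships.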
What 'wchar_t' means is purely implementation-defined, apart from the
very little the C++ standard says about it in relation to 'char'. It need
have nothing to do with any of the Unicode encodings, or it may represent
a particular Unicode encoding. This is purely up to the implementation. So
doing the "right thing" is purely up to the implementer although, of
course, the implementer will tell you what the wchar_t represents for
that implementation.
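
A small sketch of how a program might ask the implementation what it provides. The function names are hypothetical; the __STDC_ISO_10646__ macro comes from the C and C++ standards, but it is an assumption that a given implementation defines it whenever its wchar_t values are ISO 10646 code points.

```cpp
#include <climits>

// Width of wchar_t in bits on this implementation (commonly 16 or 32).
int wchar_bits() {
    return static_cast<int>(sizeof(wchar_t) * CHAR_BIT);
}

// True when the implementation states, via the predefined macro, that
// wchar_t values are ISO 10646 (Unicode) code points.
bool wchar_is_iso10646() {
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}
```

Neither answer says anything about how find or operator< treat combining marks; they only describe what a single wchar_t can hold.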


Edward Diener