
"Vladimir Prus" <ghost@cs.msu.su> wrote in message news:cl2d2p$7a3$1@sea.gmane.org...
This was discussed extensively before. For example, Miro has pointed out that even plain "find" is not suitable for Unicode strings because some characters can be represented with several wchar_t values.
Then, there's an issue of proper collation. Given that Unicode can contain accents and various other "marks", it is not obvious that string::operator< will always do the right thing.
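To make the "find" point concrete, here is a minimal sketch, not taken from the original discussion. It builds the same visible character, e with an acute accent, in two ways: as the single value U+00E9 and as 'e' followed by the combining accent U+0301. The helper names (precomposed_e_acute, decomposed_e_acute, contains) are made up for illustration.

```cpp
#include <string>

// Precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE (one wchar_t).
inline std::wstring precomposed_e_acute() {
    return std::wstring(1, static_cast<wchar_t>(0x00E9));
}

// Decomposed form: 'e' followed by U+0301 COMBINING ACUTE ACCENT (two wchar_t).
inline std::wstring decomposed_e_acute() {
    std::wstring s;
    s += L'e';
    s += static_cast<wchar_t>(0x0301);
    return s;
}

// Plain code-unit search, exactly what basic_string::find does.
// It knows nothing about Unicode equivalence between the two forms.
inline bool contains(const std::wstring& haystack, const std::wstring& needle) {
    return haystack.find(needle) != std::wstring::npos;
}
```

With these helpers, the two forms compare unequal with operator==, searching text that holds the decomposed form for the precomposed one fails, and searching the decomposed form for a plain "e" succeeds even though a reader would not consider the accented letter a match. Handling this properly requires normalizing both strings first, which basic_string does not do.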
My reference (Stroustrup, The C++ Programming Language) shows the locale class containing a function

    template<class Ch, class Tr, class A>  // compare strings using this locale
    bool operator()(const basic_string<Ch, Tr, A>&, const basic_string<Ch, Tr, A>&) const;

So I always presumed that there was a "unicode" locale that implemented this, as well as all other required information. Now that I think about it, I realize that it was only a presumption that I never really checked. Now I wonder what facilities most libraries provide for Unicode facets.

I know there are ANSI functions for translating between multi-byte and wide character strings. I've used these functions and they did what I expected them to do. I presumed they worked in accordance with the currently selected locale and its related facets.

If basic_string<wchar_t>::operator<(...) isn't doing "the right thing", wouldn't it be just a bug in the implementation of the standard library rather than a candidate for a Boost library?

Robert Ramey
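For reference, a minimal sketch of how the locale comparison mentioned above can be used. std::locale::operator() forwards to the locale's collate facet, so a locale object can be passed directly as the comparison to std::sort. The helper name sort_with_locale is made up for illustration. Which named locales exist (for example "en_US.UTF-8") is platform-dependent, and std::locale's constructor throws std::runtime_error for an unknown name, so the sketch takes the locale as a parameter rather than assuming one.

```cpp
#include <algorithm>
#include <locale>
#include <string>
#include <vector>

// Sort wide strings using the given locale's collate facet.
// std::locale::operator() compares two basic_strings via
// std::collate<wchar_t>::compare, so the locale itself is the comparator.
inline void sort_with_locale(std::vector<std::wstring>& words,
                             const std::locale& loc) {
    std::sort(words.begin(), words.end(), loc);
}
```

With std::locale::classic() this gives plain lexicographic ordering by code value, the same as operator<, so "B" sorts before "a". Any language-aware ordering depends on the platform supplying a locale with a suitable collate facet, which is the presumption questioned above.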