
Peter Dimov wrote:
Ultimately I feel that the operation of normalization (which involves canonical decomposition) of Unicode strings should be hidden from the user completely and be performed automatically by the library where it is needed (for example, on a call to the == operator).
It appears that there are two schools of thought when it comes to string design. One approach treats a string purely as a sequential container of values. The other tries to represent "string values" as a coherent whole. It doesn't help that in the simple case where the value_type is char the two approaches result in mostly identical semantics.
My opinion is that the std::char_traits<> experiment failed
I agree with that.
and conclusively demonstrated that the "string as a value" approach is a dead end,
How was it demonstrated? There are two separate questions. The first is how many operations are methods of 'string' and how many are external. Contrary to what Exceptional C++ says, I believe having many methods in string is OK. As an example, QString presents a huge but consistent interface, while in standard C++ we have string, boost::format, boost::tokenizer and boost::string_algo, and that is simply too many separate docs to look at. The second question is whether operator==, operator< and 'find' should operate on vector<char_XX>, or on abstract characters using Unicode rules, or whether there should be two versions. I don't really understand why 'unicode-unaware' semantics are ever needed, so we should have only the 'unicode-aware' ones.
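To make the second question concrete, here is a minimal sketch of the two possible meanings of operator==. Everything in it is an assumption made for illustration: the names toy_decompose, code_point_equal and canonical_equal are hypothetical, and the "decomposition" knows about exactly one character (U+00E9), whereas a real library would use the full Unicode decomposition data and also reorder combining marks.

```cpp
#include <cassert>
#include <string>

// Code points used by this sketch.
constexpr char32_t E_ACUTE = 0x00E9;          // precomposed "e with acute"
constexpr char32_t COMBINING_ACUTE = 0x0301;  // combining acute accent

// Hypothetical toy decomposition: only U+00E9 is decomposed.
// This is NOT real Unicode normalization, just enough to show the idea.
std::u32string toy_decompose(const std::u32string& s) {
    std::u32string out;
    for (char32_t c : s) {
        if (c == E_ACUTE) {
            out += U'e';
            out += COMBINING_ACUTE;
        } else {
            out += c;
        }
    }
    return out;
}

// 'unicode-unaware' comparison: element by element, as for vector<char_XX>.
bool code_point_equal(const std::u32string& a, const std::u32string& b) {
    return a == b;
}

// 'unicode-aware' comparison: compare after (toy) canonical decomposition,
// so the caller never sees the normalization step.
bool canonical_equal(const std::u32string& a, const std::u32string& b) {
    return toy_decompose(a) == toy_decompose(b);
}
```

With these, "caf" followed by U+00E9 and "cafe" followed by U+0301 compare unequal under code_point_equal but equal under canonical_equal, even though a user would see the same text in both cases.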
But I may be wrong. :-)
Me too. - Volodya