
Erik Wien wrote:
Ultimately I feel that the operation of normalization (which involves canonical decomposition) of Unicode strings should be hidden from the user completely and be performed automatically by the library where that is needed. (Like on a call to the == operator.)
It appears that there are two schools of thought when it comes to string design. One approach treats a string purely as a sequential container of values; the other tries to represent a "string value" as a coherent whole. It doesn't help that in the simple case where the value_type is char, the two approaches result in mostly identical semantics.

My opinion is that the std::char_traits<> experiment failed and conclusively demonstrated that the "string as a value" approach is a dead end, and that practical string libraries must treat a string as a sequential container, vector<char>, vector<char16_t> and vector<char32_t> in our case. The interpretation of that sequence of integers as a concrete string value representation needs to be done by algorithms.

In other words, I believe that string::operator== should always perform the per-element comparison std::equal( lhs.begin(), lhs.end(), rhs.begin() ) that is specified in the Container requirements table. If I want to test whether two sequences of char16_t's, interpreted as UTF-16 Unicode strings, would represent the same string in printed form, I should be given a dedicated function that does just that - or an equivalent. Similarly, if I want to normalize a sequence of chars that are actually UTF-8, I'd call the appropriate 'normalize' function/algorithm.

But I may be wrong. :-)
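To make the distinction concrete, here is a minimal sketch of what I have in mind, using char16_t/std::u16string for brevity. The names code_unit_equal, normalize_nfd and canonical_equivalent are placeholders of my own invention, and the one-entry decomposition table merely stands in for a real Unicode implementation (ICU, or whatever backend the library ends up with):

    #include <algorithm>
    #include <iostream>
    #include <string>

    // Code-unit equality: exactly what the Container requirements give us.
    // Two sequences are equal iff they hold the same char16_t values.
    bool code_unit_equal(const std::u16string& lhs, const std::u16string& rhs)
    {
        return lhs.size() == rhs.size()
            && std::equal(lhs.begin(), lhs.end(), rhs.begin());
    }

    // Hypothetical 'normalize' algorithm (canonical decomposition).
    // A real implementation would consult the Unicode character database;
    // this toy table only knows how to decompose U+00E9.
    std::u16string normalize_nfd(const std::u16string& in)
    {
        std::u16string out;
        for (char16_t c : in) {
            if (c == u'\u00E9') {          // 'é' -> 'e' + combining acute accent
                out.push_back(u'e');
                out.push_back(u'\u0301');
            } else {
                out.push_back(c);
            }
        }
        return out;
    }

    // Hypothetical dedicated comparison: "would these two UTF-16 sequences
    // represent the same string in printed form?"  Expressed in terms of
    // normalization followed by plain per-element comparison.
    bool canonical_equivalent(const std::u16string& lhs, const std::u16string& rhs)
    {
        return normalize_nfd(lhs) == normalize_nfd(rhs);
    }

    int main()
    {
        std::u16string precomposed = u"caf\u00E9";   // 'é' as one code unit
        std::u16string decomposed  = u"cafe\u0301";  // 'e' followed by U+0301

        std::cout << std::boolalpha
                  << "code units equal:       "
                  << code_unit_equal(precomposed, decomposed) << '\n'      // false
                  << "canonically equivalent: "
                  << canonical_equivalent(precomposed, decomposed) << '\n'; // true
    }

The point is that both results are legitimate answers to different questions, so the caller should get to pick the algorithm explicitly rather than have operator== silently normalize behind their back.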