[boost] Re: Any interest in adding unicode support to boost?

20 Oct 2004

In article <001301c4b635$bda16750$6501a8c0@pdimov2>,
"Peter Dimov" <pdimov@mmltd.net> wrote:
...
> My opinion is that the std::char_traits<> experiment failed and conclusively
> demonstrated that the "string as a value" approach is a dead end, and that
> practical string libraries must treat a string as a sequential container,
> vector<char>, vector<char16_t> and vector<char32_t> in our case.
> The interpretation of that sequence of integers as a concrete string value
> representation needs to be done by algorithms.

There is no dispute that the rep of the string needs to be a container. (Though 
I do not agree that it's obvious that it should be a vector.) However, the 
basic_string interface grafted on top of a container of Unicode code units will 
produce bogus Unicode strings. This is why I strongly believe that basic_string 
is not a suitable container for Unicode strings. A separate container which does 
not provide convenient and completely incorrect member functions (such as find 
and assign) should be used.

Consider this: pretend that

 - c and d are characters
 - C and D are the same character with an umlaut
 - C and D do not have precomposed code points in Unicode, so each is stored
   as the base character followed by a combining umlaut

basic_string<char16_t>  s(u"Cc");  // code units: c, combining umlaut, c
// pretend assign and find use iterator ranges, for simplicity
s.assign(s.find(u"c"), u"d");

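This will result in "Dc" rather than the intended "Cd": find matches the base c
inside C, and the combining umlaut left behind then attaches to the d. That is
completely wrong IMNSHO, and there should not be a simple interface that allows
you to shoot yourself in the foot so thoroughly.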
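
A concrete version of the example above, as a minimal sketch assuming C++11 or
later (std::u16string and u"" literals were not available in 2004). Here c
followed by U+0308 COMBINING DIAERESIS stands in for C, on the assumption that
neither c nor d with a diaeresis has a precomposed form. The names
naive_replace_first and demo_naive_replace are hypothetical, not part of any
library.

```cpp
#include <cassert>
#include <string>

// Replace the first code unit equal to `from` with `to`, the way a
// code-unit-level find/assign pair would. Combining marks that follow the
// matched unit are not considered at all.
std::u16string naive_replace_first(std::u16string s, char16_t from, char16_t to) {
    const std::u16string::size_type pos = s.find(from);
    if (pos != std::u16string::npos) {
        s[pos] = to;
    }
    return s;
}

// Self-check of the behaviour described in the post.
void demo_naive_replace() {
    // "Cc": c, combining diaeresis, then a plain c
    const std::u16string s = u"c\u0308c";
    // The match lands on the base of C, so the result is "Dc", not "Cd".
    assert(naive_replace_first(s, u'c', u'd') == u"d\u0308c");
}
```

The replacement succeeds at the code-unit level and still yields a string that
no caller asking to "replace c with d" would have meant.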

It is not strings-as-containers that I am opposed to, but the deceptive 
simplicity of basic_string member functions.
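
For contrast, a search written as a free algorithm over the container can at
least refuse to match a code unit that is the base of a combining sequence.
The following is only a sketch under a loud simplifying assumption: it treats
just the Combining Diacritical Marks block (U+0300..U+036F) as combining,
whereas a real implementation would need full Unicode text segmentation.
is_combining_mark and find_uncombined are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ASSUMPTION: only U+0300..U+036F counts as a combining mark here.
// This is far from complete and is for illustration only.
bool is_combining_mark(char16_t u) {
    return u >= 0x0300 && u <= 0x036F;
}

// Return the index of the first occurrence of `unit` that is not followed
// by a combining mark, or npos if there is none.
std::size_t find_uncombined(const std::u16string& s, char16_t unit,
                            std::size_t from = 0) {
    for (std::size_t i = from; i < s.size(); ++i) {
        if (s[i] != unit) {
            continue;
        }
        const bool followed_by_mark =
            i + 1 < s.size() && is_combining_mark(s[i + 1]);
        if (!followed_by_mark) {
            return i;
        }
    }
    return std::u16string::npos;
}
```

With the string c, U+0308, c this returns index 2, the standalone c, while the
member function find returns index 0, the base of the combined character.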

meeroh

Miro Jurisic