
In article <001301c4b635$bda16750$6501a8c0@pdimov2>, "Peter Dimov" <pdimov@mmltd.net> wrote:
> My opinion is that the std::char_traits<> experiment failed and conclusively demonstrated that the "string as a value" approach is a dead end, and that practical string libraries must treat a string as a sequential container, vector<char>, vector<char16_t> and vector<char32_t> in our case.
> The interpretation of that sequence of integers as a concrete string value representation needs to be done by algorithms.
There is no dispute that the rep of the string needs to be a container. (Though I do not agree that it's obvious that it should be a vector.) However, the basic_string interface grafted on top of a container of Unicode code units will produce bogus Unicode strings. This is why I strongly believe that basic_string is not a suitable container for Unicode strings. A separate container, one that does not provide convenient and completely incorrect member functions (such as find and assign), should be used.

Consider this. Pretend that
- c and d are characters
- C and D are the same characters with an umlaut
- C and D do not have precomposed code points in Unicode, so each is stored as its base letter followed by a combining umlaut

    basic_string<char16_t> s("Cc");
    // pretend assign and find use iterator ranges, for simplicity
    s.assign(s.find("c"), "d");

The find matches the base letter of C rather than the standalone c, so this results in "Dc" where "Cd" was intended. That is completely wrong IMNSHO, and there should not be a simple interface that allows you to shoot yourself in the foot so thoroughly.

It is not strings-as-containers that I am opposed to, but the deceptive simplicity of basic_string member functions.

meeroh