
It's nice to see this thread from September picked up again; I was a bit disappointed by how little response my proposal got at the time. I may be plugging this code into something real quite soon, and will try to drum up some interest here again if I do. With XML being mentioned again, I think that character sets are something that needs attention.

[Be warned that some readers will not see new messages on this old thread.]

Sebastian Redl wrote:
David Rodríguez Ibeas wrote:
On Sep 27, 2007 5:31 PM, Joseph Gauterin <joseph.gauterin@googlemail.com> wrote:
[putting back the context]
If we had mutable strings, consider how badly the following would perform:

    std::replace(utfString.begin(), utfString.end(), SingleByteChar, MultiByteChar);

Although this looks O(n) at first glance, it's actually O(n^2), as the container has to expand itself for every replacement. I don't think a library should make it that easy to write worst-case code.
While I don't know whether this problem has a solution, the library could implement an alternative replace that runs in linear time by constructing a new string, copying and replacing values in the same pass. Could std::replace() be disabled somehow? (SFINAE?)
It ought to be possible to overload it and, if the string is not part of std, have the overloaded version be picked up with ADL. Only if replace() isn't explicitly qualified, of course, which is a problem. But I think immutable strings are the way forward anyway.
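To make the ADL point concrete, here is a minimal sketch. Everything in it is hypothetical: the namespace textlib, the stand-in type byte_string, and the call counter exist only for illustration and are not part of any proposal in this thread. It assumes the string's iterator is a class type declared inside the library's namespace, which is what makes that namespace visible to argument-dependent lookup.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>

namespace textlib {  // hypothetical library namespace, for illustration only

// Minimal stand-in for a library string whose iterator is a class type.
class byte_string {
public:
    class iterator {
    public:
        using iterator_category = std::forward_iterator_tag;
        using value_type = char;
        using difference_type = std::ptrdiff_t;
        using pointer = char*;
        using reference = char&;

        iterator() = default;
        explicit iterator(char* p) : p_(p) {}
        char& operator*() const { return *p_; }
        iterator& operator++() { ++p_; return *this; }
        iterator operator++(int) { iterator t = *this; ++p_; return t; }
        friend bool operator==(iterator a, iterator b) { return a.p_ == b.p_; }
        friend bool operator!=(iterator a, iterator b) { return a.p_ != b.p_; }

    private:
        char* p_ = nullptr;
    };

    explicit byte_string(std::string s) : data_(std::move(s)) {}
    iterator begin() { return iterator(&data_[0]); }
    iterator end() { return iterator(&data_[0] + data_.size()); }
    const std::string& str() const { return data_; }

private:
    std::string data_;
};

int adl_calls = 0;  // counts how often the library overload below is chosen

// Library overload. An unqualified call replace(first, last, a, b) finds it
// through ADL, because the iterator type belongs to namespace textlib.
void replace(byte_string::iterator first, byte_string::iterator last,
             char from, char to) {
    ++adl_calls;
    for (; first != last; ++first) {
        if (*first == from) *first = to;
    }
}

}  // namespace textlib
```

With this in place, an unqualified replace(s.begin(), s.end(), 'a', 'o') on a textlib::byte_string s selects the library overload, while an explicitly qualified std::replace(s.begin(), s.end(), 'a', 'o') still goes straight to the standard algorithm. That second case is exactly the gap described above: an overload can redirect unqualified calls, but it cannot stop qualified ones.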
For a UTF-8 string, my proposal offered:

- a mutable random-access byte iterator
- a const bidirectional character iterator
- a mutable output character iterator

std::replace needs a mutable forward iterator, so you wouldn't be able to apply it to the character iterator. The library wouldn't "let you write worst case code". There is, however, the replace_copy algorithm, which I think does exactly what you need; it takes a pair of input iterators and an output iterator, i.e. something like

    utf8_string s1 = "......";
    utf8_string s2;
    std::replace_copy(s1.begin(), s1.end(),
                      utf8_string::character_output_iterator(s2),
                      L'x', L'y');

Concerning mutable vs. immutable strings: which is best in any particular case clearly depends on the size of the string, the operation being performed, and whether it has a variable-length encoding. The programmer should be allowed to choose which to use. (An interesting case is where the size or character set changes at run-time, and a run-time choice of algorithm is appropriate.)

Regards, Phil.
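For anyone who wants to try the replace_copy idea without the proposed utf8_string, the following is a rough sketch using only the standard library. The names utf8_appender, decode_utf8 and replace_copy_utf8 are invented for this example and are not the proposal's interface. It also assumes well-formed UTF-8 input and decodes into a temporary std::u32string, where the proposal would use a const character iterator directly.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical output iterator: every code point assigned through it is
// appended to a std::string as UTF-8 (1 to 4 bytes).
class utf8_appender {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit utf8_appender(std::string& out) : out_(&out) {}

    utf8_appender& operator=(char32_t cp) {
        if (cp < 0x80) {
            out_->push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out_->push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out_->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out_->push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out_->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out_->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out_->push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out_->push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out_->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out_->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        return *this;
    }
    utf8_appender& operator*() { return *this; }
    utf8_appender& operator++() { return *this; }
    utf8_appender operator++(int) { return *this; }

private:
    std::string* out_;
};

// Decodes UTF-8 into code points. Assumes the input is well-formed; a real
// library would validate it.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        // Keep only the payload bits of the lead byte.
        char32_t cp = (len == 1) ? lead : (lead & (0xFFu >> (len + 1)));
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3Fu);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Copies `in` to a new string, replacing every `from` with `to`. Each output
// byte is appended once, so nothing already written is ever shifted.
std::string replace_copy_utf8(const std::string& in, char32_t from, char32_t to) {
    std::u32string cps = decode_utf8(in);
    std::string out;
    out.reserve(in.size());
    std::replace_copy(cps.begin(), cps.end(), utf8_appender(out), from, to);
    return out;
}
```

Because the result is built by appending, replacing a single-byte character with a multi-byte one costs amortized linear time in the length of the input, rather than shifting the tail of the string on every replacement.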