
Sean Parent wrote:
I don't have enough time to delve deeply into this thread but I thought I'd make a few passing comments.
Adobe has a fairly major string class problem (we joke that every project must have its own string class - which is nearly true). There is no such thing as a single type of string - there are _many_ purposes, and you need to be able to handle things like language and style runs, large, large blocks of text with efficient edits, UI substitutions (which are aware of things like split negation and masculine/feminine forms), language-based ordering, different encodings...
We need another string class like a hole in the head.
What we do need are good standard algorithms that can be applied to any string class.
I believe this is doable with the current iterator interface.
I believe it's possible (meaning I've done some quick experiments) to define an input iterator (actually as strong as a non-mutating forward iterator) and an output iterator, which do conversions. This means that you can define operations in terms of Unicode code points regardless of the underlying encoding (though some operations such as ordering may still require a locale).
Consider -
    to_lower(first, last, output)
    to_upper(first, last, output)
Such transformations can work with any encoding (you can uppercase UTF-8 into UTF-32). They can't work in-situ (but I don't think to_upper or to_lower really can work in-situ - certainly not in UTF-8 and probably not in UTF-16, and I believe there are some multi-character forms that even break in UTF-32...). It is possible though to wrap them with a replace function for in-place operations.
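A minimal sketch of the iterator form and the in-place wrapper described above, assuming the input range already yields code points (for example through a decoding adapter). The namespace `sketch`, the helper `lower_code_point` and the wrapper `to_lower_in_place` are hypothetical names for illustration, and the mapping is ASCII-only; real case mapping needs the Unicode tables, where one code point can map to several.

```cpp
#include <iterator>
#include <string>

namespace sketch {

// Hypothetical ASCII-only mapping, used only to keep the sketch short.
inline char32_t lower_code_point(char32_t c) {
    return (c >= U'A' && c <= U'Z')
               ? static_cast<char32_t>(c + (U'a' - U'A'))
               : c;
}

// Reads code points from [first, last) and writes the lowered code points
// to out. Because the output goes to a separate sequence, the result may
// differ in length or encoding from the input.
template <class InputIt, class OutputIt>
OutputIt to_lower(InputIt first, InputIt last, OutputIt out) {
    for (; first != last; ++first) {
        *out++ = lower_code_point(static_cast<char32_t>(*first));
    }
    return out;
}

// "Replace" wrapper for in-place use: transform into a temporary,
// then swap the result into the original string.
template <class String>
void to_lower_in_place(String& s) {
    String tmp;
    to_lower(s.begin(), s.end(), std::back_inserter(tmp));
    s.swap(tmp);
}

} // namespace sketch
```

The wrapper never edits the sequence while it is being read, which is why it stays correct even when a transformation changes the number of code units.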
The current std::find() will work with such iterator adapters to find a single UTF-32 character (in any encoded sequence).
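As a rough illustration of such an adapter, here is a sketch of a read-only iterator that presents UTF-8 code units as UTF-32 code points, used with the unmodified std::find(). The names `utf8_decode_iterator` and `find_code_point` are hypothetical, and the sketch assumes well-formed UTF-8 (it does no validation).

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

namespace sketch {

// Non-mutating, multipass iterator over UTF-8 code units that yields
// UTF-32 code points. It is tagged as an input iterator because
// operator* returns by value. Assumes well-formed input.
template <class Iter>
class utf8_decode_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;

    utf8_decode_iterator() = default;
    explicit utf8_decode_iterator(Iter pos) : pos_(pos) {}

    char32_t operator*() const {
        Iter it = pos_;
        unsigned char lead = static_cast<unsigned char>(*it);
        if (lead < 0x80) return lead;  // single-byte (ASCII) form
        int extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;
        char32_t cp = lead & (0x3F >> extra);  // payload bits of the lead byte
        while (extra-- > 0) {
            ++it;
            cp = (cp << 6) | (static_cast<unsigned char>(*it) & 0x3F);
        }
        return cp;
    }

    utf8_decode_iterator& operator++() {
        unsigned char lead = static_cast<unsigned char>(*pos_);
        int len = lead < 0x80 ? 1 : lead >= 0xF0 ? 4 : lead >= 0xE0 ? 3 : 2;
        std::advance(pos_, len);  // skip the whole encoded code point
        return *this;
    }
    utf8_decode_iterator operator++(int) {
        utf8_decode_iterator old = *this;
        ++*this;
        return old;
    }

    // Underlying code unit position.
    Iter base() const { return pos_; }

    friend bool operator==(const utf8_decode_iterator& a,
                           const utf8_decode_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_decode_iterator& a,
                           const utf8_decode_iterator& b) {
        return !(a == b);
    }

private:
    Iter pos_{};
};

// Byte offset of the first occurrence of cp in UTF-8 text,
// or text.size() if it does not occur.
inline std::size_t find_code_point(const std::string& text, char32_t cp) {
    using It = utf8_decode_iterator<std::string::const_iterator>;
    It hit = std::find(It(text.begin()), It(text.end()), cp);
    return static_cast<std::size_t>(hit.base() - text.begin());
}

} // namespace sketch
```

The same adapter could wrap iterators of std::vector, deque or list, since it only relies on dereference, increment and equality of the underlying iterator.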
Currently with ASL we're taking such an approach for localization strings: replacing an existing string class for localized strings at Adobe with a small set of functions and _any_ string class (any sequence of code units), including std::string, std::vector (or deque or list).
You might take a look here for some ideas: <http://opensource.adobe.com/group__asl__xstring.html>.
This is very close to what I have in mind. The main difference is that the functions/algorithms in my mind take ranges instead of iterators. Thus:

    to_lower(src, dest)
    to_upper(src, dest)

With these, I could make Fusion-like wrappers that transform them into something like:

    some_string s1 = to_lower(src);
    some_string s2 = to_upper(src);

where to_lower and to_upper return cheap views that are in and by themselves valid strings/ranges. They are cheap because the actual conversions/transformations are done on demand -- think lazy evaluation. So, like those done by expression template techniques, there are no expensive temporaries when you perform seemingly expensive tasks like:

    some_string s = f1(f2(f3(f4(src))));

And yes, because they are generic, those string algorithms can work on any string type that satisfies some basic requirements.

Regards,
--
Joel de Guzman
http://www.boost-consulting.com
http://spirit.sf.net
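P.S. A rough sketch of what such lazy views might look like, assuming C++14 and an ASCII-only per-character mapping. All names here (`sketch`, `transform_iterator`, `lazy_view`, `make_view`, `materialize`) are made up for illustration; this is not Fusion or any existing library interface, and it uses an explicit `materialize` call where the text above uses plain assignment.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <type_traits>
#include <utility>

namespace sketch {

// Iterator that applies fn to each element only when it is dereferenced.
template <class Iter, class Fn>
class transform_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::decay_t<
        decltype(std::declval<const Fn&>()(*std::declval<const Iter&>()))>;
    using difference_type = std::ptrdiff_t;
    using pointer = const value_type*;
    using reference = value_type;

    transform_iterator(Iter pos, Fn fn) : pos_(pos), fn_(fn) {}

    reference operator*() const { return fn_(*pos_); }
    transform_iterator& operator++() { ++pos_; return *this; }
    transform_iterator operator++(int) {
        transform_iterator old = *this;
        ++pos_;
        return old;
    }
    friend bool operator==(const transform_iterator& a,
                           const transform_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const transform_iterator& a,
                           const transform_iterator& b) {
        return !(a == b);
    }

private:
    Iter pos_;
    Fn fn_;
};

// Cheap view: stores two iterators and a function, never a copy of the text.
template <class Iter, class Fn>
class lazy_view {
public:
    using iterator = transform_iterator<Iter, Fn>;
    using const_iterator = iterator;

    lazy_view(Iter first, Iter last, Fn fn)
        : first_(first), last_(last), fn_(fn) {}
    iterator begin() const { return iterator(first_, fn_); }
    iterator end() const { return iterator(last_, fn_); }

private:
    Iter first_, last_;
    Fn fn_;
};

// The underlying string must outlive the view; views of views are safe
// because the inner iterators are copied.
template <class Range, class Fn>
auto make_view(const Range& r, Fn fn) {
    return lazy_view<decltype(r.begin()), Fn>(r.begin(), r.end(), fn);
}

// ASCII-only mappings, assumed here to keep the sketch short.
inline char lower_char(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}
inline char upper_char(char c) {
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
}

template <class Range>
auto to_lower(const Range& src) { return make_view(src, &lower_char); }
template <class Range>
auto to_upper(const Range& src) { return make_view(src, &upper_char); }

// The only place where characters are actually produced and stored.
template <class String, class View>
String materialize(const View& v) { return String(v.begin(), v.end()); }

} // namespace sketch
```

With this, `sketch::materialize<std::string>(sketch::to_upper(sketch::to_lower(src)))` walks `src` once and builds a single string, with no intermediate copies for the nested calls.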