
In article <cl3hl9$g4e$1@sea.gmane.org>, "Erik Wien" <wien@start.no> wrote:
The basic idea I have been working on is to make an encoded_string class templated on Unicode encoding types (e.g. UTF-8, UTF-16). This is made possible through an encoding_traits class which contains all the necessary implementation details for working on strings of code units.
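To make the shape of that design concrete, here is a minimal sketch of what such a traits-based class might look like. Everything in it beyond the names encoded_string, encoding_traits, utf8 and utf16 is an assumption for illustration (the tag structs, the code_unit typedef, max_units_per_code_point, code_unit_count), not the interface actually proposed, and the constructor only handles 7-bit ASCII input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical tag types naming the encodings.
struct utf8 {};
struct utf16 {};

// Hypothetical traits: one specialization per encoding.
template <typename Encoding> struct encoding_traits;

template <> struct encoding_traits<utf8> {
    using code_unit = char;
    static constexpr std::size_t max_units_per_code_point = 4;
};

template <> struct encoding_traits<utf16> {
    using code_unit = char16_t;
    static constexpr std::size_t max_units_per_code_point = 2;
};

template <typename Encoding>
class encoded_string {
public:
    using traits = encoding_traits<Encoding>;
    using code_unit = typename traits::code_unit;

    // Assumes the input is pure ASCII, so each byte maps to exactly
    // one code unit in either UTF-8 or UTF-16.
    explicit encoded_string(const char* ascii) {
        for (; *ascii != '\0'; ++ascii) {
            units_.push_back(static_cast<code_unit>(*ascii));
        }
    }

    std::size_t code_unit_count() const { return units_.size(); }

private:
    std::basic_string<code_unit> units_;
};
```

With this sketch, encoded_string<utf16>("Hello World") stores 11 UTF-16 code units.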
I generally agree with this design approach, but I don't think that code point iterators alone are sufficient. Iteration over encoded characters and abstract characters would be needed for some algorithms to function sensibly. For example, the simple task of find(begin, end, "ü") needs to use abstract characters in order to find both the precomposed form of ü (U+00FC) and the decomposed form (U+0075 followed by U+0308).
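To illustrate the find problem, here is a small sketch using plain std::u32string as a stand-in for a sequence of code points. The helper names (contains_code_points, toy_decompose, contains_equivalent) are made up for this example, and the decomposition table covers only U+00FC; a real implementation would need the full Unicode decomposition data and would also have to respect character boundaries when matching.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Two canonically equivalent spellings of the same text, as code points.
const std::u32string precomposed = U"f\u00FCr";   // f, U+00FC, r
const std::u32string decomposed  = U"fu\u0308r";  // f, u, U+0308, r

// Naive search: compares code point by code point.
bool contains_code_points(const std::u32string& text,
                          const std::u32string& needle) {
    return std::search(text.begin(), text.end(),
                       needle.begin(), needle.end()) != text.end();
}

// Toy decomposition (assumption: only U+00FC is handled).
std::u32string toy_decompose(const std::u32string& in) {
    std::u32string out;
    for (char32_t c : in) {
        if (c == U'\u00FC') {
            out += U'u';
            out += U'\u0308';
        } else {
            out += c;
        }
    }
    return out;
}

// Search after bringing both sides to the same (toy) decomposed form.
bool contains_equivalent(const std::u32string& text,
                         const std::u32string& needle) {
    return contains_code_points(toy_decompose(text), toy_decompose(needle));
}
```

Searching for U+00FC with contains_code_points succeeds on the precomposed string but misses the decomposed one, while contains_equivalent finds it in both.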
You could use the encoded_string class like this:
    // Constructor converts the ASCII string to UTF-16.
    encoded_string<utf16> some_string("Hello World");
    // Run some standard algorithm on the string:
    std::for_each(some_string.begin(), some_string.end(), do_some_operation);
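Again, taking this example, let's say that do_some_operation performs canonicalization to some Unicode canonical form; you can't do this by iterating over code points one at a time, because composing or decomposing a character can consume or produce several code points at once.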
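A rough sketch of why a per-code-point callback is not enough: the toy composition below has to look ahead one code point and emits fewer elements than it reads, which a for_each-style functor visiting one element at a time cannot express. The function name toy_compose is hypothetical and it handles only the pair (u, U+0308); it is not a real normalization algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy composition (assumption: only u + U+0308 -> U+00FC is handled).
std::u32string toy_compose(const std::u32string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool has_next = i + 1 < in.size();
        if (in[i] == U'u' && has_next && in[i + 1] == U'\u0308') {
            out += U'\u00FC';  // two input code points become one
            ++i;               // skip the combining mark just consumed
        } else {
            out += in[i];
        }
    }
    return out;
}
```

Here the four code points f, u, U+0308, r come out as the three code points f, U+00FC, r, so the operation has to work on the sequence as a whole rather than on isolated elements.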
I am aware that this implementation will be less than ideal for integration with the current C++ standard, but issues like that are what I would like to get deeper into during development.
You should explain what problems with integration you foresee.

meeroh