[boost] Re: Any interest in adding unicode support to boost?

19 Oct 2004

In article <cl3hl9$g4e$1@sea.gmane.org>, "Erik Wien" <wien@start.no> wrote:
...
> The basic idea I have been working around is to make an encoded_string
> class templated on Unicode encoding types (i.e. UTF-8, UTF-16). This is made
> possible through an encoding_traits class which contains all the necessary
> implementation details for working on strings of code units.
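As a rough illustration of the traits-based design quoted above, here is a minimal C++ sketch. The traits names `utf8` and `utf16`, their `decode` member, `code_point_iterator` and `count_code_points` are all hypothetical, chosen for this example rather than taken from Erik's proposal. The traits are passed directly as the template argument instead of through a separate `encoding_traits` class, and well-formed input is assumed (no error handling).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical traits: each names its code unit type and decodes one code
// point starting at s[i], advancing i past it. Assumes well-formed input.
struct utf8 {
    using code_unit = char;
    static char32_t decode(const std::basic_string<code_unit>& s, std::size_t& i) {
        unsigned char b = static_cast<unsigned char>(s[i++]);
        if (b < 0x80) return b;                        // single byte: ASCII
        int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : 1;
        char32_t cp = b & (0x3F >> extra);             // payload bits of lead byte
        while (extra-- > 0)                            // append 6 bits per trail byte
            cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
        return cp;
    }
};

struct utf16 {
    using code_unit = char16_t;
    static char32_t decode(const std::basic_string<code_unit>& s, std::size_t& i) {
        char32_t u = s[i++];
        if (u >= 0xD800 && u <= 0xDBFF) {              // high surrogate
            char32_t lo = s[i++];                      // assume a low surrogate follows
            return 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
        }
        return u;
    }
};

// Stores code units; exposes iteration over code points.
template <class Encoding>
class encoded_string {
public:
    using code_unit = typename Encoding::code_unit;
    explicit encoded_string(std::basic_string<code_unit> units) : units_(std::move(units)) {}

    // Forward iterator that decodes code points on the fly.
    class code_point_iterator {
    public:
        code_point_iterator(const std::basic_string<code_unit>* s, std::size_t pos)
            : s_(s), pos_(pos) {}
        char32_t operator*() const { std::size_t i = pos_; return Encoding::decode(*s_, i); }
        code_point_iterator& operator++() { Encoding::decode(*s_, pos_); return *this; }
        bool operator==(const code_point_iterator& o) const { return pos_ == o.pos_; }
        bool operator!=(const code_point_iterator& o) const { return pos_ != o.pos_; }
    private:
        const std::basic_string<code_unit>* s_;
        std::size_t pos_;
    };

    code_point_iterator begin() const { return {&units_, 0}; }
    code_point_iterator end() const { return {&units_, units_.size()}; }
    std::size_t code_units() const { return units_.size(); }

private:
    std::basic_string<code_unit> units_;
};

// Counts code points by walking the code point iterator.
template <class Encoding>
std::size_t count_code_points(const encoded_string<Encoding>& s) {
    std::size_t n = 0;
    for (char32_t cp : s) { (void)cp; ++n; }
    return n;
}
```

The iterator hides the variable-width encoding: "hü" is 3 UTF-8 code units but 2 code points, and U+1F600 is 2 UTF-16 code units but 1 code point.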
I generally agree with this design approach, but I don't think that code point 
iterators alone are sufficient. Iteration over encoded characters and abstract 
characters would be needed for some algorithms to function sensibly. For 
example, the simple task of:

find(begin, end, "ü")

needs to use abstract characters in order to be able to find precomposed and 
decomposed versions of ü.
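To make the ü example concrete: the precomposed form is the single code point U+00FC, while the decomposed form is U+0075 (u) followed by U+0308 (COMBINING DIAERESIS). The C++ sketch below, with `std::u32string` standing in for a sequence of code points, shows that a plain code point search misses one form when given the other, and that it can also report a false hit on the bare "u" inside a decomposed ü. The remedy sketched here (decompose both sides, then reject matches followed by a combining mark) is a deliberately simplified stand-in for real canonical equivalence and grapheme boundary rules: the hypothetical `toy_decompose` knows only ü, and `is_combining` recognizes only the Combining Diacritical Marks block (U+0300 to U+036F). All function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Code point level search: compares code points one by one.
bool contains_code_points(const std::u32string& haystack, const std::u32string& needle) {
    return std::search(haystack.begin(), haystack.end(),
                       needle.begin(), needle.end()) != haystack.end();
}

// Toy canonical decomposition: only knows U+00FC -> U+0075 U+0308.
// A real implementation would use the Unicode decomposition data.
std::u32string toy_decompose(const std::u32string& s) {
    std::u32string out;
    for (char32_t cp : s) {
        if (cp == U'\u00FC') { out += U'u'; out += U'\u0308'; }
        else out += cp;
    }
    return out;
}

// Toy check: only the Combining Diacritical Marks block counts as combining.
bool is_combining(char32_t cp) { return cp >= 0x0300 && cp <= 0x036F; }

// Abstract character level search (simplified): decompose both sides, and
// accept a match only if it is not followed by a combining mark that would
// extend the last matched character. Needles that start with a combining
// mark are not handled specially.
bool contains_abstract(const std::u32string& haystack, const std::u32string& needle) {
    std::u32string h = toy_decompose(haystack), n = toy_decompose(needle);
    for (auto it = h.begin();; ++it) {
        it = std::search(it, h.end(), n.begin(), n.end());
        if (it == h.end()) return false;
        auto after = it + static_cast<std::ptrdiff_t>(n.size());
        if (after == h.end() || !is_combining(*after)) return true;
    }
}
```

With "Müller" stored precomposed or decomposed, `contains_abstract` finds ü in either spelling, and it does not report the "u" that is only part of a decomposed ü.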
...
> You could use the encoded_string class like this:
> // Constructor converts the ASCII string to UTF-16.
> encoded_string<utf16> some_string("Hello World");
> // Run some standard algorithm on the string:
> std::for_each(some_string.begin(), some_string.end(), do_some_operation);
Again, taking this example, let's say that do_some_operation performs
canonicalization to some Unicode canonical form; you can't do this by iterating
over code points one at a time.
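One concrete reason a per-code-point operation cannot canonicalize: normalization includes a canonical ordering step, in which each run of combining marks is stably sorted by canonical combining class. For example, "a" + U+0307 COMBINING DOT ABOVE (class 230) + U+0323 COMBINING DOT BELOW (class 220) is canonically equivalent to "a" + U+0323 + U+0307, and canonical ordering puts both in the latter order. Where a code point ends up depends on its neighbours, so a stateless function applied to one code point at a time through `std::for_each` has no way to do it. The C++ sketch below implements only this ordering step (no decomposition or composition), with a hypothetical three-entry class table `toy_ccc` in place of the Unicode Character Database.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Toy canonical combining class table (0 = starter). Real values come from
// the Unicode Character Database; only three marks are listed here.
int toy_ccc(char32_t cp) {
    switch (cp) {
        case 0x0323: return 220;  // COMBINING DOT BELOW
        case 0x0307: return 230;  // COMBINING DOT ABOVE
        case 0x0308: return 230;  // COMBINING DIAERESIS
        default:     return 0;
    }
}

// Canonical ordering: within each maximal run of non-starters, stably sort
// the code points by combining class. Needs to see the whole run at once.
void canonical_order(std::u32string& s) {
    auto it = s.begin();
    while (it != s.end()) {
        if (toy_ccc(*it) == 0) { ++it; continue; }   // starters stay in place
        auto run_end = std::find_if(it, s.end(),
                                    [](char32_t c) { return toy_ccc(c) == 0; });
        std::stable_sort(it, run_end, [](char32_t a, char32_t b) {
            return toy_ccc(a) < toy_ccc(b);
        });
        it = run_end;
    }
}
```

Marks with equal classes, such as U+0308 and U+0307 (both 230), keep their relative order, which is why the sort must be stable.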
...
> I am aware that this implementation will be less than ideal for integration
> with the current C++ standard, but it's issues like that I would like to get
> deeper into during the development.
You should explain what problems with integration you foresee.

meeroh

Miro Jurisic