[boost] Re: Any interest in adding unicode support to boost?

19 Oct 2004

As I have said in a couple of other posts here, I have already started 
testing different approaches to this library, so I might as well post some 
examples of what I have so far and how it would be used. So far I have only 
looked closely at the string representation part, so don't expect 
too much. ;)

The basic idea I have been working around is to make an encoded_string 
class templated on Unicode encoding types (e.g. UTF-8, UTF-16). This is made 
possible through an encoding_traits class, which contains all the necessary 
implementation details for working on strings of code units.

The outline of the encoding traits class looks something like this:

template<typename encoding>
struct encoding_traits
    {
    // Type definitions for code units etc.
    // Is the encoding fixed width? (Allows a good deal of iterator optimizations.)
    // Algorithms for iterating forwards and backwards over code units.
    // Function for converting a series of code units to a Unicode code point.
    // Any other operations that are encoding specific.
    };
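One way the outline could be filled in is sketched below. This is not the implementation described in this post: the tag types utf8/utf16/utf32 and the members decode, retreat and is_fixed_width are hypothetical names chosen for illustration, and the decoders assume well-formed input. The count_code_points helper (also hypothetical) shows the kind of shortcut the fixed-width flag makes possible.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical encoding tags; a sketch of the traits outline, not a real API.
struct utf8 {};
struct utf16 {};
struct utf32 {};

template <typename Encoding> struct encoding_traits;

template <> struct encoding_traits<utf8> {
    typedef unsigned char code_unit;
    static constexpr bool is_fixed_width = false;  // 1 to 4 units per code point

    // Decode the code point starting at `it` and advance `it` past it.
    // Assumes well-formed UTF-8; a real library would validate.
    template <typename It> static char32_t decode(It& it) {
        code_unit lead = static_cast<code_unit>(*it++);
        int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        char32_t cp = lead & (extra == 0 ? 0x7F : (0x3F >> extra));
        for (int i = 0; i < extra; ++i)
            cp = (cp << 6) | (static_cast<code_unit>(*it++) & 0x3F);
        return cp;
    }
    // Step back to the first unit of the previous code point by skipping
    // continuation bytes (10xxxxxx).
    template <typename It> static void retreat(It& it) {
        do { --it; } while ((static_cast<code_unit>(*it) & 0xC0) == 0x80);
    }
};

template <> struct encoding_traits<utf16> {
    typedef char16_t code_unit;
    static constexpr bool is_fixed_width = false;  // surrogate pairs use 2 units

    template <typename It> static char32_t decode(It& it) {
        char32_t u = static_cast<code_unit>(*it++);
        if (u >= 0xD800 && u <= 0xDBFF) {  // high surrogate: combine with next unit
            char32_t lo = static_cast<code_unit>(*it++);
            return 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
        }
        return u;
    }
    template <typename It> static void retreat(It& it) {
        --it;
        char32_t u = static_cast<code_unit>(*it);
        if (u >= 0xDC00 && u <= 0xDFFF) --it;  // low surrogate: step over the pair
    }
};

template <> struct encoding_traits<utf32> {
    typedef char32_t code_unit;
    static constexpr bool is_fixed_width = true;  // one unit per code point
    template <typename It> static char32_t decode(It& it) { return *it++; }
    template <typename It> static void retreat(It& it) { --it; }
};

// Hypothetical helper: for fixed-width encodings the code point count is just
// the unit count; otherwise every code point has to be decoded.
template <typename Encoding, typename It>
std::size_t count_code_points(It first, It last) {
    if (encoding_traits<Encoding>::is_fixed_width)
        return static_cast<std::size_t>(last - first);
    std::size_t n = 0;
    while (first != last) {
        encoding_traits<Encoding>::decode(first);
        ++n;
    }
    return n;
}
```

Keeping all encoding-specific logic in static member templates means a string class can stay generic over the unit iterator type (pointer or container iterator).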

This traits class is used by the encoded_string class to support strings 
using any Unicode representation internally. This lets the programmer choose, 
string by string, whichever encoding is best suited. The external interface 
of the class would mainly be code point iterators. These iterators can iterate 
over any encoded_string, and the underlying encoding stays invisible. 
(Strictly speaking, such iterators do not meet the standard C++ iterator 
requirements, since dereferencing yields a decoded code point by value rather 
than a reference into the string, but they would work nicely with the Boost 
Iterator Library.)
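A rough sketch of how code point iterators could sit on top of the traits, assuming the hypothetical names code_point_iterator, decode and code_units (only utf16 is defined here so the block compiles on its own). operator* returns a char32_t by value, which is exactly why the sketch only claims std::input_iterator_tag under the classic iterator rules; a Boost.Iterator based version could describe its traversal more precisely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical tag and minimal traits, repeated so this sketch stands alone.
struct utf16 {};
template <typename Encoding> struct encoding_traits;
template <> struct encoding_traits<utf16> {
    typedef char16_t code_unit;
    // Decode one code point at `it` and advance past it (assumes valid UTF-16).
    template <typename It> static char32_t decode(It& it) {
        char32_t u = static_cast<code_unit>(*it++);
        if (u >= 0xD800 && u <= 0xDBFF) {
            char32_t lo = static_cast<code_unit>(*it++);
            return 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
        }
        return u;
    }
};

// Walks code units but presents code points; the encoding stays hidden.
template <typename Encoding, typename UnitIt>
class code_point_iterator {
public:
    typedef std::input_iterator_tag iterator_category;  // value-returning operator*
    typedef char32_t value_type;
    typedef std::ptrdiff_t difference_type;
    typedef void pointer;
    typedef char32_t reference;  // no stored char32_t exists to refer to

    code_point_iterator() : pos_() {}
    explicit code_point_iterator(UnitIt pos) : pos_(pos) {}

    char32_t operator*() const {
        UnitIt tmp = pos_;  // decode from a copy so dereferencing doesn't move us
        return encoding_traits<Encoding>::decode(tmp);
    }
    code_point_iterator& operator++() {
        encoding_traits<Encoding>::decode(pos_);  // skip every unit of this code point
        return *this;
    }
    code_point_iterator operator++(int) {
        code_point_iterator old(*this);
        ++*this;
        return old;
    }
    friend bool operator==(const code_point_iterator& a, const code_point_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const code_point_iterator& a, const code_point_iterator& b) {
        return !(a == b);
    }

private:
    UnitIt pos_;
};

// Stores code units in the chosen encoding; exposes code point iterators.
template <typename Encoding>
class encoded_string {
    typedef typename encoding_traits<Encoding>::code_unit unit;
    std::basic_string<unit> units_;

public:
    typedef code_point_iterator<Encoding,
                                typename std::basic_string<unit>::const_iterator>
        const_iterator;

    explicit encoded_string(const std::basic_string<unit>& units) : units_(units) {}

    const_iterator begin() const { return const_iterator(units_.begin()); }
    const_iterator end() const { return const_iterator(units_.end()); }
    std::size_t code_units() const { return units_.size(); }
};
```

For example, a UTF-16 string holding "a", U+1F600 and "b" has four code units but yields three code points through begin()/end().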

You could use the encoded_string class like this:

// Constructor converts the ASCII string to UTF-16.
encoded_string<utf16> some_string("Hello World");
// Run some standard algorithm on the string:
std::for_each(some_string.begin(), some_string.end(), do_some_operation);
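The comment in the example says the constructor converts the ASCII literal to UTF-16. A minimal sketch of that conversion step follows, assuming hypothetical helpers append_utf16 and ascii_to_utf16 (not part of the post). ASCII bytes 0x00 to 0x7F map directly to code points U+0000 to U+007F; code points above U+FFFF, which never come from ASCII but could come from other sources, become surrogate pairs.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: append the UTF-16 form of one code point to `out`.
void append_utf16(std::u16string& out, char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));  // BMP: a single code unit
    } else {
        cp -= 0x10000;  // 20 bits remain, split 10/10 across the pair
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
    }
}

// Hypothetical conversion an encoded_string<utf16> constructor might perform
// on a narrow literal: each ASCII byte is already its own code point.
std::u16string ascii_to_utf16(const char* ascii) {
    std::u16string out;
    for (; *ascii != '\0'; ++ascii) {
        unsigned char c = static_cast<unsigned char>(*ascii);
        if (c > 0x7F)
            throw std::invalid_argument("input is not ASCII");
        append_utf16(out, c);
    }
    return out;
}
```

Rejecting bytes above 0x7F, rather than guessing a legacy code page, keeps the constructor's "ASCII" contract explicit.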

I do currently have a very rough implementation that works as described 
above, and I would probably base parts of a potential library on it.

I am aware that this implementation will be less than ideal for integration 
with the current C++ standard, but issues like that are exactly what I would 
like to explore more deeply during development.

Any comments you might have on this approach are most welcome.

Regards
Erik

Erik Wien