[boost] [rfc] Unicode GSoC project

12 May 2009

Hi everyone. I'm in charge of the Unicode Google Summer of Code project.

For the past week I have been working on range adaptors that iterate 
over the code points of a UTF-x string and convert those code points 
back to UTF-y.

I stopped working on these for a bit to put together some short 
documentation (it is my first quickbook document, so it may not be 
very pretty).
It does not document the final work, only what I'm working on at the 
moment.
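
To make the UTF-x to UTF-y idea concrete, here is a minimal standalone 
sketch of decoding UTF-8 into code points and re-encoding them as UTF-16. 
It does not use the range adaptors from the project; the function names 
(next_code_point, append_utf16, utf8_to_utf16) are hypothetical and only 
serve the illustration. Like the concepts in the documentation, it 
assumes the input is already valid and does not validate it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode one code point from UTF-8 starting at s[i]
// and advance i past it. Assumes the input is well-formed UTF-8; the
// asserts only guard against running off the end of the string.
char32_t next_code_point(const std::string& s, std::size_t& i) {
    assert(i < s.size());
    unsigned char lead = static_cast<unsigned char>(s[i++]);
    if (lead < 0x80) return lead;                          // 1-byte (ASCII)
    int extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;   // continuation bytes
    char32_t cp = lead & (0x3F >> extra);                  // payload bits of lead
    while (extra-- > 0) {
        assert(i < s.size());
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    }
    return cp;
}

// Hypothetical helper: append one code point to a UTF-16 string, using a
// surrogate pair for code points above U+FFFF.
void append_utf16(std::u16string& out, char32_t cp) {
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
}

// UTF-8 -> code points -> UTF-16, one code point at a time.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::u16string out;
    for (std::size_t i = 0; i < utf8.size();)
        append_utf16(out, next_code_point(utf8, i));
    return out;
}
```

The range adaptors aim to do the same thing lazily, so that the two 
steps can be composed without building an intermediate string.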

I would like everyone's opinion on the concepts I am defining, which 
assume that the range being worked on is a valid Unicode range in a 
particular encoding, and on the system used to enforce those concepts.

Also, I made Normalization Form C part of the invariant, but maybe 
that should be something orthogonal. Personally, I don't think it's 
really useful for general-purpose text, though.

While the system doesn't provide conversion from other character sets, 
this can easily be added by using assume_utf32. For example, using an 
ISO-8859-1 string as input to assume_utf32 just works, since ISO-8859-1 
is included verbatim in Unicode as its first 256 code points.
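
As a rough illustration of why this works, here is a standalone sketch 
that widens each ISO-8859-1 byte to the code point with the same value. 
It does not call assume_utf32, whose interface is described in the 
documentation rather than here; latin1_to_utf32 is a hypothetical name 
used only for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: every ISO-8859-1 byte value 0x00..0xFF is also the
// Unicode code point of the same character, so conversion is a plain
// widening of each byte.
std::u32string latin1_to_utf32(const std::string& latin1) {
    std::u32string out;
    out.reserve(latin1.size());
    for (char c : latin1) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        out.push_back(static_cast<char32_t>(static_cast<unsigned char>(c)));
    }
    assert(out.size() == latin1.size());  // one code point per input byte
    return out;
}
```

For instance, the ISO-8859-1 bytes 63 61 66 E9 ("café") come out as the 
code points U+0063 U+0061 U+0066 U+00E9.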

The documentation also contains some introductory Unicode material.

You can find the documentation online here:
http://mathias.gaunard.emi.u-bordeaux1.fr/unicode/doc/html/

Mathias Gaunard