
Hi everyone.

I'm in charge of the Unicode Google Summer of Code project. For the past week I have been working on range adaptors that iterate over the code points of a UTF-x string and convert those code points back to UTF-y. I paused that work for a bit to put together some short documentation (it is my first quickbook document, so it may not be very pretty).

This does not document the final work, only what I'm working on at the moment. I would like to know everyone's opinion of the concepts I am defining, which assume that the range being worked on is a valid Unicode range in a particular encoding, as well as of the system used to enforce those concepts.

Also, I made normalization form C part of the invariant, but maybe that should be something orthogonal. I personally don't think it's really useful for general-purpose text though.

While the system doesn't provide conversion from other character sets, this can easily be added by using assume_utf32. For example, using an ISO-8859-1 string as input to assume_utf32 just works, since ISO-8859-1 maps verbatim onto the first 256 Unicode code points.

The documentation also contains some introductory Unicode material. You can find it online here: http://mathias.gaunard.emi.u-bordeaux1.fr/unicode/doc/html/
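
To make the ISO-8859-1 remark concrete, here is a minimal standalone sketch of the underlying idea. It does not use the library's range adaptors or assume_utf32; the two helper functions, latin1_to_code_points and code_points_to_utf8, are hypothetical names made up for this illustration. It only shows that each ISO-8859-1 byte can be read directly as the code point with the same value, and that those code points can then be encoded to another UTF form (UTF-8 here).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of the library): read each ISO-8859-1 byte
// as the code point with the same numeric value, U+0000..U+00FF.
std::vector<char32_t> latin1_to_code_points(const std::string& latin1)
{
    std::vector<char32_t> out;
    out.reserve(latin1.size());
    for (unsigned char byte : latin1)  // unsigned char avoids sign extension
        out.push_back(static_cast<char32_t>(byte));
    return out;
}

// Hypothetical helper: encode a sequence of valid code points as UTF-8.
std::string code_points_to_utf8(const std::vector<char32_t>& code_points)
{
    std::string out;
    for (char32_t cp : code_points) {
        // Precondition: cp is a Unicode scalar value (no surrogates).
        assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
        if (cp < 0x80) {                       // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Passing the ISO-8859-1 bytes of "café" through latin1_to_code_points and then code_points_to_utf8 turns the single byte 0xE9 into the two-byte UTF-8 sequence 0xC3 0xA9. The real adaptors would do this lazily over ranges rather than building intermediate containers.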