
It's been a while (about a year, actually) since I last had feedback on my Unicode library, so here I am requesting comments.

The Unicode library provides facilities to convert between UTF and locale encodings in a way that is as nice and generic to use as possible, as well as a few Unicode character properties that can be used for normalization or segmentation into graphemes.

The largest part of the library is actually a fairly intricate generic Converter and Segmenter system that, among other things, lets you define a variable-width N to M conversion step in an easy and stateless way. The conversion can then be applied eagerly to the whole input, or step by step through an iterator or range adaptor, which amounts to a lazy conversion. Converters can be combined, and can be used to make codecvt facets, which allows them to be applied transparently by standard file streams. Converters can also be built from codecvt facets, which is how the Unicode library provides conversion between locale encodings.

I think the whole system really deserves to be a library in its own right and not just a part of Unicode, but I'm unsure how to deal with this in Boost. I think it's quite cool, but I haven't really seen much interest in it. I may write a short tutorial on how to write base64 codecs with it and how to use them with iostreams, just to show it off a bit more outside of a Unicode context.

Anyway, the docs are here:
<http://mathias.gaunard.com/unicode/doc/html/>

And the code is in the sandbox:
<https://svn.boost.org/svn/boost/sandbox/SOC/2009/unicode/>

As I have said before, I will be submitting the full thing for formal review *soon*, i.e. mid-September. The changes that will go in are mostly performance-related: I'm experimenting with things right now and running benchmarks, considering unsafe codecs and SIMD ones. (SIMD is not just an implementation detail here: because evaluation is step by step, using SIMD means having a much larger step, and of course it cannot be safe.)
I also need to tackle compile times, which are quite long: I need better header separation. I also need to find a better solution, from a binary point of view, for exposing composition from the shared library, as the current one doesn't leave much flexibility in the implementation.
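
For readers who have not looked at the docs, here is a rough sketch of what a stateless, variable-width N to M conversion step and an eager driver for it could look like. It is not the library's actual interface: the names utf8_encode_step, convert_all and encode_utf8 are made up for illustration, and the step shown (one UTF-32 code point in, one to four UTF-8 code units out) is just one convenient example.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stateless step (not the library's API): consumes exactly one
// UTF-32 code point from [first, last) and writes 1 to 4 UTF-8 code units.
// Returns the advanced input and output iterators.
struct utf8_encode_step {
    template <class In, class Out>
    std::pair<In, Out> operator()(In first, In last, Out out) const {
        assert(first != last);  // precondition: at least one code point
        std::uint32_t cp = *first++;
        // Reject values outside Unicode and the surrogate range.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");
        if (cp < 0x80) {
            *out++ = static_cast<char>(cp);
        } else if (cp < 0x800) {
            *out++ = static_cast<char>(0xC0 | (cp >> 6));
            *out++ = static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *out++ = static_cast<char>(0xE0 | (cp >> 12));
            *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *out++ = static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            *out++ = static_cast<char>(0xF0 | (cp >> 18));
            *out++ = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *out++ = static_cast<char>(0x80 | (cp & 0x3F));
        }
        return std::make_pair(first, out);
    }
};

// Hypothetical eager driver: applies the step until the input is exhausted.
template <class Step, class In, class Out>
Out convert_all(Step step, In first, In last, Out out) {
    while (first != last) {
        std::pair<In, Out> r = step(first, last, out);
        first = r.first;
        out = r.second;
    }
    return out;
}

// Convenience wrapper used to exercise the sketch.
std::string encode_utf8(const std::u32string& in) {
    std::string out;
    convert_all(utf8_encode_step(), in.begin(), in.end(),
                std::back_inserter(out));
    return out;
}
```

Because the step keeps no state between calls, the same functor could in principle be driven one step at a time by an iterator or range adaptor instead of convert_all, which is the lazy mode described above.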