
Hello Graham,

There was a student project aiming to produce a Unicode library, but I haven't heard anything of it since this thread:
http://lists.boost.org/boost/2005/03/22580.php
There are loads of comments and ideas in that thread. Everyone wants a Unicode library, but no one seems to have enough time to write it well. I have again been playing with the idea of writing such a library over the past few weeks. You seem to be quite well versed in Unicode.

My (hopefully constructive) comments on your post:

First, are WORD and DWORD the Windows equivalents of uint16_t and uint32_t, respectively?

I think the C++ way would be to ultimately leave the choice of encoding to the user through a template parameter. This would, I guess, do away with the assign* and insert* methods for the various encodings.

I think the normalisation form should be an invariant of the string as well (and a template parameter). This makes it possible to implement operator== and operator< as binary comparisons of code points, so that they will be relatively fast (more so for UTF-8 and UTF-32 than for UTF-16, where plain code-unit order differs from code point order for characters outside the BMP). People will surely want to use the string as a key in a std::map, for example. Other, more expensive collation methods (including localised ones) could be implemented by different classes.

As far as the iterators are concerned, I believe the standard Unicode string should contain grapheme clusters, and thus its iterator should have this beast as its value_type. I would call it "character": as far as the Unicode standard and combining characters are concerned, C++ programmers in general are "users", and grapheme clusters are what they think of as characters.

Hope this helps.

Rogier
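
P.S. To make the template-parameter idea a bit more concrete, here is a rough sketch of what I have in mind. All the names in it (unicode_string, utf8_tag, utf16_tag, utf32_tag, nfc_tag, nfd_tag, sort_key) are made up for illustration and come from no existing library. The constructor simply assumes that its input is well-formed and already in the stated normalisation form; a real library would have to check or normalise. The UTF-16 fix-up moves surrogates above U+E000..U+FFFF so that code-unit comparison yields code point order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical encoding tags. sort_key maps a code unit to a value whose
// ordering matches code point order for well-formed text.
struct utf8_tag {
    using code_unit = char;
    static std::uint32_t sort_key(code_unit c) {
        return static_cast<unsigned char>(c);  // compare bytes as unsigned
    }
};
struct utf16_tag {
    using code_unit = char16_t;
    static std::uint32_t sort_key(code_unit c) {
        std::uint32_t u = c;
        if (u >= 0xE000) return u - 0x800;   // U+E000..U+FFFF sort below surrogates
        if (u >= 0xD800) return u + 0x2000;  // surrogates sort above the rest of the BMP
        return u;
    }
};
struct utf32_tag {
    using code_unit = char32_t;
    static std::uint32_t sort_key(code_unit c) { return c; }
};

// Hypothetical normalisation-form tags; they only distinguish types here.
struct nfc_tag {};
struct nfd_tag {};

template <class Encoding, class NormForm>
class unicode_string {
public:
    using code_unit = typename Encoding::code_unit;
    using storage = std::basic_string<code_unit>;

    // Assumption: units is well-formed and already normalised to NormForm.
    explicit unicode_string(storage units) : units_(std::move(units)) {}

    const storage& code_units() const { return units_; }

    // Equal strings in the same normalisation form have identical code units.
    friend bool operator==(const unicode_string& a, const unicode_string& b) {
        return a.units_ == b.units_;
    }

    // Binary comparison in code point order, without decoding.
    friend bool operator<(const unicode_string& a, const unicode_string& b) {
        return std::lexicographical_compare(
            a.units_.begin(), a.units_.end(),
            b.units_.begin(), b.units_.end(),
            [](code_unit x, code_unit y) {
                return Encoding::sort_key(x) < Encoding::sort_key(y);
            });
    }

private:
    storage units_;
};
```

Because the encoding and the normalisation form are part of the type, a unicode_string<utf8_tag, nfc_tag> and a unicode_string<utf16_tag, nfd_tag> cannot be compared by accident, and either one can serve as a std::map key through the operator< above.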