
Hello,

I just scanned about 300 boost-devel messages containing the word "Unicode" and am very excited about the occasional mentions I see of a Boost Unicode library. Is that project still alive? Is there a prototype or beta of any sort, or even a simple statement of goals I can look at for the proposed Boost project?

I am about to embark on a large text processing (but _not_ display) project and could make use of such a library. (Digression: part of it will even involve the processing of Thai text, which seems to be the #1 cited example of a weird language as far as i18n is concerned. Having myself typeset a 283-page bilingual Thai-English book, I have to agree :)

The last mentions I found were from late 2005, where Graham Barnett mentioned a Unicode library was under development:

http://thread.gmane.org/gmane.comp.lib.boost.devel/128403
http://thread.gmane.org/gmane.comp.lib.boost.devel/129807

I tried searching the vault for 'unicode' but no dice.

I have examined (and would use by default) ICU from IBM:

http://icu.sourceforge.net/userguide/intro.html

I would use its C++ UnicodeString, CharacterIterator, Locale-based codepage converters, normalization support, collation support, and regex matching (in particular with regexes that match character classes like "nonspacing mark"). How do the proposed Boost library's capabilities differ from those offered by ICU?

I've seen that there is ICU integration in Boost.Regex:

http://www.boost.org/libs/regex/doc/unicode.html

And of course it is possible today to store UTF-16 data in a std::wstring and convert between UTF-8, UTF-16, and UTF-32 using various easily available routines. But as you can see above, I need more capability than just that.

ICU is probably sufficient, but I thought it might be nice to use something that fits in better with the rest of Boost and the STL: something that used/extended existing string mechanisms, iteration mechanisms, and conversion mechanisms (e.g. those "code conversion facets" which I do not yet understand :). Consistent naming, error reporting, and coding conventions would be a superficial but nice added bonus.

I would hope that any such library would make some stabs at performance enhancements such as ICU's UnicodeString's ability to alias other strings to avoid copies, or store very small strings inline. Since ICU has disabled some of those enhancements:

http://icu.sourceforge.net/userguide/strings.html#unistr_performance

perhaps that would provide the Boost library an opportunity to beat ICU's performance!

Thanks for any updates,

- Chris Pirazzi