
Hello Eric,
> That's a very interesting wrapper of ICU. It might even kill off my Unicode library.
> I think Boost probably needs a Unicode library that is *not* a wrapper of ICU.
One of the points I considered when developing Boost.Locale was independence from ICU. You will not find any interface that exposes ICU, even though the library is quite closely tied to it. In fact, for CppCMS I have a small version of it for embedded systems that works over std::locale only (it lacks 80% of the original's abilities and is quite useless for Boost). So it is possible to replace ICU, or parts of it, with another (maybe better) Unicode engine someday. For example, I do not use ICU Resource Bundles for message translation; I use GNU Gettext MO catalogs (the file format, not the runtime library).

**However**, developing and debugging a new library that comes close to the abilities of ICU (v4.2 has about 680,000 lines of source code) would require many hundreds of man-years and is absolutely unfeasible, unless some big company donates its time to build it (as IBM does with ICU).

Please read this: http://cppcms.sourceforge.net/boost_locale/html/tutorial.html#design-rationa...

So the question isn't whether Boost needs its own Unicode library or a good ICU wrapper; the question is whether Boost needs a Unicode library at all.

For the record: I remember somebody posted an initial version of a Unicode library that included **only** character properties, and it didn't go much further than that. But you also need CLDR and its processing, dictionaries for proper break iteration, implementations of numerous Unicode algorithms, and so on. This is a huge amount of work. I once compared ICU with glib and Qt (big fat libraries that have lots of goodies for Unicode); neither of them was as correct as ICU.
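To make the "no interface exposes ICU" point concrete, here is a minimal sketch of the general technique: the backend lives behind a std::locale facet, so user code only ever touches std::locale. This is not the Boost.Locale API. The names message_catalog, translate and make_locale are hypothetical, and the in-memory table stands in for whatever a real backend (an MO file parser, ICU, or something else) would provide.

```cpp
#include <cstddef>
#include <locale>
#include <map>
#include <string>
#include <utility>

// Hypothetical facet (not the Boost.Locale API): a message catalog that is
// reached only through std::locale, so callers never see the backend type.
class message_catalog : public std::locale::facet {
public:
    static std::locale::id id;  // every facet type needs its own id

    explicit message_catalog(std::map<std::string, std::string> table,
                             std::size_t refs = 0)
        : std::locale::facet(refs), table_(std::move(table)) {}

    // Returns the translation, or the original text when none is stored.
    std::string translate(const std::string& msgid) const {
        auto it = table_.find(msgid);
        return it == table_.end() ? msgid : it->second;
    }

private:
    std::map<std::string, std::string> table_;  // stand-in for a real backend
};

std::locale::id message_catalog::id;

// User-facing call: depends on std::locale only. If the facet is not
// installed in the locale, the text is returned untranslated.
std::string translate(const std::locale& loc, const std::string& msgid) {
    if (!std::has_facet<message_catalog>(loc)) return msgid;
    return std::use_facet<message_catalog>(loc).translate(msgid);
}

// Builds a locale carrying the catalog. Swapping the backend would change
// only this function and the facet, not the callers of translate().
std::locale make_locale(std::map<std::string, std::string> table) {
    return std::locale(std::locale::classic(),
                       new message_catalog(std::move(table)));
}
```

Because callers hold nothing but a std::locale, replacing the engine behind the facet does not change any user-visible signature, which is the property described above.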
> What is its license?
You mean Boost.Locale? The Boost license. If you are asking about the ICU license, it is a permissive license very similar in its terms to the Boost, MIT, and other such licenses: http://source.icu-project.org/repos/icu/icu/trunk/license.html
> How generic is it? (E.g. can I run Unicode algorithms over non-contiguous data?)
Only the boundary iterator provides an API that lets you work over non-contiguous data, but internally the data is still converted to a contiguous buffer. ICU does not do much better: it has very few classes (BreakIterator and a few others) that can work over non-contiguous data (via UText), and that is quite painful in any case (UText is not a very nice API). So, in the design I aimed for a correct and useful approach rather than a totally generic and extremely efficient one.

Best,
Artyom