
It seems that Unicode support in Boost (which could lead to Unicode support in the C++ language and standard library) would be quite desirable.

The IBM International Components for Unicode (ICU) library (http://oss.software.ibm.com/icu/) is an existing C++ library with what appears to be a Boost-compatible license. It provides all or most of the Unicode support that would be desired in Boost or the C++ standard library, in addition to Unicode equivalents of libraries already in either the standard library or Boost, including number/currency formatting, date formatting, message formatting, and a regular expression library.

Unfortunately, it signals exceptional conditions through an error code return mechanism rather than C++ exceptions, it does not follow Boost naming conventions, and although there are some C++-specific facilities, most of the C++ API is the same as the C API, which results in a less-than-optimal C++ interface. Nonetheless, I think Boostifying the ICU library would be quite feasible, whereas attempting to reimplement all of the desired functionality that the ICU library provides would be extremely time consuming, since the collating and other services in the ICU library already support a large number of locales, and the character-set conversion facilities support a large number of character sets.

The representation of locales does present an issue that needs to be considered. The existing C++ standard locale facets are not very suitable for a variety of reasons:

- The standard facets (and the locale class itself, in that it is a functor for comparing basic_strings) are tied to facilities such as std::basic_string and std::ios_base, which are not suitable for Unicode support.

- The interface of std::collate<Ch> is not at all suitable for providing all of the functionality desired for Unicode string collation. A suitable Unicode collation facility should at least allow for user selection of the strength level used (refer to http://www.unicode.org/unicode/reports/tr10/), and would ideally also support customizations as extensive as the ICU library does (refer to http://oss.software.ibm.com/icu/userguide/Collate_ServiceArchitecture.html and http://oss.software.ibm.com/icu/userguide/Collate_Customization.html).

- Facilities such as Unicode string collation are heavily data-driven, and it would be inefficient to load the data for facilities that are not used. This could be avoided by using some sort of lazy loading mechanism.

It would still be possible to use the standard locale object as a container of an entirely new set of facets, which could be loaded from the data sources based on the name of the locale, and ``injected'' into an existing locale object by calling some function. It is not clear, however, what advantage this would serve over simply using a thin wrapper over a locale name to represent a ``locale,'' as is done in the ICU library.

-- Jeremy Maitin-Shepard
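To make the "Boostifying" idea concrete, here is a minimal sketch of how an error-code-style call might be wrapped so that failures surface as C++ exceptions. It does not use ICU itself: legacy_status, legacy_to_upper, conversion_error and to_upper are all hypothetical names invented for this illustration, and the ASCII-only upper-casing merely stands in for a real Unicode operation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an error-code-style C API: the caller passes a
// status variable, and failure is reported through it instead of by throwing.
enum legacy_status { LEGACY_OK = 0, LEGACY_BUFFER_OVERFLOW = 1 };

// Copies src into dest, upper-casing ASCII letters only.
// Returns the number of characters required, even when dest is too small.
std::size_t legacy_to_upper(const char* src, std::size_t src_len,
                            char* dest, std::size_t dest_cap,
                            legacy_status* status) {
    if (src_len > dest_cap) {
        *status = LEGACY_BUFFER_OVERFLOW;
        return src_len;
    }
    for (std::size_t i = 0; i < src_len; ++i) {
        const char c = src[i];
        dest[i] = (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
    }
    *status = LEGACY_OK;
    return src_len;
}

// Hypothetical exception type for the wrapper layer; it keeps the original
// status code so that no information is lost in translation.
class conversion_error : public std::runtime_error {
public:
    explicit conversion_error(legacy_status s)
        : std::runtime_error("legacy call failed"), status_(s) {}
    legacy_status status() const { return status_; }
private:
    legacy_status status_;
};

// Wrapper: manages the buffer itself and turns a failure status into an
// exception, so callers never see the status variable.
std::string to_upper(const std::string& in) {
    std::string out(in.size(), '\0');
    legacy_status status = LEGACY_OK;
    const std::size_t written =
        legacy_to_upper(in.data(), in.size(), &out[0], out.size(), &status);
    if (status != LEGACY_OK) {
        throw conversion_error(status);
    }
    assert(written == in.size());  // postcondition of the stand-in function
    return out;
}
```

With a wrapper of this shape, a caller would write to_upper("boost") and handle conversion_error only where recovery is possible, rather than checking a status after every call.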
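The two locale representations discussed above can be compared side by side in a short sketch. The standard library does allow a user-defined facet to be added to a copy of an existing std::locale; everything else here (unicode_collation_facet, with_unicode_collation, locale_id) is a hypothetical name chosen for illustration, and the facet stores only a locale name where a real one would hold collation data.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical new-style facet. Any class derived from std::locale::facet
// that has a static std::locale::id member can be stored in a std::locale.
class unicode_collation_facet : public std::locale::facet {
public:
    static std::locale::id id;

    explicit unicode_collation_facet(const std::string& locale_name,
                                     std::size_t refs = 0)
        : std::locale::facet(refs), name_(locale_name) {}

    const std::string& locale_name() const { return name_; }

private:
    std::string name_;  // a real facet would hold (lazily loaded) data here
};

std::locale::id unicode_collation_facet::id;

// Option 1: "inject" the facet into a copy of an existing locale object.
// The returned locale takes ownership of the facet (refs == 0).
std::locale with_unicode_collation(const std::locale& base,
                                   const std::string& locale_name) {
    std::locale result(base, new unicode_collation_facet(locale_name));
    assert(std::has_facet<unicode_collation_facet>(result));
    return result;
}

// Option 2: a thin wrapper over a locale name. Services would look up their
// own data from the name instead of being stored inside the locale object.
class locale_id {
public:
    explicit locale_id(const std::string& name) : name_(name) {}
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

inline bool operator==(const locale_id& a, const locale_id& b) {
    return a.name() == b.name();
}
```

In the first option a service would be retrieved with std::use_facet<unicode_collation_facet>(loc); in the second, a service would be constructed from a locale_id directly. The sketch only shows that both are expressible and does not settle which is preferable.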