
Eric Niebler wrote:
Mathias Gaunard wrote:
Also needed are tables that store the various character properties, and (hopefully) some parsers that build the tables directly from the Unicode character database so we can easily rev it whenever the database changes.
For the record, I have scripts that can generate ISO-8859-* to/from Unicode tables from the downloaded data; I'll happily contribute them if they are useful to anyone.
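To make the "build the tables from the database" idea concrete, here is a minimal sketch of reading one record of UnicodeData.txt, the semicolon-separated main file of the Unicode character database. The struct CharProperties and the function parse_unicode_data_line are hypothetical names made up for this illustration; they are not part of any proposed library, and a real generator would also need to handle range records and the other database files.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type: a few of the properties carried by UnicodeData.txt.
struct CharProperties {
    std::uint32_t code_point;      // field 0, hexadecimal
    std::string name;              // field 1
    std::string general_category;  // field 2, e.g. "Lu"
    int combining_class;           // field 3, canonical combining class
};

// Hypothetical helper: parse one line of UnicodeData.txt into `out`.
// Returns false (leaving `out` untouched) if the line is malformed.
inline bool parse_unicode_data_line(const std::string& line, CharProperties& out) {
    // Split the line on ';'. Empty fields in the middle are kept as empty strings.
    std::vector<std::string> fields;
    std::string field;
    std::istringstream in(line);
    while (std::getline(in, field, ';')) {
        fields.push_back(field);
    }
    if (fields.size() < 4) {
        return false;
    }

    CharProperties parsed;
    try {
        parsed.code_point = static_cast<std::uint32_t>(std::stoul(fields[0], nullptr, 16));
        parsed.combining_class = std::stoi(fields[3]);
    } catch (const std::exception&) {
        return false;  // non-numeric code point or combining class
    }
    parsed.name = fields[1];
    parsed.general_category = fields[2];
    out = parsed;
    return true;
}
```

A table generator along these lines would call the function once per line and emit whatever static arrays the library needs, so a new database release only requires re-running the script.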
The library provides the following core types in the boost namespace:
uchar8_t uchar16_t uchar32_t
In C++0x, these are called char, char16_t and char32_t.
I liked the idea of making them obviously unsigned; I had some nasty bugs in my UTF-8 code where I made invalid assumptions about signedness. But of course being consistent with C++0x is more important.
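For anyone who hasn't been bitten by this, here is a small sketch of the kind of sign bug I mean. The function names are made up for the example. Whether plain char is signed is implementation-defined, so the buggy version misbehaves only on platforms where it is signed (which includes most desktop compilers).

```cpp
#include <cstddef>

// Buggy on platforms where plain char is signed: bytes >= 0x80 are then
// negative, so the comparison is true for every possible value.
inline bool is_ascii_buggy(char c) {
    return c < 0x80;
}

// Fixed: look at the byte as unsigned before comparing.
inline bool is_ascii(char c) {
    return static_cast<unsigned char>(c) < 0x80;
}

// Hypothetical helper: length in bytes of the UTF-8 sequence introduced by
// `lead`, or 0 if `lead` cannot start a well-formed sequence. Taking an
// unsigned char in the signature forces callers to think about the conversion.
inline std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;                   // 0xxxxxxx, ASCII
    if (lead >= 0xC2 && lead <= 0xDF) return 2;  // 110xxxxx (0xC0, 0xC1 are overlong)
    if (lead >= 0xE0 && lead <= 0xEF) return 3;  // 1110xxxx
    if (lead >= 0xF0 && lead <= 0xF4) return 4;  // 11110xxx, up to U+10FFFF
    return 0;                                    // continuation byte or invalid lead
}
```

With distinct, obviously unsigned code unit types the first function would not have compiled into a silent always-true test, which is the attraction of the uchar8_t style of naming.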
I strongly disagree with requiring normalization form C for the concept UnicodeRange. There are many more valid Unicode sequences.
Agreed.
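As a small illustration of why normalization form C is too strong a requirement: the two sequences below both spell "é" and both are perfectly valid Unicode, but only the first is in form C. The names in this sketch are hypothetical and the validity check covers only the scalar-value rule (no surrogates, nothing above U+10FFFF); it is not meant as a definition of the UnicodeRange concept.

```cpp
#include <string>

// True if `c` is a Unicode scalar value: in range and not a surrogate.
inline bool is_unicode_scalar_value(char32_t c) {
    return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

// True if every element of `s` is a Unicode scalar value.
inline bool is_valid_code_point_sequence(const std::u32string& s) {
    for (char32_t c : s) {
        if (!is_unicode_scalar_value(c)) return false;
    }
    return true;
}

// U+00E9 LATIN SMALL LETTER E WITH ACUTE: precomposed, already in form C.
const std::u32string composed = U"\u00E9";

// U+0065 LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT:
// canonically equivalent to the above, valid, but not in form C.
const std::u32string decomposed = U"e\u0301";
```

Both strings pass is_valid_code_point_sequence, yet they compare unequal code point by code point. A range concept that insisted on form C would reject the second one even though nothing is wrong with it.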
the concrete algorithms must come first.
Agreed. Mathias, I would love to see a sort of "end user perspective" view of how this library will be used, i.e. its scope and basic usage pattern.

Phil.