Re: [boost] [rfc] Unicode GSoC project

14 May 2009

Eric Niebler wrote:
...
Mathias Gaunard wrote:
...
Also needed are tables that store the 
various character properties, and (hopefully) some parsers that build 
the tables directly from the Unicode character database so we can easily 
rev it whenever the database changes.
For the record, I have scripts that can generate ISO-8859-* to/from 
Unicode tables from the downloaded data; I'll happily contribute them 
if they are useful to anyone.
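
To make the table-generation idea concrete, here is a minimal C++ sketch. It is not the scripts mentioned above. It assumes input in the layout of the unicode.org 8859-x mapping files (a byte value and a code point, both in hex, followed by a '#' comment), and every name in it (to_unicode_table, build_to_unicode_table, build_from_unicode_table, example_usage) is made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical type: one code point per byte value of the 8-bit charset.
typedef std::array<std::uint32_t, 256> to_unicode_table;

// Hypothetical helper, not part of any Boost library. Reads lines such as
// "0xA4<TAB>0x20AC<TAB># EURO SIGN" (assumed layout) and fills the table.
// Bytes the input does not map stay at 0xFFFD (REPLACEMENT CHARACTER).
// std::stoul throws on malformed numbers; a real tool would report the line.
to_unicode_table build_to_unicode_table(std::istream& in)
{
    to_unicode_table table;
    table.fill(0xFFFD);

    std::string line;
    while (std::getline(in, line)) {
        // Drop the trailing comment, then skip lines without two fields.
        line = line.substr(0, line.find('#'));
        std::istringstream fields(line);
        std::string byte_text, code_text;
        if (!(fields >> byte_text >> code_text))
            continue;

        // Base 16 accepts the leading "0x" prefix.
        const unsigned long byte = std::stoul(byte_text, nullptr, 16);
        const unsigned long code = std::stoul(code_text, nullptr, 16);
        if (byte < table.size())
            table[byte] = static_cast<std::uint32_t>(code);
    }
    return table;
}

// Inverts the table for the "from Unicode" direction, skipping unmapped bytes.
std::map<std::uint32_t, std::uint8_t>
build_from_unicode_table(const to_unicode_table& to_unicode)
{
    std::map<std::uint32_t, std::uint8_t> from_unicode;
    for (std::size_t byte = 0; byte < to_unicode.size(); ++byte)
        if (to_unicode[byte] != 0xFFFD)
            from_unicode[to_unicode[byte]] = static_cast<std::uint8_t>(byte);
    return from_unicode;
}

// Usage on two sample lines written in the assumed layout.
void example_usage()
{
    std::istringstream sample(
        "# sample lines in the assumed layout\n"
        "0x41\t0x0041\t# LATIN CAPITAL LETTER A\n"
        "0xA4\t0x20AC\t# EURO SIGN\n");
    const to_unicode_table to_unicode = build_to_unicode_table(sample);
    assert(to_unicode[0xA4] == 0x20AC);
    assert(build_from_unicode_table(to_unicode).at(0x20AC) == 0xA4);
}
```

A generator along these lines would normally print the resulting arrays as C++ source rather than build them at run time, so that rerunning it against a new data file is all it takes to update the tables.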
...
The library provides the following core types in the boost namespace:
uchar8_t
uchar16_t
uchar32_t
In C++0x, these are called char, char16_t and char32_t.
I liked the idea of making them obviously unsigned; I had some nasty 
bugs in my UTF-8 code where I made invalid assumptions about signedness.  
But of course being consistent with C++0x is more important.
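
For anyone who has not been bitten by this, here is a minimal sketch of the kind of sign problem meant, assuming a platform where plain char is signed. The function name utf8_sequence_length is made up for illustration and is not taken from the proposed library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: length in bytes of the UTF-8 sequence introduced by
// lead byte `c`, or 0 if `c` cannot start a sequence.
// The cast is the point: where plain char is signed, '\xC3' is negative, so
// a direct test such as `c >= 0xC2` would always be false.
std::size_t utf8_sequence_length(char c)
{
    const unsigned char u = static_cast<unsigned char>(c);
    if (u < 0x80) return 1;   // 0xxxxxxx: ASCII
    if (u < 0xC2) return 0;   // continuation byte, or overlong lead C0/C1
    if (u < 0xE0) return 2;   // 110xxxxx
    if (u < 0xF0) return 3;   // 1110xxxx
    if (u < 0xF5) return 4;   // 11110xxx, up to U+10FFFF
    return 0;                 // F5..FF never appear in UTF-8
}

// Usage: the lead byte of "é" in UTF-8 is 0xC3, which starts a 2-byte sequence.
void example_usage()
{
    assert(utf8_sequence_length('A') == 1);
    assert(utf8_sequence_length('\xC3') == 2);
}
```

With an unsigned code unit type the cast disappears, which is the attraction of the uchar8_t spelling; with plain char it has to be remembered at every comparison.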
...
I strongly disagree with requiring normalization form C for the concept 
UnicodeRange. There are many more valid Unicode sequences.
Agreed.
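
A small illustration of the point, as a sketch only: "é" can be written as the single code point U+00E9 (which is in normalization form C) or as U+0065 followed by U+0301 (which is not), and both are valid sequences of Unicode scalar values. The function names below are hypothetical and do not come from the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A Unicode scalar value is any code point up to U+10FFFF except surrogates.
bool is_scalar_value(std::uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Hypothetical validity check that makes no demand about normalization.
bool all_scalar_values(const std::vector<std::uint32_t>& sequence)
{
    for (std::size_t i = 0; i < sequence.size(); ++i)
        if (!is_scalar_value(sequence[i]))
            return false;
    return true;
}

// Usage: both spellings of "é" pass, although only the first is in NFC.
void example_usage()
{
    const std::vector<std::uint32_t> composed = {0x00E9};
    const std::vector<std::uint32_t> decomposed = {0x0065, 0x0301};
    assert(all_scalar_values(composed));
    assert(all_scalar_values(decomposed));
}
```

A concept that required NFC would reject the second sequence even though it is perfectly good Unicode; a check like the one above accepts both and leaves normalization to a separate algorithm.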
...
the concrete algorithms must come first.
Agreed.  Mathias, I would love to see a sort of "end user perspective" 
view of how this library will be used, i.e. its scope and basic usage pattern.

Phil.

Phil Endecott