
Here is the documentation of the current state of the Unicode library that I am doing as a Google Summer of Code project: http://blogloufoque.free.fr/unicode/doc/html/

This preview comes a bit later than I planned: after struggling to keep the documentation and the code in sync, I decided to move to automatic documentation generation with Doxygen, which I had some trouble setting up due to my lack of experience with it. The reference still lacks a lot of information, however.

The library only features UTF support and the Unicode Character Database (not fully updated to the latest Unicode version) at the moment, but grapheme cluster and normalization support will come very soon.

I would like to get feedback on the UTF codecs, the various concepts (Pipe, Consumer, BoundaryChecker) and the whole approach of lazy ranges. Grapheme clusters (and other text boundary facilities) will also be provided in terms of the Consumer and BoundaryChecker concepts.

The library features lazy ranges similar to those of Boost.RangeEx, and I used one of the naming conventions that was proposed during its review: u8_encode is the eager algorithm, u8_encoded is the lazy one. Since no naming was really agreed on for RangeEx, I would like this to be discussed as well.
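As a rough illustration of how the eager and lazy forms differ in use (the signatures and types below are only a sketch, not the final interface; only the u8_encode/u8_encoded names themselves are fixed):

    #include <vector>
    #include <iterator>
    #include <boost/cstdint.hpp>
    // plus the library's own headers (paths omitted here)

    void sketch(std::vector<boost::uint32_t> const& input)  // some UTF-32 code points
    {
        std::vector<char> utf8;

        // eager form: encodes the whole input range now, writing the UTF-8
        // code units through the output iterator
        u8_encode(input, std::back_inserter(utf8));

        // lazy form: u8_encoded(input) instead returns a range adaptor that
        // performs the encoding on the fly as it is traversed, so it can be
        // fed to further range adaptors or algorithms without materialising
        // the UTF-8 sequence up front.
    }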

Mathias Gaunard wrote:
Here is the documentation of the current state of the Unicode library that I am doing as a Google Summer of Code project: http://blogloufoque.free.fr/unicode/doc/html/
I forgot to mention the code is available on the Boost Sandbox SVN, under SOC/2009/unicode.

Mathias Gaunard wrote:
Mathias Gaunard wrote:
Here is the documentation of the current state of the Unicode library that I am doing as a Google Summer of Code project: http://blogloufoque.free.fr/unicode/doc/html/
I forgot to mention the code is available on the Boost Sandbox SVN, under SOC/2009/unicode.
Looking good. I wonder if you've talked to Haoyu Bai about the gsoc py3k project, where IIUC string-unicode conversion is a central issue. It'd be a shame if Boost's approach to unicode became fragmented at the moment it started (i.e. if boost.python and your library don't do things in a coherent way), and you may both benefit. Just a thought. You might bring this up on the python c++-sig list where more people have been thinking about boost.python + py3k + unicode. -t

2009/6/20 troy d. straszheim <troy@resophonic.com>
Mathias Gaunard wrote:
Mathias Gaunard wrote:
Here is the documentation of the current state of the Unicode library that I am doing as a Google Summer of Code project: http://blogloufoque.free.fr/unicode/doc/html/
I forgot to mention the code is available on the Boost Sandbox SVN, under SOC/2009/unicode.
Looking good. I wonder if you've talked to Haoyu Bai about the gsoc py3k project, where IIUC string-unicode conversion is a central issue.
I've been watching the progress of this too, as the GSoC CGI library is in a similar position and really needs unicode support. Looks good so far, keep us posted. Cheers, Darren

troy d. straszheim wrote:
Looking good. I wonder if you've talked to Haoyu Bai about the gsoc py3k project, where IIUC string-unicode conversion is a central issue. It'd be a shame if Boost's approach to unicode became fragmented at the moment it started (i.e. if boost.python and your library don't do things in a coherent way), and you may both benefit.
I suggest that what boost.python really needs is some kind of "unicode string type" to which it can translate the Python unicode string type. However, the library is at the moment nothing more than a set of algorithms and tools operating on ranges of raw data. Those can later be composed to create such a unicode string type.

Mathias Gaunard wrote:
Here is the documentation of the current state of the Unicode library that I am doing as a Google Summer of Code project: http://blogloufoque.free.fr/unicode/doc/html/
Hi Mathias,

I have looked quickly at your UTF-8 code at https://svn.boost.org/trac/boost/browser/sandbox/SOC/2009/unicode/boost/unic... in comparison with mine at http://svn.chezphil.org/libpbe/trunk/include/charset/utf8.hh . The encoding is similar, though I have avoided some code duplication (which is probably worthwhile in an inline function) and used an IF_LIKELY macro to enable gcc's branch hinting.

My decoding implementation is rather different from yours, though. You explicitly determine the length of the sequence first and then loop, while I do this:

    static char32_t decode(const_char8_ptr_t& p) {
        char8_t b0 = *(p++);
        IF_LIKELY((b0&0x80)==0) { return b0; }
        char8_t b1 = *(p++);
        check((b1&0xc0)==0x80);
        IF_LIKELY((b0&0xe0)==0xc0) {
            char32_t r = (b1&0x3f) | ((b0&0x1f)<<6);
            check(r>=0x80);
            return r;
        }
        char8_t b2 = *(p++);
        check((b2&0xc0)==0x80);
        IF_LIKELY((b0&0xf0)==0xe0) {
            char32_t r = (b2&0x3f) | ((b1&0x3f)<<6) | ((b0&0x0f)<<12);
            check(r>=0x800);
            return r;
        }
        char8_t b3 = *(p++);
        check((b3&0xc0)==0x80);
        IF_LIKELY((b0&0xf8)==0xf0) {
            char32_t r = (b3&0x3f) | ((b2&0x3f)<<6) | ((b1&0x3f)<<12) | ((b0&0x07)<<18);
            check(r>=0x10000);
            return r;
        }
    }

You may find that that is faster.

Regarding the character database, the size is an issue. Can unwanted parts be omitted? For example, I would guess that the character names are not often used except for debugging messages, and they are probably a large part of it.

Regards, Phil.
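IF_LIKELY here refers to a wrapper around GCC's __builtin_expect. A minimal sketch of how such a macro is commonly defined (the actual definition in utf8.hh may differ):

    // Sketch only -- not the definition from utf8.hh.  On GCC,
    // __builtin_expect(expr, 1) tells the compiler to lay out the branch so
    // that the "likely" side becomes the fall-through path; on other
    // compilers the macro degrades to a plain if.
    #if defined(__GNUC__)
    # define IF_LIKELY(cond)   if (__builtin_expect(!!(cond), 1))
    # define IF_UNLIKELY(cond) if (__builtin_expect(!!(cond), 0))
    #else
    # define IF_LIKELY(cond)   if (cond)
    # define IF_UNLIKELY(cond) if (cond)
    #endif

    // Usage as in decode() above: hint that the single-byte (ASCII) case
    // is the common one.
    //   IF_LIKELY((b0 & 0x80) == 0) { return b0; }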

Phil Endecott wrote:
I have looked quickly at your UTF8 code at https://svn.boost.org/trac/boost/browser/sandbox/SOC/2009/unicode/boost/unic... in comparison with mine at http://svn.chezphil.org/libpbe/trunk/include/charset/utf8.hh . The encoding is similar, though I have avoided some code duplication (which is probably worthwhile in an inline function) and used an IF_LIKELY macro to enable gcc's branch hinting.
My decoding implementation is rather different from yours, though. You explicitly determine the length of the sequence first and then loop, while I do this: <code snip />
You may find that that is faster.
My code wasn't fine-tuned for performance at all; I'm still trying to make things work first ;). I'll surely consider your technique when I finally measure. On another note, while I do think IF_LIKELY for UTF-16 is a good idea, doesn't that heavily penalize certain scripts, such as Asian ones, in the case of UTF-8?
Regarding the character database, the size is an issue. Can unwanted parts be omitted? For example, I would guess that the character names are not often used except for debugging messages and they are probably a large part of it.
The current design doesn't allow it to be shrunk any more than this, unfortunately. I'm not too sure how to enhance it to allow parts to be removed, either.

Mathias Gaunard wrote:
On another note, while I do think IF_LIKELY for UTF-16 is a good idea, doesn't that heavily penalize certain scripts, such as Asian ones, in the case of UTF-8?
Not really:

- In many cases, documents that use an exotic script actually contain large numbers of ASCII characters; consider an HTML page, for example, which will be full of HTML punctuation and tags. (I believe that I became aware of this after reading something written by a Mozilla person who had been investigating Unicode issues.)

- The penalty of a wrong branch hint is not "heavy". We probably have lots of places in our code where the compiler's heuristic is wrong, but we don't notice until we study it very carefully (as I did with this UTF-8 code). This is why processors still need to implement dynamic branch prediction.

My normal policy for using compiler branch hints like IF_LIKELY is to compile once with profile-driven optimisation, and then to find the places where it made a significant difference and add branch hints there. I then get close to the profile-driven-optimised performance without needing to actually re-do the profiling.

Regards, Phil.
participants (4):
- Darren Garvey
- Mathias Gaunard
- Phil Endecott
- troy d. straszheim