
Rogier van Dalen wrote:
Freestanding transcoding functions and codecvt facets are not the only things I believe a UTF library would need, though.
I've deliberately chosen not to use codecvt facets in my Unicode library at all, as I don't really find them practical to use, but maybe I should provide them anyway for compatibility with the iostreams subsystem.
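For what it's worth, the iostreams compatibility amounts to something like the sketch below, using the standard std::codecvt_utf8 facet (C++11, deprecated in C++17) rather than anything from either library under discussion; the file name is arbitrary:

    #include <codecvt>   // std::codecvt_utf8
    #include <fstream>
    #include <locale>

    int main()
    {
        // Imbuing a wide stream with a UTF-8 facet makes it transcode
        // on the fly; the locale takes ownership of the facet pointer.
        std::wofstream out("hello.txt");
        out.imbue(std::locale(out.getloc(), new std::codecvt_utf8<wchar_t>));
        out << L"caf\u00e9\n";   // stored on disk as UTF-8 bytes
    }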
I'd add to the list:
- compile-time encoding (meta-programming);
Didn't think of that.
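As a rough illustration of the idea, encoding a single code point at compile time could look like this (a hypothetical C++14 constexpr sketch, not code from either library; encode_utf8 and Utf8Bytes are made-up names, and a meta-programming design would more likely use templates):

    // Compile-time UTF-8 encoder for one code point.
    struct Utf8Bytes { unsigned char b[4]; int len; };

    constexpr Utf8Bytes encode_utf8(char32_t cp)
    {
        Utf8Bytes r{};
        if (cp < 0x80) {
            r.b[0] = static_cast<unsigned char>(cp);
            r.len = 1;
        } else if (cp < 0x800) {
            r.b[0] = static_cast<unsigned char>(0xC0 | (cp >> 6));
            r.b[1] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
            r.len = 2;
        } else if (cp < 0x10000) {
            r.b[0] = static_cast<unsigned char>(0xE0 | (cp >> 12));
            r.b[1] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
            r.b[2] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
            r.len = 3;
        } else {
            r.b[0] = static_cast<unsigned char>(0xF0 | (cp >> 18));
            r.b[1] = static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F));
            r.b[2] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
            r.b[3] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
            r.len = 4;
        }
        return r;
    }

    // The whole encoding happens at compile time:
    static_assert(encode_utf8(U'\u00e9').len == 2, "");
    static_assert(encode_utf8(U'\u00e9').b[0] == 0xC3, "");
    static_assert(encode_utf8(U'\u00e9').b[1] == 0xA9, "");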
Iterator adaptors, I found, are a pain to attach error policies to and to write correctly. For example, with a policy equivalent to your "ReplaceCheckFailures", you need to produce the same code point sequence whether you traverse an invalidly encoded string forward or backward. I've got code for UTF-8 that passes my unit tests, but the error checking and the one-by-one decoding make it much harder to optimise.
For now my iterator adaptors (and the codecs they're based on, for that matter) perform full checks, including checking that we don't run past either end of the input range. While I initially wanted both a checked and an unchecked version, only having one does make the library easier to use. An error policy isn't really enough, though: to do full checks, each iterator needs to know the beginning and the end of the range it's working on, state that could be avoided altogether when trusting the input. The implementations are fairly simple and have never been benchmarked (benchmarking my library isn't even scheduled at the moment), but I believe they're correct; proper unit tests are in the works.
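To make the begin/end point concrete, one checked decoding step has to look roughly like the sketch below (illustrative only, not the library's actual code; decode_one is a made-up name). Every multi-byte branch needs 'end', which is exactly the information a trusting iterator could do without:

    // Reads one code point from [it, end), advancing it.
    // Precondition: it != end. Any check failure yields U+FFFD,
    // roughly a "ReplaceCheckFailures" policy; producing the same
    // replacements when traversing backward takes extra care and
    // isn't shown here.
    char32_t decode_one(const unsigned char*& it, const unsigned char* end)
    {
        const unsigned char b0 = *it++;
        int trail;
        char32_t cp;
        if (b0 < 0x80)      return b0;      // ASCII fast path
        else if (b0 < 0xC2) return 0xFFFD;  // stray trail byte or overlong lead
        else if (b0 < 0xE0) { trail = 1; cp = b0 & 0x1F; }
        else if (b0 < 0xF0) { trail = 2; cp = b0 & 0x0F; }
        else if (b0 < 0xF5) { trail = 3; cp = b0 & 0x07; }
        else                return 0xFFFD;  // lead byte beyond U+10FFFF
        for (int i = 0; i < trail; ++i) {
            // This is where the iterator must know 'end'; a version
            // that trusts its input could simply read ahead.
            if (it == end || (*it & 0xC0) != 0x80)
                return 0xFFFD;              // truncated or bad trail byte
            cp = (cp << 6) | (*it++ & 0x3F);
        }
        if ((cp >= 0xD800 && cp <= 0xDFFF)      // UTF-16 surrogates
            || cp > 0x10FFFF                    // out of Unicode range
            || (trail == 2 && cp < 0x800)       // overlong 3-byte form
            || (trail == 3 && cp < 0x10000))    // overlong 4-byte form
            return 0xFFFD;
        return cp;
    }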
I believe that Mathias Gaunard is working on a library at <http://blogloufoque.free.fr/unicode/doc/html/>. I don't know how complete it is, but from the documentation it looks well thought-out so far. I'm looking forward to seeing where it's going!
Thanks! I'm in the middle of writing several tutorials to make it easier to understand how the library is designed. (Plus I still need to actually implement some of the things that are in that version of the docs.)