
Rogier van Dalen wrote:
Freestanding transcoding functions and codecvt facets are not the only things I believe a UTF library would need, though.
I've deliberately chosen not to use codecvt facets in my Unicode library at all, as I don't really find them practical to use, but maybe I should provide them anyway for compatibility with the iostreams subsystem.
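For what it's worth, the iostreams compatibility amounts to something like the sketch below, using the standard std::codecvt_utf8 facet (C++11, deprecated in C++17) rather than anything from either library under discussion; the file name is arbitrary:

    #include <codecvt>   // std::codecvt_utf8
    #include <fstream>
    #include <locale>

    int main()
    {
        // Imbuing a wide stream with a UTF-8 facet makes it transcode
        // on the fly; the locale takes ownership of the facet pointer.
        std::wofstream out("hello.txt");
        out.imbue(std::locale(out.getloc(), new std::codecvt_utf8<wchar_t>));
        out << L"caf\u00e9\n";   // stored on disk as UTF-8 bytes
    }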
I'd add to the list:
- compile-time encoding (meta-programming);
Didn't think of that.
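As a rough illustration of the idea, encoding a single code point at compile time could look like this (a hypothetical C++14 constexpr sketch, not code from either library; encode_utf8 and Utf8Bytes are made-up names, and a meta-programming design would more likely use templates):

    // Compile-time UTF-8 encoder for one code point.
    struct Utf8Bytes { unsigned char b[4]; int len; };

    constexpr Utf8Bytes encode_utf8(char32_t cp)
    {
        Utf8Bytes r{};
        if (cp < 0x80) {
            r.b[0] = static_cast<unsigned char>(cp);
            r.len = 1;
        } else if (cp < 0x800) {
            r.b[0] = static_cast<unsigned char>(0xC0 | (cp >> 6));
            r.b[1] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
            r.len = 2;
        } else if (cp < 0x10000) {
            r.b[0] = static_cast<unsigned char>(0xE0 | (cp >> 12));
            r.b[1] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
            r.b[2] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
            r.len = 3;
        } else {
            r.b[0] = static_cast<unsigned char>(0xF0 | (cp >> 18));
            r.b[1] = static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F));
            r.b[2] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
            r.b[3] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
            r.len = 4;
        }
        return r;
    }

    // The whole encoding happens at compile time:
    static_assert(encode_utf8(U'\u00e9').len == 2, "");
    static_assert(encode_utf8(U'\u00e9').b[0] == 0xC3, "");
    static_assert(encode_utf8(U'\u00e9').b[1] == 0xA9, "");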
Iterator adaptors, I found, are a pain to attach error policies to and to write correctly. For example, with a policy equivalent to your "ReplaceCheckFailures", you need to produce the same code point sequence whether you traverse an invalidly encoded string forward or backward. I've got code for UTF-8 that passes my unit tests, but the error checking and the one-by-one decoding make it much harder to optimise.
For now my iterator adaptors (and the codecs they're based on, for that matter) perform full checks, including checking that we don't run past either end of the input range. While I initially wanted both a checked and an unchecked version, only having one does make the library easier to use. An error policy isn't really enough, though: to do full checks, each iterator needs to know the beginning and the end of the range it's working on, state that could be avoided altogether when trusting the input. The implementations are fairly simple and have never been benchmarked (benchmarking my library isn't even scheduled at the moment), but I believe they're correct; proper unit tests are in the works.
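To make the begin/end point concrete, one checked decoding step has to look roughly like the sketch below (illustrative only, not the library's actual code; decode_one is a made-up name). Every multi-byte branch needs 'end', which is exactly the information a trusting iterator could do without:

    // Reads one code point from [it, end), advancing it.
    // Precondition: it != end. Any check failure yields U+FFFD,
    // roughly a "ReplaceCheckFailures" policy; producing the same
    // replacements when traversing backward takes extra care and
    // isn't shown here.
    char32_t decode_one(const unsigned char*& it, const unsigned char* end)
    {
        const unsigned char b0 = *it++;
        int trail;
        char32_t cp;
        if (b0 < 0x80)      return b0;      // ASCII fast path
        else if (b0 < 0xC2) return 0xFFFD;  // stray trail byte or overlong lead
        else if (b0 < 0xE0) { trail = 1; cp = b0 & 0x1F; }
        else if (b0 < 0xF0) { trail = 2; cp = b0 & 0x0F; }
        else if (b0 < 0xF5) { trail = 3; cp = b0 & 0x07; }
        else                return 0xFFFD;  // lead byte beyond U+10FFFF
        for (int i = 0; i < trail; ++i) {
            // This is where the iterator must know 'end'; a version
            // that trusts its input could simply read ahead.
            if (it == end || (*it & 0xC0) != 0x80)
                return 0xFFFD;              // truncated or bad trail byte
            cp = (cp << 6) | (*it++ & 0x3F);
        }
        if ((cp >= 0xD800 && cp <= 0xDFFF)      // UTF-16 surrogates
            || cp > 0x10FFFF                    // out of Unicode range
            || (trail == 2 && cp < 0x800)       // overlong 3-byte form
            || (trail == 3 && cp < 0x10000))    // overlong 4-byte form
            return 0xFFFD;
        return cp;
    }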
I believe that Mathias Gaunard is working on a library at <http://blogloufoque.free.fr/unicode/doc/html/>. I don't know how complete it is, but from the documentation it looks well thought-out so far. I'm looking forward to seeing where it's going!
Thanks! I'm in the middle of writing several tutorials to make it easier to understand how the library is designed. (Plus I still need to actually implement some of the things that are in that version of the docs.)