
Hi Tilman,
I'm jumping in, because I am interested in Unicode conversion facets...
is there a reason why both program_options and serialization contain very similar files utf8_codecvt_facet.cpp?
I had a look at the serialization library's converter in utf8_codecvt_facet.cpp and noticed that utf8_codecvt_facet_wchar_t::do_in() doesn't check for non-shortest-form (overlong) UTF-8 sequences.
Hmmm... I think it's just an omission, and it would be easy to add.
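For illustration, the missing check could look roughly like the sketch below. This is not the code from utf8_codecvt_facet.cpp; decode_one and min_code_point_for_length are hypothetical names made up for this example, and the sketch only covers sequences of up to four bytes. The idea is to decode the sequence and then reject it if the resulting code point would have fit into fewer bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Boost): the smallest code point that
// legitimately needs a UTF-8 sequence of `len` bytes.
inline std::uint32_t min_code_point_for_length(std::size_t len) {
    switch (len) {
        case 1: return 0x0;
        case 2: return 0x80;
        case 3: return 0x800;
        case 4: return 0x10000;
        default: return 0xFFFFFFFF; // unsupported length: nothing qualifies
    }
}

// Hypothetical decoder for a single UTF-8 sequence starting at p, with n
// bytes available. On success it stores the code point in cp, the number
// of bytes consumed in used, and returns true. It returns false for
// truncated input, bad lead or continuation bytes, and non-shortest forms.
inline bool decode_one(const unsigned char* p, std::size_t n,
                       std::uint32_t& cp, std::size_t& used) {
    if (n == 0) return false;
    const unsigned char b0 = p[0];
    std::size_t len = 0;
    std::uint32_t value = 0;
    if (b0 < 0x80)                { len = 1; value = b0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; value = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; value = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; value = b0 & 0x07; }
    else return false;            // stray continuation byte or invalid lead
    if (n < len) return false;    // truncated sequence
    for (std::size_t i = 1; i < len; ++i) {
        if ((p[i] & 0xC0) != 0x80) return false; // not a continuation byte
        value = (value << 6) | (p[i] & 0x3Fu);
    }
    // The shortest-form check: the value must really need `len` bytes.
    if (value < min_code_point_for_length(len)) return false;
    cp = value;
    used = len;
    return true;
}
```

With this check, the two-byte sequence C0 80 (an overlong encoding of U+0000) is rejected, while C3 A9 still decodes to U+00E9. A complete validator would also reject surrogates and values above U+10FFFF, which this sketch leaves out.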
There might also be some issues on platforms with 16-bit wchar_t (possible overflow).
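To make the overflow concern concrete: a code point above U+FFFF does not fit into a single 16-bit unit, so storing it directly would silently truncate it. One possible way around this, assuming a 16-bit wchar_t is meant to hold UTF-16, is to emit a surrogate pair. The sketch below uses a hypothetical helper named to_utf16 and is not taken from either Boost library.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Boost): writes the code point cp into
// out as one or two 16-bit UTF-16 code units. Returns the number of units
// written, or 0 if cp is not a valid Unicode scalar value.
inline std::size_t to_utf16(std::uint32_t cp, std::uint16_t out[2]) {
    // Reject values beyond the Unicode range and lone surrogates.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    if (cp < 0x10000) {
        out[0] = static_cast<std::uint16_t>(cp); // fits in a single unit
        return 1;
    }
    cp -= 0x10000; // 20 bits remain, split into two 10-bit halves
    out[0] = static_cast<std::uint16_t>(0xD800 + (cp >> 10));   // high surrogate
    out[1] = static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)); // low surrogate
    return 2;
}
```

For example, U+1F600 becomes the pair D83D DE00, whereas U+20AC stays a single unit.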
I suggest using (parts of) the UTF library in the Boost files area to solve those problems. This could also be another step towards an officially supported Unicode library... ;-)
While I think that library is OK (and the last time its author, Alberto Barbati, posted about it, he clearly knew much more about Unicode than I do), I don't think it's a good idea to take that library and add it to details now. Simply put, it would take another week until the regression tests turn green again. I also don't think there's any particular difference between the various utf8 implementations....
- Volodya