
Actually, I want to mention that the UTF-8 codecvt facet implementation has several other problems:

1. When sizeof(wchar_t) == 2, it supports only UCS-2, not full UTF-16, so code points above U+FFFF (which need surrogate pairs) are not handled.
2. It also does not strictly enforce that a single UTF-8 character is at most 4 bytes long.

In Boost.Locale I implemented a full UTF-8 codecvt facet that supports both UTF-16 and UTF-32. I assume this code could replace the current implementation, although it would have to be extracted from Boost.Locale, since that facet is more generic and supports other encodings as well. Note that this UTF-8 facet does not depend on any external library.

Artyom
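To illustrate the UCS-2 versus UTF-16 distinction in point 1, here is a minimal sketch of encoding and decoding one code point in UTF-16. The helpers `append_utf16` and `read_utf16` are hypothetical names for illustration; they are not taken from Boost or Boost.Locale. A UCS-2-only converter effectively implements just the single-unit branch and cannot represent anything above U+FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: append one Unicode scalar value to a UTF-16 string.
void append_utf16(std::u16string& out, char32_t cp) {
    // Only valid scalar values: no surrogate code points, nothing above U+10FFFF.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one 16-bit unit (all UCS-2 can do).
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split the remaining 20 bits into a surrogate pair.
        char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
    }
}

// Hypothetical inverse: read one code point starting at s[i] and advance i.
// An unpaired surrogate is mapped to U+FFFD (replacement character).
char32_t read_utf16(const std::u16string& s, std::size_t& i) {
    assert(i < s.size());
    char16_t u = s[i++];
    if (u >= 0xD800 && u <= 0xDBFF && i < s.size()) {
        char16_t w = s[i];
        if (w >= 0xDC00 && w <= 0xDFFF) {
            ++i;
            return 0x10000 + ((char32_t(u) - 0xD800) << 10) + (char32_t(w) - 0xDC00);
        }
    }
    if (u >= 0xD800 && u <= 0xDFFF) return 0xFFFD;  // lone surrogate
    return u;
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00, which a UCS-2 facet has no way to produce.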
I've been meaning to mention this for some time. The Boost UTF-8 code conversion facet implements an early specification of UTF-8 that allows sequences of up to 6 bytes, but the current specification (RFC 3629) and security concerns suggest it should accept at most 4. See http://en.wikipedia.org/wiki/UTF-8, in particular the section on invalid byte sequences. It also gets some details wrong; for example, do_length() is supposed to report only the length of valid code sequences, but the Boost implementation does not check for validity.
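The validity rules referred to above can be sketched as a strict single-sequence decoder plus a length function that stops at the first invalid or truncated sequence. This is a minimal sketch, not Boost's code: `decode_utf8` and `valid_utf8_length` are hypothetical names. The length function only mirrors the idea of `std::codecvt::do_length` (return how many input bytes form at most `max` valid, complete characters), and it assumes one internal character per code point, as with a 32-bit `wchar_t`. A UTF-16 target would count supplementary characters as two units.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: decode one UTF-8 sequence from [first, last).
// Returns its length (1..4) and stores the code point in cp, or returns 0 if
// the bytes do not start a valid, complete sequence. Rejects 5/6-byte forms,
// overlong encodings, surrogates, and values above U+10FFFF.
std::size_t decode_utf8(const unsigned char* first, const unsigned char* last,
                        char32_t& cp) {
    assert(first <= last);
    if (first == last) return 0;
    unsigned char b0 = first[0];
    std::size_t len;
    char32_t min;  // smallest code point legal at this length (catches overlongs)
    if (b0 < 0x80) { cp = b0; return 1; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else return 0;  // stray continuation byte or obsolete 5/6-byte lead (0xF8..0xFF)
    if (static_cast<std::size_t>(last - first) < len) return 0;  // truncated
    for (std::size_t k = 1; k < len; ++k) {
        if ((first[k] & 0xC0) != 0x80) return 0;  // expected a continuation byte
        cp = (cp << 6) | (first[k] & 0x3F);
    }
    if (cp < min) return 0;                      // overlong encoding
    if (cp > 0x10FFFF) return 0;                 // outside the Unicode range
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  // UTF-16 surrogate code point
    return len;
}

// Hypothetical do_length-style helper: number of bytes in [first, last) that
// form at most max valid, complete characters (one character per code point).
std::size_t valid_utf8_length(const unsigned char* first,
                              const unsigned char* last, std::size_t max) {
    const unsigned char* p = first;
    char32_t cp = 0;
    for (std::size_t n = 0; n < max; ++n) {
        std::size_t len = decode_utf8(p, last, cp);
        if (len == 0) break;  // stop instead of skipping invalid bytes
        p += len;
    }
    return static_cast<std::size_t>(p - first);
}
```

Stopping at the first bad sequence, rather than counting lead bytes, is what makes the returned length describe only valid input.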