----- Original Message -----
From: Peter Dimov <lists@pdimov.com>

> Artyom Beilis wrote:
>> We can create a "Separate" codecvt library with its own formal review and it would be ready in best case in a year...
>
> One option is to put it into utility; another is to use a mini-review if the new codecvt library is an implementation of the standard <codecvt> interface.
>
> std::codecvt_utf8 is not quite the same as boost::utf8_codecvt_facet, but on the other hand, from your previous message it seems that your utf8_codecvt_facet is not std::codecvt_utf8 but std::codecvt_utf8_utf16, or perhaps it's the latter when wchar_t is 16 bit and the former when it's 32 bit.

[BEGIN: Long description regarding <codecvt>]

To be honest, I don't know what the people who designed <codecvt> in the first place were thinking. I feel a strong influence of broken MS Unicode policies.

std::codecvt_utf8 is actually quite misleading: it converts between UTF-8 and UCS-2/UCS-4, i.e. using it under Windows with wchar_t you wouldn't get UTF-16 support at all. std::codecvt_utf8<wchar_t> basically does what boost::XXX::utf8_codecvt_facet does. It is basically broken and useless, as UCS-2 is only a subset of the proper encoding (UTF-16).

Now, <codecvt>'s std::codecvt_mode is a clear Microsoftism: for example, using a UTF-8 BOM is one of the many Unicode crimes Microsoft created, as is storing UTF-16 files on disk.

Another hilarious thing is the Maxcode = 0x10ffff template parameter of the facets. It is like creating

    template<double Pi_Value=3.14159> class circle;

0x10FFFF IS the maximum value of a Unicode code point, not 0xFFFF, not anything else.

std::codecvt_utf16 is an attempt to build a "narrow" UTF-16 encoding, just no comment...

std::codecvt_utf8_utf16 is actually useful under Windows and does what it is supposed to do with wchar_t, but on a POSIX platform it is impossible to use std::codecvt_utf8_utf16 with wchar_t, because wchar_t is UTF-32 there.

So if you want to install a UTF-8 to wchar_t codecvt facet that represents UTF-16 or UTF-32 according to the platform, you need to use

    if(sizeof(wchar_t) == 2)
        return new std::codecvt_utf8_utf16<wchar_t>();
    else // sizeof(wchar_t) == 4
        return new std::codecvt_utf8<wchar_t>();

So all of <codecvt> was built wrong, under the strong influence of Microsoft development policies, and is useless for any cross-platform development.

So... Boost community, please do yourself a favor: don't use <codecvt> unless you really understand what you are doing.
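
To make the UCS-2 versus UTF-16 point concrete, here is a minimal sketch of how a code point is encoded as UTF-16. The helper names encode_utf16 and fits_in_ucs2 are hypothetical, made up for this illustration; they are not part of <codecvt> or of any Boost library. Code points above 0xFFFF need a surrogate pair, which is exactly what a UCS-2 conversion cannot produce.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16.
// Returns an empty string if cp is not a valid code point.
std::u16string encode_utf16(char32_t cp)
{
    std::u16string out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return out;                          // out of range or a lone surrogate
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one 16-bit unit, same as UCS-2.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split the remaining 20 bits into two units.
        cp -= 0x10000;
        assert(cp <= 0xFFFFF);               // 20 bits remain after the offset
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}

// Hypothetical helper: UCS-2 can hold only code points that fit in one
// 16-bit unit, so everything above 0xFFFF is simply not representable.
bool fits_in_ucs2(char32_t cp)
{
    return cp <= 0xFFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}
```

For example, U+20AC fits in a single unit either way, while U+1F600 is representable only as the two-unit UTF-16 sequence and not in UCS-2 at all.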

[END: Long description regarding <codecvt>]

If you want to convert UTF-8 files properly to native wide characters, for example for boost::filesystem, boost::serialization or std::fstream, you need a facet that converts to UTF-16 or UTF-32 according to what wchar_t holds, and <codecvt> does not provide one (without platform-specific tricks).

So I'm not going to implement C++11 <codecvt>, because IMHO it is broken by design in the first place.

Boost.Locale provides one, but currently it is a deeply internal and complex part of the library.

The code I have written for Boost.Nowide, or the one I suggest putting into the header-only part of Boost.Locale, is a codecvt that converts between UTF-8 and UTF-16/32 according to the size of the character:

  boost::(nowide|or locale)::utf8_facet<wchar_t>  - UTF-8 to UTF-16 (Windows) or UTF-32 (POSIX)
  boost::(nowide|or locale)::utf8_facet<char16_t> - UTF-8 to UTF-16 on any platform
  boost::(nowide|or locale)::utf8_facet<char32_t> - UTF-8 to UTF-32 on any platform

That's it. It isn't <codecvt> because C++11 <codecvt> does not actually do the job needed.

Artyom Beilis
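
The idea of choosing the target encoding from the size of the character type can be sketched as follows. This is only an illustration under stated assumptions: it is a plain string conversion function, not a std::codecvt facet, and it is not the actual Boost.Nowide or Boost.Locale code. The names utf8_to_wide, decode_utf8 and invalid_cp are hypothetical, and replacing malformed input with U+FFFD is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical marker for a malformed UTF-8 sequence.
constexpr char32_t invalid_cp = 0xFFFFFFFF;

// Decode one UTF-8 sequence starting at s[i] and advance i past it.
// Returns invalid_cp for malformed, overlong or out-of-range input.
char32_t decode_utf8(const std::string& s, std::size_t& i)
{
    assert(i < s.size());                    // precondition: i is a valid index
    unsigned char lead = static_cast<unsigned char>(s[i++]);
    if (lead < 0x80) return lead;            // single-byte (ASCII) sequence
    int trail = 0;
    char32_t cp = 0;
    char32_t min = 0;                        // smallest value allowed for this length
    if ((lead & 0xE0) == 0xC0)      { trail = 1; cp = lead & 0x1F; min = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { trail = 2; cp = lead & 0x0F; min = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { trail = 3; cp = lead & 0x07; min = 0x10000; }
    else return invalid_cp;
    for (; trail > 0; --trail) {
        if (i >= s.size()) return invalid_cp;
        unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c & 0xC0) != 0x80) return invalid_cp;
        ++i;
        cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return invalid_cp;
    return cp;
}

// Convert UTF-8 to UTF-16 when CharT is 16 bits wide and to UTF-32 when it
// is 32 bits wide, so wchar_t gets whatever the platform uses.
template <typename CharT>
std::basic_string<CharT> utf8_to_wide(const std::string& in)
{
    static_assert(sizeof(CharT) == 2 || sizeof(CharT) == 4,
                  "CharT must be 16 or 32 bits wide");
    std::basic_string<CharT> out;
    std::size_t i = 0;
    while (i < in.size()) {
        char32_t cp = decode_utf8(in, i);
        if (cp == invalid_cp) cp = 0xFFFD;   // sketch-only policy: substitute U+FFFD
        if (sizeof(CharT) == 4 || cp < 0x10000) {
            out.push_back(static_cast<CharT>(cp));
        } else {
            // 16-bit target and a supplementary code point: surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<CharT>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<CharT>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With this sketch, utf8_to_wide<char16_t> always yields UTF-16, utf8_to_wide<char32_t> always yields UTF-32, and utf8_to_wide<wchar_t> yields one or the other depending on sizeof(wchar_t), with no platform-specific branch in the calling code.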