--------------------------------------------
On Thu, 10/8/15, Peter Dimov <lists@pdimov.com> wrote:

 Subject: Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet
 To: boost@lists.boost.org
 Date: Thursday, October 8, 2015, 10:14 PM
I agree that this makes the most sense. I only brought up <codecvt> because if we used the standard interface and names we wouldn't have needed a full review of the hypothetical libs/codecvt.
See... a lot of the Unicode-related stuff in the standard library is broken. It wasn't fixed in C++11 and won't be in the future. There is also a deep problem with the Windows API, which created its own Wide API and ignores every standard, both C and C++. For example, there are ordinary files (those whose names can't be represented in the current ANSI code page) that can't even be opened on Windows using plain C fopen or C++ std::fstream.
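A minimal sketch of the usual workaround, assuming the program keeps all file names in UTF-8 internally: on Windows, widen the name to UTF-16 with MultiByteToWideChar and call the MSVC CRT function _wfopen; elsewhere, file names are plain byte strings, so fopen already works. utf8_fopen is a hypothetical helper for illustration, not nowide's actual API.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not nowide's actual API): open a file whose name is UTF-8.
inline std::FILE* utf8_fopen(const std::string& name, const std::string& mode) {
#ifdef _WIN32
    // By default the narrow fopen() interprets names in the active ANSI code
    // page, so convert UTF-8 to UTF-16 and use the wide CRT function instead.
    auto widen = [](const std::string& s) {
        int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s.c_str(), -1, nullptr, 0);
        if (n <= 0) return std::wstring();  // invalid UTF-8
        std::wstring w(static_cast<std::size_t>(n), L'\0');
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s.c_str(), -1, &w[0], n);
        w.resize(static_cast<std::size_t>(n - 1));  // drop the terminating NUL
        return w;
    };
    const std::wstring wname = widen(name), wmode = widen(mode);
    if (wname.empty() || wmode.empty()) return nullptr;
    return _wfopen(wname.c_str(), wmode.c_str());
#else
    // POSIX file names are byte strings, so UTF-8 names pass through unchanged.
    return std::fopen(name.c_str(), mode.c_str());
#endif
}
```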
As this stands, libs/utility seems the best bet, although I'm not overly fond of the practice of putting everything that doesn't fit elsewhere into Utility. :-) But it's better than Detail because it's documented and tested. One could make the case for libs/utf8 which would contain utf8_facet and the "obvious"
    bool is_valid_utf8( string const & s );
    wstring utf8_decode( string const & s );
    string utf8_encode( wstring const & s );
but this is already well into full review/bikeshed territory.
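For illustration, the three "obvious" functions could be sketched in standard C++ roughly as below. This is a hypothetical implementation, not an existing Boost API, and the helper utf8_next is made up here. It assumes wstring holds UTF-16 when wchar_t is 16 bits (Windows) and UTF-32 otherwise, rejects overlong forms, encoded surrogates and values above U+10FFFF, and throws std::runtime_error on malformed input, a policy the signatures above leave open.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical sketch of the proposed functions (not an existing Boost API).
// Decodes one code point at s[i] and advances i; returns false on malformed
// input (bad lead/continuation byte, truncation, overlong form, surrogate,
// or a value above U+10FFFF).
inline bool utf8_next(const std::string& s, std::size_t& i, char32_t& cp) {
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    unsigned char b = static_cast<unsigned char>(s[i]);
    int n = b < 0x80 ? 1 : (b >> 5) == 0x6 ? 2 : (b >> 4) == 0xE ? 3
          : (b >> 3) == 0x1E ? 4 : 0;
    if (n == 0 || s.size() - i < static_cast<std::size_t>(n)) return false;
    cp = n == 1 ? b : (b & (0x7F >> n));
    for (int k = 1; k < n; ++k) {
        unsigned char c = static_cast<unsigned char>(s[i + k]);
        if ((c & 0xC0) != 0x80) return false;
        cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < min_cp[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
    i += n;
    return true;
}

inline bool is_valid_utf8(const std::string& s) {
    char32_t cp;
    for (std::size_t i = 0; i < s.size();)
        if (!utf8_next(s, i, cp)) return false;
    return true;
}

// Produces UTF-16 when wchar_t is 16 bits (Windows), UTF-32 otherwise.
inline std::wstring utf8_decode(const std::string& s) {
    std::wstring out;
    char32_t cp;
    for (std::size_t i = 0; i < s.size();) {
        if (!utf8_next(s, i, cp)) throw std::runtime_error("utf8_decode: invalid UTF-8");
        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {  // needs a surrogate pair
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}

inline std::string utf8_encode(const std::wstring& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cp = static_cast<char32_t>(s[i]);
        // With a 16-bit wchar_t, combine a high/low surrogate pair first.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.size()) {
            char32_t lo = static_cast<char32_t>(s[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("utf8_encode: invalid code point");
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Throwing is only one possible error policy; replacing invalid sequences with U+FFFD is another common choice.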
See, all of this is already implemented in a header-only way in Boost.Locale, so no linking is required:

https://github.com/boostorg/locale/blob/master/include/boost/locale/utf.hpp
https://github.com/boostorg/locale/blob/master/include/boost/locale/encoding...

So just call

    boost::locale::conv::utf_to_utf<wchar_t>("Hello World");

Full codecvt facets for many encodings, including UTF-8, ISO-8859-* and Windows-125*, are already there as well.

However, there is one very useful specific codecvt, the one that converts between UTF-8 and wchar_t/char16_t/char32_t, that can be implemented header-only, without linking against the big and complex Boost.Locale library. I'm also going to make it a little more generic, so that you can easily implement a conversion from wchar_t/char16_t/char32_t to any stateless encoding (I want to improve some things within Boost.Locale as well).

So the UTF-8 codecvt facet is already an INTEGRAL part of Boost.Locale; it exists there. I just think I'll make it more accessible to other libraries, without the requirement of linking, and easier for users, without the need for special locale generation.

OK... I've decided what I'm going to do. The next step is for other libraries to adopt this utf8_codecvt facet.

Artyom Beilis
--------------
CppCMS - C++ Web Framework: http://cppcms.com/
CppDB - C++ SQL Connectivity: http://cppcms.com/sql/cppdb/
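Roughly what a header-only UTF-8 codecvt facet involves can be sketched by deriving from std::codecvt<wchar_t, char, std::mbstate_t> and overriding its protected virtual members. This is a hypothetical illustration, not the Boost.Locale implementation: the names utf8_codecvt, utf8_locale, write_utf8 and read_utf8 are made up here, and the sketch assumes a 32-bit wchar_t holding UTF-32 (as on Linux), leaving out the UTF-16 surrogate handling that Windows would need.

```cpp
#include <cstddef>
#include <cwchar>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical header-only UTF-8 <-> wchar_t facet (a sketch, not Boost.Locale's code).
// Assumes wchar_t is 32 bits and holds UTF-32, as on Linux; UTF-16 is not handled.
static_assert(sizeof(wchar_t) == 4, "sketch assumes a UTF-32 wchar_t");
using wide_codecvt = std::codecvt<wchar_t, char, std::mbstate_t>;

class utf8_codecvt : public wide_codecvt {
public:
    explicit utf8_codecvt(std::size_t refs = 0) : wide_codecvt(refs) {}
protected:
    // wchar_t -> UTF-8. UTF-8 is stateless, so the mbstate_t is never used.
    result do_out(state_type&, const wchar_t* from, const wchar_t* from_end,
                  const wchar_t*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        static const unsigned char lead[] = {0, 0x00, 0xC0, 0xE0, 0xF0};
        result r = ok;
        for (; from != from_end; ++from) {
            char32_t cp = static_cast<char32_t>(*from);
            if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) { r = error; break; }
            int n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (to_end - to < n) { r = partial; break; }  // output buffer is full
            *to++ = static_cast<char>(lead[n] | (cp >> (6 * (n - 1))));
            for (int i = n - 2; i >= 0; --i)
                *to++ = static_cast<char>(0x80 | ((cp >> (6 * i)) & 0x3F));
        }
        from_next = from; to_next = to;
        return r;
    }

    // UTF-8 -> wchar_t. Rejects overlong forms, surrogates and values above U+10FFFF.
    result do_in(state_type&, const char* from, const char* from_end,
                 const char*& from_next, wchar_t* to, wchar_t* to_end,
                 wchar_t*& to_next) const override {
        static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
        result r = ok;
        while (from != from_end) {
            if (to == to_end) { r = partial; break; }
            unsigned char b = static_cast<unsigned char>(*from);
            int n = b < 0x80 ? 1 : (b >> 5) == 0x6 ? 2 : (b >> 4) == 0xE ? 3
                  : (b >> 3) == 0x1E ? 4 : 0;
            if (n == 0) { r = error; break; }
            if (from_end - from < n) { r = partial; break; }  // incomplete sequence
            char32_t cp = n == 1 ? b : (b & (0x7F >> n));
            for (int i = 1; i < n; ++i) {
                unsigned char c = static_cast<unsigned char>(from[i]);
                if ((c & 0xC0) != 0x80) r = error;
                cp = (cp << 6) | (c & 0x3F);
            }
            if (cp < min_cp[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) r = error;
            if (r == error) break;
            *to++ = static_cast<wchar_t>(cp);
            from += n;
        }
        from_next = from; to_next = to;
        return r;
    }

    result do_unshift(state_type&, char* to, char*, char*& to_next) const override {
        to_next = to; return noconv;  // stateless encoding: no shift sequence to emit
    }
    int do_encoding() const noexcept override { return 0; }  // variable width
    bool do_always_noconv() const noexcept override { return false; }
    int do_max_length() const noexcept override { return 4; }

    // Number of bytes in [from, from_end) that form at most `max` wide characters.
    int do_length(state_type& st, const char* from, const char* from_end,
                  std::size_t max) const override {
        const char* p = from;
        for (; max > 0 && p != from_end; --max) {
            wchar_t w; wchar_t* w_next; const char* p_next;
            if (do_in(st, p, from_end, p_next, &w, &w + 1, w_next) == error || w_next == &w)
                break;
            p = p_next;
        }
        return static_cast<int>(p - from);
    }
};

// Illustrative helpers: imbue the facet before open() so the file buffer uses it.
inline std::locale utf8_locale() { return std::locale(std::locale::classic(), new utf8_codecvt); }
inline void write_utf8(const char* path, const std::wstring& text) {
    std::wofstream out;
    out.imbue(utf8_locale());
    out.open(path);
    out << text;
}
inline std::wstring read_utf8(const char* path) {
    std::wifstream in;
    in.imbue(utf8_locale());
    in.open(path);
    return std::wstring(std::istreambuf_iterator<wchar_t>(in), std::istreambuf_iterator<wchar_t>());
}
```

Because UTF-8 is stateless, the std::mbstate_t argument is never consulted and do_unshift has nothing to write; a facet for another stateless encoding would differ mainly in do_in and do_out.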