[Locale] New utf-8 codecvt facet in master (replacement of boost/detail/utf8_codecvt_facet)
Hello,

Following the previous discussion regarding the UTF-8 facet in Boost, I have merged the changes to the master branch.

A new UTF-8 codecvt facet that properly handles both UTF-16 and UTF-32 encodings for wchar_t (or char16_t/char32_t) is now there. The major goal is to replace the broken (*) UTF-8 facet that exists today in boost/detail/utf8_codecvt_facet.hpp/ipp.

It is implemented header-only, so all you need is to include it:

    #include <boost/locale/utf8_facet.hpp>

And install it as usual:

    std::locale new_locale(std::locale(), new boost::locale::utf8_codecvt<wchar_t>());

It *does not require* a separately compiled part like the one in detail.

Note that it is implemented in terms of boost::locale::generic_codecvt:

    template<typename CharType, typename CodecvtImpl, int CharSize = sizeof(CharType)>
    class generic_codecvt;

This template has non-trivial specializations for CharSize=2 and CharSize=4, which handle UTF-16 and UTF-32 wchar_t/char16_t/char32_t characters respectively.

boost::locale::generic_codecvt provides an interface for creating a range of facets for various character encodings. For example, Boost.Locale uses it to implement several facets:

- utf8 codecvt
- single-byte character sets like ISO-8859-* or Windows-125*
- wrappers around the ICU ucnv_* and POSIX iconv APIs that create a standard codecvt facet

That is why I decided to keep the implementation within the Boost.Locale library, as the place that actually deals with different encodings.

-----------

Once Boost 1.60 is released, I encourage every library maintainer whose library incorporates the broken boost/detail/utf8_codecvt_facet.*pp to replace it with the proper one from Boost.Locale.

Note: it is a HEADER-ONLY part and does not require any part of the compiled library.

Artyom Beilis

(*) The current implementation does not handle UTF-16 properly and can actually produce invalid UTF-8.
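
To illustrate the footnote, here is a minimal standalone sketch (standard library only, not taken from Boost.Locale or from the old facet) of why UTF-16 needs special handling when converting to UTF-8. The function names combine_surrogates, encode_utf8, utf16_to_utf8 and naive_per_unit_utf8 are hypothetical, and replacing unpaired surrogates with U+FFFD is my own assumption for the sketch. The naive variant shows one way invalid UTF-8 can arise; it is not a claim about the exact code path in the old facet.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Combine a UTF-16 high/low surrogate pair into a single code point.
inline std::uint32_t combine_surrogates(std::uint16_t high, std::uint16_t low) {
    return 0x10000u + ((std::uint32_t(high) - 0xD800u) << 10)
                    + (std::uint32_t(low) - 0xDC00u);
}

// Encode one value as UTF-8 bytes. Performs no validation, so feeding it
// a lone surrogate yields a byte sequence that is NOT valid UTF-8.
inline std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80u) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800u) {
        out.push_back(static_cast<char>(0xC0u | (cp >> 6)));
        out.push_back(static_cast<char>(0x80u | (cp & 0x3Fu)));
    } else if (cp < 0x10000u) {
        out.push_back(static_cast<char>(0xE0u | (cp >> 12)));
        out.push_back(static_cast<char>(0x80u | ((cp >> 6) & 0x3Fu)));
        out.push_back(static_cast<char>(0x80u | (cp & 0x3Fu)));
    } else {
        out.push_back(static_cast<char>(0xF0u | (cp >> 18)));
        out.push_back(static_cast<char>(0x80u | ((cp >> 12) & 0x3Fu)));
        out.push_back(static_cast<char>(0x80u | ((cp >> 6) & 0x3Fu)));
        out.push_back(static_cast<char>(0x80u | (cp & 0x3Fu)));
    }
    return out;
}

// Surrogate-aware conversion: pairs are merged before encoding.
// Unpaired surrogates become U+FFFD (an assumption of this sketch).
inline std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        const bool is_high = cp >= 0xD800u && cp <= 0xDBFFu;
        const bool is_low  = cp >= 0xDC00u && cp <= 0xDFFFu;
        if (is_high && i + 1 < in.size()
            && in[i + 1] >= 0xDC00u && in[i + 1] <= 0xDFFFu) {
            cp = combine_surrogates(in[i], in[i + 1]);
            ++i;  // the low surrogate has been consumed
        } else if (is_high || is_low) {
            cp = 0xFFFDu;
        }
        out += encode_utf8(cp);
    }
    return out;
}

// Naive conversion: every 16-bit unit is encoded on its own, so a
// surrogate pair turns into two 3-byte sequences (invalid UTF-8).
inline std::string naive_per_unit_utf8(const std::u16string& in) {
    std::string out;
    for (char16_t unit : in) {
        out += encode_utf8(unit);
    }
    return out;
}
```

For U+1F600, which UTF-16 stores as the pair D83D DE00, the surrogate-aware version produces the four bytes F0 9F 98 80, while the naive version produces six bytes (ED A0 BD ED B8 80) that a strict UTF-8 decoder will reject.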