Hello,
Following up on the previous discussion regarding the UTF-8 facet in Boost:
I have merged the changes into the master branch. A new utf8 codecvt facet that properly
handles both UTF-16 and UTF-32 encodings for wchar_t (or char16_t/char32_t) is now there.
The major goal is to replace the broken (*) utf8 facet that exists today in
boost/detail/utf8_codecvt_facet.hpp/ipp
It is header-only, so all you need to do is include it:
#include <boost/locale/utf8_codecvt.hpp>
And install it as usual:
std::locale new_locale(std::locale(),new boost::locale::utf8_codecvt<wchar_t>());
It *does not require* a separately compiled part like the one in detail.
Note that it is implemented in terms of boost::locale::generic_codecvt:
template<typename CharType,typename CodecvtImpl,int CharSize=sizeof(CharType)>
class generic_codecvt;
It has non-trivial specializations for CharSize=2 and CharSize=4 that handle
UTF-16 and UTF-32 for wchar_t/char16_t/char32_t characters.
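To illustrate the idea of selecting the encoding by character size, here is a
minimal sketch. The name code_point_writer is made up for this example and is
not part of Boost.Locale; it only shows how a CharSize parameter that defaults
to sizeof(CharType) can pick a UTF-16 or UTF-32 code path at compile time.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not a Boost.Locale class): appends one Unicode code
// point to a string of CharType, choosing the encoding by sizeof(CharType).
template<typename CharType, int CharSize = sizeof(CharType)>
struct code_point_writer;  // primary template intentionally left undefined

// 2-byte characters: UTF-16. Code points above U+FFFF need a surrogate pair.
template<typename CharType>
struct code_point_writer<CharType, 2> {
    static void append(std::basic_string<CharType>& out, std::uint32_t cp) {
        if (cp < 0x10000) {
            out.push_back(static_cast<CharType>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<CharType>(0xD800 + (cp >> 10)));    // high surrogate
            out.push_back(static_cast<CharType>(0xDC00 + (cp & 0x3FF)));  // low surrogate
        }
    }
};

// 4-byte characters: UTF-32. One unit per code point.
template<typename CharType>
struct code_point_writer<CharType, 4> {
    static void append(std::basic_string<CharType>& out, std::uint32_t cp) {
        out.push_back(static_cast<CharType>(cp));
    }
};
```

With this sketch, code_point_writer<char16_t>::append writes two units for
U+1F600, while code_point_writer<char32_t>::append writes one. wchar_t picks
whichever specialization matches its size on the platform.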
boost::locale::generic_codecvt provides an interface for creating a range
of facets for various character encodings. For example, Boost.Locale
uses it to implement several facets:
- utf8 codecvt
- single-byte character sets like ISO-8859-* or Windows-125*
- wrappers around the ICU ucnv_* and POSIX iconv APIs that create a standard codecvt facet.
That is why I decided to keep the implementation within the Boost.Locale library,
as the place that actually deals with different encodings.
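As a rough illustration of why one generic layer can serve many encodings,
here is a simplified sketch. The names latin1_converter, decode and encode are
hypothetical and this is NOT the actual generic_codecvt interface; it only
shows the general pattern of writing the driver once and plugging in a small
per-encoding converter.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical per-encoding converter for ISO-8859-1 (Latin-1).
struct latin1_converter {
    // Byte -> code point: Latin-1 bytes map directly to U+0000..U+00FF.
    static std::uint32_t to_unicode(unsigned char byte) { return byte; }

    // Code point -> byte: returns false if the code point is not representable.
    static bool from_unicode(std::uint32_t cp, unsigned char& byte) {
        if (cp > 0xFF) return false;
        byte = static_cast<unsigned char>(cp);
        return true;
    }
};

// Generic driver written once; any converter with the same two functions fits.
template<typename Converter>
std::u32string decode(const std::string& in) {
    std::u32string out;
    for (char c : in) {
        out.push_back(static_cast<char32_t>(
            Converter::to_unicode(static_cast<unsigned char>(c))));
    }
    return out;
}

// Returns false as soon as a code point cannot be represented.
template<typename Converter>
bool encode(const std::u32string& in, std::string& out) {
    for (char32_t cp : in) {
        unsigned char byte = 0;
        if (!Converter::from_unicode(cp, byte)) return false;
        out.push_back(static_cast<char>(byte));
    }
    return true;
}
```

Here decode<latin1_converter>("caf\xE9") yields four code points ending in
U+00E9, and encoding U+20AC fails because Latin-1 has no euro sign.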
-----------
Once Boost 1.60 is released, I encourage every library maintainer
who uses the broken boost/detail/utf8_codecvt_facet.*pp to replace
it with the proper one from Boost.Locale.
Note: it is a HEADER-ONLY part and does not require any part of the compiled
library.
Artyom Beilis
(*) The current implementation does not handle UTF-16 properly and can actually produce
invalid UTF-8.
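
To show one way such invalid output can arise (a sketch of the general failure
mode, not a copy of the old facet's code): if each UTF-16 unit is encoded on
its own, a surrogate pair turns into two 3-byte sequences instead of a single
4-byte one. The function names below are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Encode one code point as UTF-8 (1 to 4 bytes). No validation is done here.
inline void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Naive conversion: every UTF-16 unit is treated as a full code point, so
// surrogates get encoded directly, which is not valid UTF-8.
inline std::string utf16_to_utf8_naive(const std::u16string& in) {
    std::string out;
    for (char16_t u : in) append_utf8(out, u);
    return out;
}

// Surrogate-aware conversion: a high surrogate followed by a low surrogate is
// combined into one code point first. Unpaired surrogates become U+FFFD.
inline std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;  // the low surrogate has been consumed
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

For U+1F600 (UTF-16 units D83D DE00) the surrogate-aware version gives the
bytes F0 9F 98 80, while the naive one gives ED A0 BD ED B8 80, which a strict
UTF-8 decoder will reject.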