Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet

9 Oct 2015

      ...
To be honest I don't know what the guys who designed <codecvt> in the first place
It was done in the early and mid 1990's, with primary input coming from
Asian national bodies and the now long gone Unix vendors who had a big
presence in that market.

thought of - I feel a strong influence of broken MS Unicode policies
...
This was years before Microsoft folks started to participate in the LWG.
...
So I'm not going to implement C++11 <codecvt> because IMHO it is broken by
design in the first place.
Header <codecvt> isn't what we need, as you point out below.
...
Boost.Locale provides one, but currently it is a deeply internal and complex
part of the library.
The code I have written for Boost.Nowide, or the one I suggest putting into
the header-only part of Boost.Locale,
is a codecvt that converts between utf-8 and utf-16/32 according to the size
of the character type:
boost::(nowide|or locale)::utf8_facet<wchar_t> - utf-8 to utf-16 (windows)
or utf-32 (posix)
Don't forget utf-8 to utf-8 (some embedded systems).
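
To illustrate what "according to size of character" means, here is a minimal
sketch of the output side only. append_code_point is a hypothetical helper
made up for this example; it is not part of Boost.Nowide or Boost.Locale, and
it assumes the caller has already decoded and validated the code point.

```cpp
#include <string>

// Hypothetical helper, for illustration only (not a Boost API).
// Appends one Unicode scalar value to a wide string, choosing the
// encoding from the width of wchar_t:
//   2-byte wchar_t (e.g. Windows) -> UTF-16, surrogate pair above U+FFFF
//   4-byte wchar_t (e.g. POSIX)   -> UTF-32, always one unit
// Assumes cp is a valid scalar value (not a surrogate, <= U+10FFFF).
void append_code_point(std::wstring& out, unsigned long cp)
{
    if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
        cp -= 0x10000;
        out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));    // high surrogate
        out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
    } else {
        out.push_back(static_cast<wchar_t>(cp));
    }
}
```

A code point outside the BMP therefore takes two units where wchar_t is
16 bits wide and one unit where it is 32 bits wide.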

IMO, a critical aspect of all of those, including utf-8 to utf-8, is that
they detect all utf-8 errors since ill-formed utf-8 is used as an attack
vector.
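
As a rough sketch of what "detect all utf-8 errors" involves, the checker
below rejects the usual classes of ill-formed input: stray or missing
continuation bytes, truncated sequences, overlong forms, encoded surrogates,
and values above U+10FFFF. The name is_valid_utf8 is made up for this
example; it is not taken from Boost.Nowide or Boost.Locale, and a real facet
would do the same checks incrementally inside do_in/do_out.

```cpp
#include <cstddef>
#include <string>

// Illustrative only: returns true if s is well-formed UTF-8.
bool is_valid_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len;    // total bytes in this sequence
        unsigned long cp;   // code point being decoded
        unsigned long min;  // smallest code point allowed for this length

        if (lead < 0x80)                { len = 1; cp = lead;        min = 0; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min = 0x10000; }
        else return false;  // stray continuation byte or lead byte 0xF8..0xFF

        if (len > n - i) return false;  // sequence truncated by end of input

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min) return false;                       // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                  // beyond Unicode range
        i += len;
    }
    return true;
}
```

The overlong check is the one that matters most for security: "\xC0\xAF"
decodes to '/' in a lax decoder, which is how path filters have been
bypassed in the past.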

See Markus Kuhn's
https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

I can contribute a Boost regression test friendly version of Kuhn's
malformed tests.

--Beman


Beman Dawes