Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet

9 Oct 2015


      ...
Andrey Semashev wrote:
...
...
WTF-8 and CESU-8 are not UTF-8 but different encodings. Dealing with them 
 should be the user's explicit choice (e.g. the user should write 
 utf16_to_wtf8 instead of utf16_to_utf8).
In addition to what I wrote earlier, the choices here are not representable 
in a single U or W letter. When taking UTF-8, you need to decide whether to
- accept codepoints over 10FFFF
- accept codepoints encoded with more bytes than necessary
- accept surrogates
No... all this isn't UTF-8. Period. Accepting code points above 10FFFF is like saying "let's assume Pi = 3.15".

That is why the C++11 <codecvt> has basic design flaws. (See notes in previous e-mails)
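
To make the three cases listed above concrete, here is a minimal sketch of a strict check that rejects all of them: overlong encodings, surrogates, and code points above 10FFFF. The name is_strict_utf8 is hypothetical. It is not part of Boost.Nowide, Boost.Locale, or <codecvt>, and it only illustrates the rule, not how any of those libraries implement it.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper for illustration only (not a Boost or standard API).
// Returns true only if `s` is well-formed UTF-8.
inline bool is_strict_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len;       // total bytes in this sequence
        std::uint32_t cp;      // code point being assembled
        std::uint32_t min_cp;  // smallest code point allowed for this length

        if (lead < 0x80)                { len = 1; cp = lead;        min_cp = 0x0; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte, or a 5/6-byte lead

        if (n - i < len) return false;  // sequence is truncated

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min_cp) return false;                   // more bytes than necessary
        if (cp > 0x10FFFF) return false;                 // beyond the Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate (WTF-8/CESU-8)

        i += len;
    }
    return true;
}
```

With this check, the two-byte form "\xC0\x80" of U+0000, the encoded surrogate "\xED\xA0\x80", and "\xF4\x90\x80\x80" (which would be 110000) are all rejected, while "\xF4\x8F\xBF\xBF" (10FFFF) is accepted. A converter that wants WTF-8 or CESU-8 behaviour would need a separate, explicitly named function rather than a relaxed version of this one.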
...
- probably more because Unicode is hard
Unicode isn't hard - it is just treated with ignorance even by big
organizations, not to mention average programmers.

Artyom

Artyom Beilis