----- Original Message -----
From: Peter Dimov
To: boost@lists.boost.org
Cc:
Sent: Friday, October 9, 2015 11:40 PM
Subject: Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet

Artyom Beilis wrote:
Codepoints above 10FFFF are like "let's assume Pi=3.15"...
No, sorry. This is not at all the same. The reason we're in this mess is precisely because codepoints above 0xFFFF were like pi=3.15. And then it turned out they weren't.
Yeah, but for UTF-16 it is over: you can't go past it ;-)
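To make the UTF-16 limit concrete, here is a minimal C++ sketch of the surrogate-pair arithmetic. The function name from_surrogates is made up for illustration and is not part of Boost or nowide; it assumes its inputs are already a valid high/low surrogate pair and does not check them.

```cpp
#include <cstdint>

// Combine a UTF-16 surrogate pair into a code point.
// Assumes high is in 0xD800..0xDBFF and low is in 0xDC00..0xDFFF (not checked).
// Each surrogate carries 10 payload bits, so a pair covers 2^20 code points
// on top of the 0x10000 already reachable with a single 16-bit unit.
constexpr std::uint32_t from_surrogates(std::uint16_t high, std::uint16_t low) {
    return 0x10000u
         + ((static_cast<std::uint32_t>(high) - 0xD800u) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00u);
}

// The smallest and largest pairs bound the supplementary range.
static_assert(from_surrogates(0xD800, 0xDC00) == 0x10000, "first supplementary code point");
static_assert(from_surrogates(0xDBFF, 0xDFFF) == 0x10FFFF, "last code point UTF-16 can express");
```

The largest pair gives 0x10000 + 0xFFC00 + 0x3FF = 0x10FFFF, and there is no bit pattern left in the encoding to express anything higher.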
- probably more because Unicode is hard
Unicode isn't hard - it is just treated with ignorance even by big organizations, not to mention average programmers.
What I meant by that is for instance
- is 0xCC 0x81 a valid UTF-8 string?
- is 0x65 0xCC 0x81 0xCC 0x81 a valid UTF-8 string?
Both are valid strings... and both are meaningless on their own, i.e. an accent without a letter, or the same accent twice. Being illogical in human terms or in representation does not make them illegal UTF-8. UTF-8 is simple; human language processing is complex.

Artyom
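For anyone who wants to check the two byte sequences mechanically, below is a minimal sketch of a well-formedness check in C++. It is not the nowide or Boost implementation; is_valid_utf8 is a hypothetical helper written for this example. It only applies the encoding rules (correct continuation bytes, no overlong forms, no surrogates, nothing above U+10FFFF) and deliberately knows nothing about combining characters or what the text means.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
bool is_valid_utf8(const std::vector<std::uint8_t>& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const std::uint8_t b = bytes[i];
        std::size_t len;       // total length of this encoded sequence
        std::uint32_t cp;      // code point being decoded
        std::uint32_t min_cp;  // smallest code point allowed for this length
        if (b < 0x80)                { len = 1; cp = b;        min_cp = 0x0; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; min_cp = 0x80; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min_cp = 0x800; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min_cp = 0x10000; }
        else return false;               // stray continuation or invalid lead byte
        if (n - i < len) return false;   // sequence is truncated
        for (std::size_t k = 1; k < len; ++k) {
            const std::uint8_t c = bytes[i + k];
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}
```

Called with {0xCC, 0x81} (U+0301 alone) or {0x65, 0xCC, 0x81, 0xCC, 0x81} ("e" followed by U+0301 twice), this sketch returns true, while a sequence such as {0xF4, 0x90, 0x80, 0x80}, which would decode to 0x110000, is rejected.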