Andrey Semashev wrote:
The right way to not deal with these issues is to simply not take wide strings in the first place. This forces the user to supply "the canonical octet representation".
Since we do take wide strings, we have implicitly accepted the responsibility to produce the canonical octet representation for them. And inserting zeroes randomly is simply wrong.
Ok, so maybe we should simply deprecate the support for wide string inputs?
That's one possible way to deal with it, yes. Although I think that for char16_t and char32_t inputs the canonical representation is unambiguous. This leaves wchar_t, and while nobody on POSIX will shed a tear, Windows users will probably be disappointed if we take that away. That's why I thought that treating wchar_t as char16_t or char32_t, depending on its size, was an acceptable compromise. (That's almost always true in practice, with the exception of weird IBM systems that use wide EBCDIC, and those aren't exactly our target audience.)
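
To make the compromise concrete, here is a rough sketch of what I mean. It assumes the canonical octet representation is UTF-8, which is my assumption for illustration and not something settled in this thread. The names canonical_octets and append_utf8 are hypothetical and not part of any existing library interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one code point to `out` as UTF-8 (assumed canonical form).
// Lone surrogates are passed through unchanged here; a real
// implementation would have to decide how to treat them.
inline void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// char32_t input: every element is already a code point.
inline std::string canonical_octets(const char32_t* s, std::size_t n)
{
    std::string out;
    for (std::size_t i = 0; i < n; ++i)
        append_utf8(out, static_cast<std::uint32_t>(s[i]));
    return out;
}

// char16_t input: combine valid surrogate pairs into one code point.
inline std::string canonical_octets(const char16_t* s, std::size_t n)
{
    std::string out;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint32_t cp = s[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < n &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            std::uint32_t lo = s[i + 1];
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            ++i; // the low surrogate has been consumed
        }
        append_utf8(out, cp);
    }
    return out;
}

// wchar_t input: treated as char16_t or char32_t depending on its size.
inline std::string canonical_octets(const wchar_t* s, std::size_t n)
{
    static_assert(sizeof(wchar_t) == 2 || sizeof(wchar_t) == 4,
                  "unsupported wchar_t size");
    if (sizeof(wchar_t) == sizeof(char16_t)) {
        std::u16string tmp;
        tmp.reserve(n);
        for (std::size_t i = 0; i < n; ++i)
            tmp.push_back(static_cast<char16_t>(s[i]));
        return canonical_octets(tmp.data(), tmp.size());
    }
    std::u32string tmp;
    tmp.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        tmp.push_back(static_cast<char32_t>(s[i]));
    return canonical_octets(tmp.data(), tmp.size());
}
```

With something along these lines, the same text yields the same octets whether it arrives as char16_t, char32_t, or wchar_t of either size, and no padding zeroes end up in the output.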