
Rogier van Dalen wrote:
An assumption I think is wrong is that wchar_t would be suitable for Unicode. Correct me if I'm wrong, but IIRC wchar_t has 16 bits on Microsoft compilers, for example. The utf8_codecvt_facet implementation will on these compilers cut off any codepoints over 0xFFFF. (U+1D12C will come out as U+D12C.)
This is because the Windows NT ABI is hardwired for 16-bit wide characters. I believe that means the wide characters are actually UTF-16 code units, with code points above 0xFFFF represented as "surrogate pairs." Regardless of whether this is a good thing or not, Windows compilers need to follow suit, as the underlying implementation of their wide characters is in Windows, not in the compiler. It might be possible for a compiler to provide its own Unicode implementation and map that to Windows' wide characters, but in user-visible situations where the two implementations disagreed, the results might be surprising enough to make the compiler-provided implementation unusable.

Aaron W. LaFramboise
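
P.S. To make the difference concrete, here is a minimal sketch (not the utf8_codecvt_facet code, just an illustration) of what happens to U+1D12C when it is cut down to 16 bits, compared with the UTF-16 surrogate pair a 16-bit wchar_t platform would need. The names truncate_to_16, to_surrogates, and SurrogatePair are made up for this example, and the arithmetic assumes the input lies in the range U+10000..U+10FFFF.

```cpp
#include <cstdint>

// Hypothetical helper type for this illustration: one UTF-16 surrogate pair.
struct SurrogatePair {
    std::uint16_t high;
    std::uint16_t low;
};

// What a naive conversion to a 16-bit unit does: keep only the low 16 bits.
constexpr std::uint16_t truncate_to_16(std::uint32_t code_point) {
    return static_cast<std::uint16_t>(code_point & 0xFFFFu);
}

// Standard UTF-16 encoding of a supplementary code point.
// Assumes 0x10000 <= code_point <= 0x10FFFF; no validation is done here.
constexpr SurrogatePair to_surrogates(std::uint32_t code_point) {
    return SurrogatePair{
        static_cast<std::uint16_t>(0xD800u + ((code_point - 0x10000u) >> 10)),
        static_cast<std::uint16_t>(0xDC00u + ((code_point - 0x10000u) & 0x3FFu))
    };
}

// U+1D12C truncated to 16 bits becomes U+D12C, as in the quoted message.
static_assert(truncate_to_16(0x1D12Cu) == 0xD12Cu, "truncation loses the top bits");

// The same code point as a surrogate pair is D834 DD2C.
static_assert(to_surrogates(0x1D12Cu).high == 0xD834u, "high surrogate");
static_assert(to_surrogates(0x1D12Cu).low == 0xDD2Cu, "low surrogate");
```

The point is only that a facet targeting a 16-bit wchar_t would have to emit two units per supplementary code point rather than one truncated unit.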