On 6/23/2010 8:04 PM, John Dlugosz wrote:
and then you can stuff it in and recover it from exceptions just fine. The only issue is how boost::diagnostic_information (which returns a std::string) will display a wfile_name, and I'm sure whatever it does now isn't correct.
So this isn't a trivial problem. Perhaps the correct thing to do is to document that boost::diagnostic_information returns a UTF-8 string; I kind of prefer this to the other possibility, which is to add a boost::wdiagnostic_information.
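To make the UTF-8 option concrete: if a wide file name holds UTF-16 code units (as wchar_t does on Windows), putting it into a std::string means encoding it as UTF-8. Below is a minimal hand-written sketch, assuming the input has already been copied into a std::u16string (a C++11 type). The function name utf16_to_utf8 is hypothetical and not part of Boost.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Encode UTF-16 code units as UTF-8. Surrogate pairs are combined into one
// code point; an unpaired surrogate throws std::invalid_argument.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {          // high surrogate
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;                                      // consumed the low surrogate too
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp < 0x80) {                              // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                      // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                    // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                      // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Windows file names are not guaranteed to be well-formed UTF-16, so a real version would need an explicit policy for unpaired surrogates; this sketch simply throws.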
The Standard Library supplied with Visual C++ doesn't work with a UTF-8 "locale", and some versions give an error if you try to set one. The mblen string code in the source I've read is all designed around single-byte or double-byte characters with discernible lead bytes, which works with Shift-JIS and the other system code pages, but NOT with UTF-8.
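The mismatch shows up in the lead byte alone: in a double-byte code page such as Shift-JIS a character is one or two bytes, whereas a UTF-8 sequence is one to four bytes and its lead byte encodes the length. A minimal sketch with hypothetical helper names, classifying UTF-8 lead bytes and using that to count code points. It checks continuation bytes but, for brevity, does not reject every overlong or surrogate form.

```cpp
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence introduced by `lead`,
// or 0 if `lead` cannot begin a sequence.
std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;  // 0xxxxxxx: ASCII, one byte
    if (lead < 0xC2) return 0;  // 10xxxxxx continuation byte, or overlong C0/C1
    if (lead < 0xE0) return 2;  // 110xxxxx
    if (lead < 0xF0) return 3;  // 1110xxxx
    if (lead < 0xF5) return 4;  // 11110xxx
    return 0;                   // F5..FF would exceed U+10FFFF
}

// Count code points by hopping from lead byte to lead byte.
// Returns -1 if a sequence is truncated or has a bad continuation byte.
long count_code_points(const std::string& s) {
    long count = 0;
    for (std::size_t i = 0; i < s.size();) {
        std::size_t len = utf8_sequence_length(static_cast<unsigned char>(s[i]));
        if (len == 0 || i + len > s.size()) return -1;
        for (std::size_t k = 1; k < len; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return -1;
        i += len;
        ++count;
    }
    return count;
}
```

Code that assumes "a lead byte is followed by at most one trail byte" cannot walk the three- and four-byte sequences that this loop handles.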
If the UTF-8 string contains only ASCII characters (the 7-bit range the Ansi code pages share), Visual C++'s standard library should handle it as plain Ansi, since those bytes are identical in both encodings.
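The reason this holds: UTF-8 encodes U+0000 through U+007F as the same single bytes ASCII uses, and every byte of a multi-byte UTF-8 sequence has its high bit set. So a plain byte scan tells you whether a UTF-8 string can go through a narrow (Ansi) interface unchanged. A minimal sketch; is_ascii is a hypothetical helper name.

```cpp
#include <algorithm>
#include <string>

// True if every byte is in 0x00..0x7F. Such a string has exactly the same
// bytes whether it is read as UTF-8 or as ASCII.
bool is_ascii(const std::string& s) {
    return std::all_of(s.begin(), s.end(), [](char c) {
        return static_cast<unsigned char>(c) < 0x80;
    });
}
```

If it returns false, the string has to be converted instead, for example to UTF-16 for the -W functions.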
System functions take the "system code page", which might be different for file-name-related functions. It did not support UTF-8 in the historical Windows line, though support appears to be there in modern versions. But lots of code was written to support Windows 95 and is still out there. Actually, I don't know whether passing UTF-8 to the -A Windows API functions works! Normally, one uses the UTF-16 (-W) forms.
Again, if the UTF-8 string contains only ASCII characters, the -A Windows API functions should work with it as an ordinary C string.
Meanwhile, console output uses a different code page, and handles UTF-8 only if you set it up that way and supply a different font. That makes the regular shell stuff go funny, though, since it translates file names and such into the "file name" code page mentioned earlier.
So making the human-readable string UTF-8 is simply not going to sit well with Windows programmers. Make it UTF-16 and I can pass it to wcout << or to TextOut, OutputDebugString, etc. with no problem.
Wasn't someone working on a Boost Unicode library that could convert a UTF-8 stream to its equivalent UTF-16 stream?
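Whatever the status of that library, the conversion itself is small. Outside Boost, Windows offers MultiByteToWideChar with the CP_UTF8 code page, and C++11 added std::codecvt_utf8_utf16 (deprecated in C++17). For illustration, here is a hand-written sketch, assuming the input arrives as one complete std::string rather than a stream; utf8_to_utf16 is a hypothetical name. It rejects malformed input, overlong forms, surrogate code points and values above U+10FFFF.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode UTF-8 into UTF-16, emitting surrogate pairs above U+FFFF.
// Throws std::invalid_argument on malformed input.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)                   { cp = b;        len = 1; }
        else if (b >= 0xC2 && b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b >= 0xE0 && b < 0xF0) { cp = b & 0x0F; len = 3; }
        else if (b >= 0xF0 && b < 0xF5) { cp = b & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size()) throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) throw std::invalid_argument("bad continuation byte");
            cp = (cp << 6) | (c & 0x3F);   // append 6 payload bits
        }
        // Smallest code point each length may encode; anything lower is overlong.
        static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::invalid_argument("invalid code point");
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {                           // split into a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += len;
    }
    return out;
}
```

A true stream version would also have to carry a partial sequence across buffer boundaries instead of treating it as truncated.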