Re: [boost] Silly Boost.Locale default narrow string encoding in Windows

On Saturday, 29 October 2011, Peter Dimov wrote:
The "dir" command has no problem displaying arbitrary file names directly to the console (presumably via WriteConsoleW), but once it has to write to a file, it needs to convert to narrow and no code page other than 65001 can express the above file name.
This is not that relevant to the wider issue, but wide streams will work for console output if you first do this:

    if (_isatty(_fileno(stdout))) _setmode(_fileno(stdout), _O_U16TEXT);
    if (_isatty(_fileno(stderr))) _setmode(_fileno(stderr), _O_U16TEXT);

i.e. set the output mode to UTF-16 when writing to the console. This only works for recent versions of Visual C++. Obviously doesn't fix piped output.

On 29.10.2011 18:23, Daniel James wrote:
On Saturday, 29 October 2011, Peter Dimov wrote:
The "dir" command has no problem displaying arbitrary file names directly to the console (presumably via WriteConsoleW), but once it has to write to a file, it needs to convert to narrow and no code page other than 65001 can express the above file name.
This is not that relevant to the wider issue, but wide streams will work for console output if you first do this:
    if (_isatty(_fileno(stdout))) _setmode(_fileno(stdout), _O_U16TEXT);
    if (_isatty(_fileno(stderr))) _setmode(_fileno(stderr), _O_U16TEXT);
i.e. set the output mode to UTF-16 when writing to the console. This only works for recent versions of Visual C++. Obviously doesn't fix piped output.
Right. But the added 'if's produce another problem, namely that redirection to a file is prevented from working.

<example>
P:\test> chcp 65001
Active code page: 65001

P:\test> type jam.cpp
#include <stdio.h>
#include <io.h>     // _setmode
#include <fcntl.h>  // _O_U8TEXT

int main()
{
    //_setmode( _fileno( stdout ), _O_U8TEXT );
    if( _isatty( _fileno( stdout ) ) )
    {
        _setmode( _fileno( stdout ), _O_U16TEXT );
    }
    ::wprintf( L"Blåbærsyltetøy! 日本国 кошка!\n" );
}

P:\test> cl jam.cpp
jam.cpp

P:\test> jam
Blåbærsyltetøy! 日本国 кошка!

P:\test> jam >x

P:\test> type x
Bl�b�rsyltet�y!

P:\test> _
</example>

Without the added 'if's, and instead adding a Unicode BOM to the start of the text, it works fine for redirection:

<example>
P:\test> chcp 65001
Active code page: 65001

P:\test> type jam.cpp
#include <stdio.h>
#include <io.h>     // _setmode
#include <fcntl.h>  // _O_U16TEXT

int main()
{
    _setmode( _fileno( stdout ), _O_U16TEXT );
    ::wprintf( L"\uFEFF" L"Blåbærsyltetøy! 日本国 кошка!\n" );
}

P:\test> cl jam.cpp
jam.cpp
jam.cpp(8) : warning C4428: universal-character-name encountered in source

P:\test> jam
Blåbærsyltetøy! 日本国 кошка!

P:\test> jam >x

P:\test> type x
Blåbærsyltetøy! 日本国 кошка!

P:\test> chcp 437
Active code page: 437

P:\test> type x
Blåbærsyltetøy! 日本国 кошка!

P:\test> _
</example>

UTF-8 is even more forgiving as an external format.

You don't see the BOM. Oh, I see that it's disappeared above, difficult to copy-paste, but it's there in the direct output as a rectangle.

Cheers & hth.,

- Alf
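
The two sessions above suggest a possible middle ground: use UTF-16 mode only when stdout is a console, and fall back to UTF-8 mode (the `_O_U8TEXT` flag from the commented-out line) when output is redirected. The following is only a sketch under that assumption, not something proposed in the thread. The names `StdoutMode`, `pick_stdout_mode` and `apply_stdout_mode` are hypothetical; the only real API used is the Microsoft CRT's `_isatty`, `_fileno` and `_setmode`. On non-Windows platforms the helper deliberately does nothing.

```cpp
#include <cstdio>

#ifdef _WIN32
#include <io.h>     // _isatty, _setmode
#include <fcntl.h>  // _O_U16TEXT, _O_U8TEXT
#endif

// Hypothetical names for illustration; only the CRT calls come from the thread.
enum class StdoutMode { Utf16Console, Utf8Redirected };

// Pure decision: UTF-16 for an interactive console, UTF-8 otherwise.
inline StdoutMode pick_stdout_mode(bool is_console)
{
    return is_console ? StdoutMode::Utf16Console : StdoutMode::Utf8Redirected;
}

// Applies the decision to stdout. Returns true if a translation mode was set.
// Afterwards only wide output functions such as wprintf should be used on
// stdout; the Microsoft CRT does not support narrow output in these modes.
inline bool apply_stdout_mode()
{
#ifdef _WIN32
    const int fd = _fileno(stdout);
    const bool is_console = _isatty(fd) != 0;
    const int flag = pick_stdout_mode(is_console) == StdoutMode::Utf16Console
                         ? _O_U16TEXT
                         : _O_U8TEXT;
    return _setmode(fd, flag) != -1;
#else
    return false;  // assumption: other platforms leave stdout untouched
#endif
}
```

A program would call `apply_stdout_mode()` once at the start of `main` and then write with `wprintf`. Whether the redirected UTF-8 output should also begin with a BOM depends on what will read the file, and is left open here.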

Alf P. Steinbach wrote:
    int main()
    {
        _setmode( _fileno( stdout ), _O_U16TEXT );
        ::wprintf( L"\uFEFF" L"Blåbærsyltetøy! 日本国 кошка!\n" );
    }
This produces a UTF-16 text file though. It works with "type", but would probably confuse most other programs. And more.

    C:\Projects\testbed>release\testbed.exe > testbed.txt

    C:\Projects\testbed>type testbed.txt
    Blåbærsyltetøy! 日本国 кошка!

    C:\Projects\testbed>type testbed.txt | more
    Blåbærsyltetoy! ??? ?????!

    C:\Projects\testbed>cat testbed.txt
    ▒▒B l ▒ b ▒ r s y l t e t ▒ y ! ▒e,g▒V :♦>♦H♦:♦0♦!
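
The `cat` output is consistent with a byte-oriented tool printing raw little-endian UTF-16. Each ASCII letter is followed by a zero byte (shown as a space). 日本国 (U+65E5, U+672C, U+56FD) becomes the bytes E5 65, 2C 67, FD 56, which a single-byte display renders as something like `▒e`, `,g`, `▒V`. The Cyrillic letters each end in the byte 04. A minimal sketch that reproduces this byte layout follows. The helper name `utf16le_with_bom` is hypothetical, and the code is portable C++11 rather than anything Windows-specific.

```cpp
#include <string>

// Hypothetical helper: serialize UTF-16 code units as little-endian bytes,
// preceded by the byte order mark U+FEFF (bytes FF FE).
inline std::string utf16le_with_bom(const std::u16string& text)
{
    std::string bytes;
    bytes.reserve(2 * (text.size() + 1));

    // Append one code unit, low byte first.
    auto put = [&bytes](char16_t unit) {
        bytes.push_back(static_cast<char>(unit & 0xFF));
        bytes.push_back(static_cast<char>((unit >> 8) & 0xFF));
    };

    const char16_t bom = 0xFEFF;
    put(bom);
    for (char16_t unit : text) {
        put(unit);
    }
    return bytes;
}
```

For example, `utf16le_with_bom(u"Bl")` yields the six bytes FF FE 42 00 6C 00. A program that expects a single-byte or UTF-8 encoding sees the embedded zero bytes and the unfamiliar lead bytes, which is the kind of confusion shown in the `cat` session.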