
On 28.10.2011 14:41, Yakov Galka wrote:
On Fri, Oct 28, 2011 at 13:58, Peter Dimov <pdimov@pdimov.com> wrote:
Alf P. Steinbach wrote:
How do I make the following program work with Visual C++ in Windows, using
narrow character string?
<code>
#include <stdio.h>
#include <fcntl.h>      // _O_U8TEXT
#include <io.h>         // _setmode, _fileno
#include <windows.h>

int main()
{
    //SetConsoleOutputCP( 65001 );
    //_setmode( _fileno( stdout ), _O_U8TEXT );
    printf( "Blåbærsyltetøy! 日本国 кошка!\n" );
}
</code>
Output to a console wasn't our topic so far (and is not one of my strong points), but the specific problem with this program is that the embedded literal is not UTF-8, as the warning C4566 tells us, so there is no way for you to get UTF-8 in the output. (You should be able to set VC++'s code page to 65001, but I don't think you can.)
int main() { printf( utf8_encode( L"кошка" ).c_str() ); }
You don't need to configure anything, in fact you cannot do it properly in VS. What you can do is:
1) don't use wide-char literals with non-ASCII characters
2) use UTF-8 literals for narrow-char strings.
All you need is to save the source as UTF-8 WITHOUT BOM. Works like a charm on VS2005 and VS2010. Apparently it's portable. The IDE can detect UTF-8 even without a BOM ("☑ Auto-detect UTF-8 encoding without signature").
This is interesting in a perverse sort of way. In order to make Visual C++ produce UTF-8 encoded compiled narrow strings, one must /lie/ to the compiler: the source code is UTF-8, and one tells the Visual C++ compiler that it's ANSI. And in order to make g++ produce ANSI encoded compiled narrow strings, one must likewise /lie/ to the compiler: the source code is ANSI, and one tells g++ that it's UTF-8. As I see it, there's something wrong here. Notwithstanding the limitation that codepage 65001 is impractical in the Windows command interpreter -- e.g. the 'more' command CRASHES.
This is not a practical problem for "proper" applications because Russian text literals should always come from the equivalent of gettext and never be embedded in code.
+1
I find that a very narrow-minded view. Would you like to be the one telling Norwegian student Åshild Bjørnson that you favor the notion that she should waste hours or days installing Boost and some other *nix-oriented library, and use 'gettext', just to be able to display her name in her first C++ program? That text representation and output in C++ have been designed (with your not just willing but enthusiastic vote) to be so inherently complex that it takes hours or days of effort just to display one's own name?
Personally I'm happy with
printf( "Blåbærsyltetøy! 日本国 кошка!\n" );
writing UTF-8. Even if I cannot configure the console, I still can redirect it to a file, and it will correctly save this as UTF-8. Preventing data-loss is more important for me.
I find it thoroughly disgusting to have to lie to one's tools, and to rely on the assumption that the tools will not wisen up in the future.

However, I concede the point: IF one is happy with output encoded so that most Windows command line tools fail (e.g. 'more' crashes), and IF one is happy with lying to the compiler about the source encoding, and IF one is happy assuming that the compiler won't wisen up about encodings in a future version, then -- the UTF-8 scheme allows literals with national language characters, not just A through Z. Those are pretty constricting conditions, though.

Cheers & hth.,

- Alf