
On 28.10.2011 12:36, Yakov Galka wrote:
On Fri, Oct 28, 2011 at 04:23, Alf P. Steinbach <alf.p.steinbach+usenet@gmail.com> wrote:
On 27.10.2011 23:56, Peter Dimov wrote:
The advantage of using UTF-8 is that, apart from the border layer that calls the OS (and that needs to be ported either way), the rest of the code is happily char[]-based.
Oh.
I would be happy to learn this.
How do I make the following program work with Visual C++ in Windows, using narrow character string?
<code>
#include <stdio.h>
#include <fcntl.h>      // _O_U8TEXT
#include <io.h>         // _setmode, _fileno
#include <windows.h>

int main()
{
    //SetConsoleOutputCP( 65001 );
    //_setmode( _fileno( stdout ), _O_U8TEXT );
    printf( "Blåbærsyltetøy! 日本国 кошка!\n" );
}
</code>
How will you make this program portable?
Well, that was *my* question. The claim that this minimal "Hello, world!" program puts to the test is that "the rest of the [UTF-8 based] code is happily char[]-based". Apparently that is not so.
The commented-out code is from my random efforts to Make It Work(TM).
It refused.
This is because windows narrow-chars can't be UTF-8. You could make it portable by:
int main()
{
    boost::printf("Blåbærsyltetøy! 日本国 кошка!\n");
}
Thanks, TIL boost::printf.

The idea of UTF-8 as a universal encoding now seems to be to use some workaround such as boost::printf for each and every case where it turns out that it doesn't work portably. When every portability problem has been diagnosed and special-cased to use functions that translate to/from UTF-8, and ignoring the efficiency aspect of that, then UTF-8 just magically works, hurray. E.g., if 'fopen( "rød.txt", "r" )' fails in the universal UTF-8 code, then just replace it with 'boost::fopen', or 'my_special_casing::fopen'.

However, with these workaround details made manifest, it is /much less/ convincing than the original general vague claim that UTF-8 just works.

[snip]
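For concreteness, here is a rough sketch of what such a special-cased fopen might look like. The namespace and the name `fopen_utf8` are made up for illustration and are not an existing Boost interface; the sketch assumes that `_wfopen` from the Microsoft C runtime is available on Windows and that other platforms accept UTF-8 file names directly. It shows the translation step that the wrapper has to carry out on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <wchar.h>   // _wfopen (Microsoft C runtime)
#endif

namespace my_special_casing {   // hypothetical namespace, name taken from the text above

// Append one code point to a UTF-16 string, using a surrogate pair above U+FFFF.
inline void append_utf16(std::u16string& out, char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
}

// Decode UTF-8 into UTF-16; throws std::invalid_argument on malformed input.
inline std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("bad UTF-8 lead byte");
        if (i + len > in.size()) throw std::invalid_argument("truncated UTF-8");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) throw std::invalid_argument("bad continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates, and values beyond U+10FFFF.
        static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");
        append_utf16(out, cp);
        i += len;
    }
    return out;
}

// Hypothetical wrapper: takes UTF-8 names on every platform.
inline std::FILE* fopen_utf8(const char* name, const char* mode) {
#ifdef _WIN32
    const std::u16string n = utf8_to_utf16(name), m = utf8_to_utf16(mode);
    // wchar_t is 16 bits wide on Windows, so a unit-by-unit copy is enough.
    return ::_wfopen(std::wstring(n.begin(), n.end()).c_str(),
                     std::wstring(m.begin(), m.end()).c_str());
#else
    return std::fopen(name, mode);   // assumption: the platform takes UTF-8 names
#endif
}

} // namespace my_special_casing
```

With this in place, 'my_special_casing::fopen_utf8( "rød.txt", "r" )' would replace the plain fopen call, provided the source file itself is stored as UTF-8. Every such call pays for a conversion and two temporary strings on Windows, which is the efficiency aspect mentioned above.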
You judge from a non-portable code point of view. How about:
#include <cstdio>
#include "gtkext/message_box.h" // for gtkext::message_box

int main()
{
    char buffer[80];
    sprintf(buffer, "The answer is %d!", 6*7);
    gtkext::message_box(buffer, "This is a title!", gtkext::icon_blah_blah, ...);
}
And unlike your code, it's magically portable! (thanks to gtk using UTF-8 on windows)
Aha. When you use a library L that translates in platform-specific ways to/from UTF-8 for you, then UTF-8 is magically portable. For use of L.

However, try to pass a `main` argument over to gtkext::message_box. Then you have involved some /other code/ (namely the runtime library code that calls 'main') that may not necessarily translate for you, and in fact in Windows is extremely unlikely to translate for you.

Such code is prevalent. Most code does not translate to/from UTF-8.

Cheers & hth., & thanks for mention of boost::printf,

- Alf

PS: With C++11 there is no longer any reason to use <cstdio> instead of <stdio.h>, because <cstdio> no longer formally guarantees to not pollute the global namespace (and in practice it has never honored its C++98 guarantee). The code above is a good example why <stdio.h> is preferable -- it is too easy to write non-portable code with <cstdio>, such as using unqualified sprintf (not to mention size_t!).
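To make the point about `main` arguments concrete, here is a sketch of the extra border code a UTF-8 based program might need in order to get its arguments as UTF-8. The namespace `utf8_border` and the function names are hypothetical. The Windows branch assumes the Win32 functions `GetCommandLineW` and `CommandLineToArgvW` (linking against Shell32), and the other branch assumes the arguments already arrive as UTF-8, which is not guaranteed everywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>   // CommandLineToArgvW, link with Shell32.lib
#endif

namespace utf8_border {   // hypothetical namespace

// Append one code point as 1 to 4 UTF-8 bytes.
inline void append_utf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Encode UTF-16 as UTF-8. Unpaired surrogates are replaced by U+FFFD.
inline std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a high/low surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}

// Program arguments as UTF-8, bypassing the narrow argv on Windows.
inline std::vector<std::string> utf8_args(int argc, char** argv) {
    std::vector<std::string> args;
#ifdef _WIN32
    (void)argc; (void)argv;   // the narrow arguments are ignored on Windows
    int n = 0;
    wchar_t** wargv = ::CommandLineToArgvW(::GetCommandLineW(), &n);
    if (wargv != nullptr) {
        for (int i = 0; i < n; ++i) {
            const std::wstring w = wargv[i];
            args.push_back(utf16_to_utf8(std::u16string(w.begin(), w.end())));
        }
        ::LocalFree(wargv);
    }
#else
    // Assumption: the runtime already delivers UTF-8 here.
    for (int i = 0; i < argc; ++i) args.push_back(argv[i]);
#endif
    return args;
}

} // namespace utf8_border
```

A program would call `utf8_border::utf8_args(argc, argv)` first thing in `main` and use the returned strings from then on. Note that this is exactly the kind of per-case translation layer discussed above: the runtime library does not do it for you, so each program has to.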