
On Fri, Oct 28, 2011 at 04:23, Alf P. Steinbach <alf.p.steinbach+usenet@gmail.com> wrote:
On 27.10.2011 23:56, Peter Dimov wrote:
Alf P. Steinbach wrote:
On 27.10.2011 21:07, Peter Dimov wrote:
Alf P. Steinbach wrote:
...
Right, that's one reason why modern Windows programs should best be wchar_t based.
This is one of the two options. The other is using UTF-8 for representing paths as narrow strings. The first option is more natural for Windows-only code, and the second is better, in practice, for portable code because it avoids the need to duplicate all path-related functions for char/wchar_t. The motivation for using UTF-8 is practical, not political or religious.
Thanks for that clarification of the current thinking at Boost.
My opinion is not representative of all of Boost, although I've found that there is substantial agreement among people who write portable software that needs to deal with paths that #2, UTF-8, is the way to go.
3. the most natural sufficiently general native encoding, 1 or 2
depending on the platform that the source is being built for.
Yes, with its various suboptions: 3a, TCHAR; 3b, template on char_type; 3c, providing both char and wchar_t overloads. They all have their problems; people don't move to UTF-8 merely out of spite.
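To make suboption 3b concrete, here is a minimal sketch of a path helper templated on the character type. The name `extension_of` is hypothetical (it is not a Boost function), and treating both '/' and '\\' as separators on every platform is a simplifying assumption.

```cpp
#include <string>

// Option 3b: one template serves both std::string and std::wstring paths.
// `extension_of` is a hypothetical helper used only for illustration.
template <class Ch>
std::basic_string<Ch> extension_of(const std::basic_string<Ch>& path) {
    typedef std::basic_string<Ch> String;

    // Locate the last separator so that dots in directory names are ignored.
    const Ch seps[] = { Ch('/'), Ch('\\'), Ch(0) };
    const typename String::size_type slash = path.find_last_of(seps);
    const typename String::size_type dot = path.rfind(Ch('.'));

    if (dot == String::npos) return String();                        // no dot at all
    if (slash != String::npos && dot < slash) return String();       // dot belongs to a directory
    return path.substr(dot);                                         // includes the leading '.'
}
```

The function body is written once, but callers still have to choose between "..." and L"..." literals at every call site, which is the part the _T-style macros try to paper over.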
Prior art in this direction includes Microsoft's [tchar.h].
This works, more or less, once you've accumulated the appropriate library of _T macros, _t functions and T/t typedefs. I've never heard of it actually being used for a portable code base,
[tchar.h], plus the similar support in <windows.h>, was heavily used for porting applications between Windows 9x ANSI and Windows NT Unicode, before Microsoft introduced the Layer for Unicode in 2001 or thereabouts (the layer allowed wchar_t-apps to run in Windows 9x).
I'm not saying it's a good C++ approach for that porting -- it's not, since it was designed for the C language.
I just gave it as an example of prior art, which includes a neat header where the names of the relevant functions to wrap (or whatever) can be extracted by a small Python script. ;-)
but I admit that it's
possible to do things this way, even if it's somewhat alien to POSIX people.
The advantage of using UTF-8 is that, apart from the border layer that calls the OS (and that needs to be ported either way), the rest of the code is happily char[]-based.
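A rough sketch of what such a border layer might look like, assuming the rest of the program keeps paths as UTF-8 in `std::string`. Both `utf8_to_utf16` and `open_utf8` are hypothetical names invented for this illustration; the decoder does only minimal validation, and a real Windows port would more likely call `MultiByteToWideChar`.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Decode UTF-8 into UTF-16 code units (hypothetical helper, minimal validation).
// Malformed or truncated sequences are replaced with U+FFFD.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        int extra;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else                         { cp = 0xFFFD;   extra = 0; }  // invalid lead byte
        ++i;
        for (; extra > 0; --extra) {
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) {
                cp = 0xFFFD;  // truncated or malformed sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }
        if (cp > 0x10FFFF) cp = 0xFFFD;  // outside the Unicode range
        if (cp >= 0x10000) {
            // Encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

// The border layer itself: callers pass UTF-8 everywhere; only this function
// knows that Windows wants wide strings.
std::FILE* open_utf8(const std::string& utf8_path, const char* mode) {
#ifdef _WIN32
    const std::u16string p = utf8_to_utf16(utf8_path);
    const std::u16string m = utf8_to_utf16(mode);
    // wchar_t is 16 bits on Windows, so the code units copy over unchanged.
    return _wfopen(std::wstring(p.begin(), p.end()).c_str(),
                   std::wstring(m.begin(), m.end()).c_str());
#else
    return std::fopen(utf8_path.c_str(), mode);
#endif
}
```

Everything above the `#ifdef` is ordinary char-based code; the platform split is confined to one function.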
Oh.
I would be happy to learn this.
How do I make the following program work with Visual C++ in Windows, using narrow character strings?
<code>
#include <stdio.h>
#include <fcntl.h>      // _O_U8TEXT
#include <io.h>         // _setmode, _fileno
#include <windows.h>

int main()
{
    //SetConsoleOutputCP( 65001 );
    //_setmode( _fileno( stdout ), _O_U8TEXT );
    printf( "Blåbærsyltetøy! 日本国 кошка!\n" );
}
</code>
How will you make this program portable? The commented-out code is from my random efforts to Make It Work(TM).
It refused.
This is because Windows narrow chars can't be UTF-8. You could make it portable by:

int main()
{
    boost::printf("Blåbærsyltetøy! 日本国 кошка!\n");
}
By the way, I'm hoping Boost isn't supporting old versions of g++.
Because old versions of g++ choked on a BOM at start of UTF-8 encoded source code, while Visual C++ requires that BOM... So, UTF-8 source code ungood with old versions of g++, if Visual C++ is also used.
If you don't use wide chars, you can trick VC++ into using UTF-8 string literals. Just save the file as UTF-8 *without* BOM. It will just embed them verbatim into the executable.

There's no need to be aware of the fact
that literals need to be quoted or that strlen should be spelled _tcslen. There's no need to convert paths to an external representation when writing them into a portable config/project file.
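As a small illustration of what "char[]-based" means for UTF-8 literals: the sketch below spells the first letters of the test string with explicit byte escapes, so the result does not depend on how the source file is saved or which compiler reads it. `kWord` and `count_code_points` are hypothetical names introduced for this example.

```cpp
#include <cstddef>
#include <cstring>

// "Blåbær" as UTF-8 bytes. The literal is split into pieces because a hex
// escape would otherwise swallow a following hex digit such as 'b'.
static const char kWord[] = "Bl\xC3\xA5" "b\xC3\xA6" "r";

// Count code points by skipping UTF-8 continuation bytes (10xxxxxx).
std::size_t count_code_points(const char* s) {
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

Plain `std::strlen(kWord)` still works and reports the size in bytes (8 here), which is what buffer handling needs; the code-point count (6 here) only matters when characters have to be counted.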
Hm, I'm not so sure.
I'd like to see this magic in action before believing in it, e.g., the program above working with narrow chars and printf, with Visual C++.
See above and see http://permalink.gmane.org/gmane.comp.lib.boost.devel/225036
That's an unrelated issue, really, but I think Boost could use a "get
undamaged program arguments in portable strings" thing, if it isn't there already?
We'll be back to the question of what constitutes a portable string. I'd prefer UTF-8 on Windows and whatever was passed on POSIX. You'd prefer TCHAR[].
No, not TCHAR, which was designed for the C language (and is an ugly uppercase name to boot).
Instead, like this:
<code>
#include "u/stdio_h.h"      // u::CodingValue, u::sprintf, U

#undef UNICODE
#define UNICODE
#include <windows.h>        // MessageBox

int main()
{
    u::CodingValue buffer[80];

    sprintf( buffer, U( "The answer is %d!" ), 6*7 );   // Koenig lookup.
    MessageBox(
        0,
        buffer->rawPtr(),
        U( "This is a title!" )->rawPtr(),
        MB_ICONINFORMATION | MB_SETFOREGROUND
        );
}
</code>
You are judging from a non-portable-code point of view. How about:

#include <cstdio>
#include "gtkext/message_box.h" // for gtkext::message_box

int main()
{
    char buffer[80];
    sprintf(buffer, "The answer is %d!", 6*7);
    gtkext::message_box(buffer, "This is a title!", gtkext::icon_blah_blah, ...);
}

And unlike your code, it's magically portable! (Thanks to GTK using UTF-8 on Windows.)

Sincerely,
--
Yakov