Re: [boost] Silly Boost.Locale default narrow string encoding in Windows

28 Oct 2011

On Fri, Oct 28, 2011 at 04:23, Alf P. Steinbach <alf.p.steinbach+usenet@gmail.com> wrote:
...
On 27.10.2011 23:56, Peter Dimov wrote:
...
Alf P. Steinbach wrote:
...
On 27.10.2011 21:07, Peter Dimov wrote:
...
Alf P. Steinbach wrote:
...
Right, that's one reason why modern Windows programs should best be
wchar_t based.
This is one of the two options. The other is using UTF-8 for
representing paths as narrow strings. The first option is more natural
for Windows-only code, and the second is better, in practice, for
portable code because it avoids the need to duplicate all path-related
functions for char/wchar_t. The motivation for using UTF-8 is
practical, not political or religious.
Thanks for that clarification of the current thinking at Boost.
My opinion is not representative of all of Boost, although I've found
that there is substantial agreement between people who write portable
software that needs to deal with paths (#2, UTF-8, as the way to go).
3. the most natural sufficiently general native encoding, 1 or 2
depending on the platform that the source is being built for.
Yes, with its various suboptions. 3a, TCHAR, 3b, template on char_type,
3c, providing both char and wchar_t overloads. They all have their
problems; people don't move to UTF-8 merely out of spite.
Prior art in this direction includes Microsoft's [tchar.h].
...
This works, more or less, once you've accumulated the appropriate
library of _T macros, _t functions and T/t typedefs. I've never heard of
it actually being used for a portable code base,
[tchar.h], plus the similar support in <windows.h>, was heavily used for
porting applications between Windows 9x ANSI and Windows NT Unicode, before
Microsoft introduced the Layer for Unicode in 2001 or thereabouts (the layer
allowed wchar_t-apps to run in Windows 9x).
I'm not saying it's a good C++ approach for that porting -- it's not,
since it was designed for the C language.
I just gave it as an example of prior art, which includes a neat header
where the names of the relevant functions to wrap (or whatever) can be
extracted by a small Python script. ;-)
but I admit that it's possible to do things this way, even if it's
somewhat alien to POSIX people.
The advantage of using UTF-8 is that, apart from the border layer that
calls the OS (and that needs to be ported either way), the rest of the
code is happily char[]-based.
Oh.
I would be happy to learn this.
How do I make the following program work with Visual C++ in Windows, using
narrow character strings?
<code>
#include <stdio.h>
#include <fcntl.h>      // _O_U8TEXT
#include <io.h>         // _setmode, _fileno
#include <windows.h>
int main()
{
   //SetConsoleOutputCP( 65001 );
   //_setmode( _fileno( stdout ), _O_U8TEXT );
   printf( "Blåbærsyltetøy! 日本国 кошка!\n" );
}
</code>
How will you make this program portable?

The commented-out code is from my random efforts to Make It Work(TM).
...
It refused.
This is because narrow-character strings on Windows can't be UTF-8. You
could make it portable with:

int main()
{
    boost::printf("Blåbærsyltetøy! 日本国 кошка!\n");
}
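
A minimal sketch of what such a border layer might look like, assuming the
program keeps its strings as UTF-8 and converts only at the point where it
talks to the console. The names utf8_to_utf16 and print_utf8 are made up for
this illustration and are not Boost API. On Windows, MultiByteToWideChar could
do the conversion instead; the hand-written decoder is used here only so the
sketch compiles on any platform, and it does not validate malformed input.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: decode UTF-8 into UTF-16 code units.
// Assumes well-formed input; real code would have to reject bad sequences.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    for (std::size_t i = 0; i < in.size(); ) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        // Fold in the continuation bytes, six payload bits each.
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical border function: print a UTF-8 string to stdout.
void print_utf8(const std::string& s)
{
#ifdef _WIN32
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    if (GetConsoleMode(h, &mode)) {  // succeeds only for a real console
        std::u16string w = utf8_to_utf16(s);
        DWORD written = 0;
        WriteConsoleW(h, w.data(), static_cast<DWORD>(w.size()),
                      &written, nullptr);
        return;
    }
#endif
    // Redirected output on Windows, or any POSIX system: pass bytes through.
    std::fwrite(s.data(), 1, s.size(), stdout);
}
```

With something like this in place, the rest of the program can call
print_utf8 with plain char strings and never mention wchar_t.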
...
By the way, I'm hoping Boost isn't supporting old versions of g++.
Because old versions of g++ choked on a BOM at the start of UTF-8 encoded
source code, while Visual C++ requires that BOM... So UTF-8 source code
is no good with old versions of g++ if Visual C++ is also used.
If you don't use wide chars, you can trick VC++ into using UTF-8 string
literals. Just save the file as UTF-8 *without* a BOM. The compiler will then
embed the literal bytes verbatim into the executable.
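
Since the presence or absence of the BOM decides how each compiler treats the
file, a build step could check for it. The following is only a sketch; the
helper names has_utf8_bom and strip_utf8_bom are invented here, and the caller
is assumed to have read the source file into a std::string already.

```cpp
#include <string>

// True if the buffer starts with the UTF-8 byte order mark EF BB BF.
bool has_utf8_bom(const std::string& bytes)
{
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Return the buffer without a leading BOM; other input is returned unchanged.
std::string strip_utf8_bom(const std::string& bytes)
{
    return has_utf8_bom(bytes) ? bytes.substr(3) : bytes;
}
```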

There's no need to be aware of the fact that literals need to be quoted
or that strlen should be spelled _tcslen. There's no need to convert
paths to an external representation when writing them into a portable
config/project file.
Hm, I'm not so sure.
I'd like to see this magic in action before believing in it, e.g., the
program above working with narrow chars and printf, with Visual C++.
See above and see
http://permalink.gmane.org/gmane.comp.lib.boost.devel/225036
...
That's an unrelated issue, really, but I think Boost could use a "get
undamaged program arguments in portable strings" thing, if it isn't
there already?
We'll be back to the question of what constitutes a portable string. I'd
prefer UTF-8 on Windows and whatever was passed on POSIX. You'd prefer
TCHAR[].
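
For the UTF-8-on-Windows option, one possible shape is sketched here. On
Windows the wide command line is available through GetCommandLineW and
CommandLineToArgvW; the sketch assumes the caller has already copied those
arguments into UTF-16 strings and shows only the portable conversion step. The
names utf16_to_utf8 and args_to_utf8 are hypothetical, and unpaired surrogates
are not rejected.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: encode UTF-16 code units as UTF-8.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // Combine a high/low surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: convert a whole argument list to UTF-8 strings.
std::vector<std::string> args_to_utf8(const std::vector<std::u16string>& wide)
{
    std::vector<std::string> result;
    for (const std::u16string& w : wide)
        result.push_back(utf16_to_utf8(w));
    return result;
}
```

On POSIX no conversion would be needed under this scheme, since the bytes in
argv are kept as they were passed.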
No, not TCHAR, which was designed for the C language (and is an ugly
uppercase name to boot).
Instead, like this:
<code>
#include "u/stdio_h.h"      // u::CodingValue, u::sprintf, U
#undef UNICODE
#define UNICODE
#include <windows.h>        // MessageBox
int main()
{
   u::CodingValue  buffer[80];
   sprintf( buffer, U( "The answer is %d!" ), 6*7 );  // Koenig lookup.
   MessageBox(
       0,
       buffer->rawPtr(),
       U( "This is a title!" )->rawPtr(),
       MB_ICONINFORMATION | MB_SETFOREGROUND
       );
}
</code>
You are judging from a non-portable-code point of view. How about:

#include <cstdio>
#include "gtkext/message_box.h" // for gtkext::message_box

int main()
{
    char buffer[80];
    sprintf(buffer, "The answer is %d!", 6*7);
    gtkext::message_box(buffer, "This is a title!", gtkext::icon_blah_blah,
...);
}

And unlike your code, it's magically portable! (thanks to GTK using UTF-8 on
Windows)

Sincerely,
-- 
Yakov

Yakov Galka