Re: [boost] Silly Boost.Locale default narrow string encoding in Windows

27 Oct 2011

On 27.10.2011 20:01, Peter Dimov wrote:
...
Alf P. Steinbach wrote:
...
On 27.10.2011 18:47, Peter Dimov wrote:
...
Alf P. Steinbach wrote:
...
However, I still ask:
why FORCE INEFFICIENCY & AWKWARDNESS on Boost users -- why not just do
it right, using the platforms' native encodings.
Comment out the imbue line.
But that line is much of the point, isn't it?
There wouldn't be much point in calling imbue if you didn't want a
change in the boost::filesystem default behavior, which is to convert
using the ANSI CP (or the OEM CP if AreFileApisANSI() returns false, if
I'm not mistaken).
Oh there is.

It is a level of indirection.

You want Boost.Filesystem to assume /the same/ narrow character encoding 
as Boost.Locale, whatever it is.

And to quote the docs where I found that program,

"Boost Locale fully supports both narrow and wide API. The default 
character encoding is assumed to be UTF-8 on Windows."
...
...
...
(The platform's native encoding is UTF-16. The "ANSI" code page, which
is not necessarily ANSI or ANSI-like at all, despite your assertion,
The article you responded to did not contain the word "ANSI".
Thus, when you refer to an assertion about "ANSI", you have fantasized
something.
http://boost.2283326.n4.nabble.com/Making-Boost-Filesystem-work-with-GENERAL...
That's a different context and a different discussion, where it was 
neither necessary nor natural to dot the i's and cross the t's to 
perfection.

Talk about dragging in things from out of the blue.

If you wanted to point out the possibility of e.g. a Japanese code page 
as ANSI, then you should have done that over there, in that thread. I 
mean in the context where it could make sense and where it could help 
prevent readers from getting a wrong impression. If it was that important.

[snippety]
...
Under Windows (NT+ and NTFS), the narrow character API is a wrapper over
the wide character API. The system converts from/to the ANSI code page
as needed. The narrowing conversion may lose data.
OK, we're just talking about two different meanings of "native", for two 
different contexts: Windows internals and Windows apps.

The relevant context for discussing Boost.Locale's treatment of narrow 
strings is the application level.
...
...
...
[the program] will work fine until it's given a file name that is not
representable in the ANSI CP.)
Nope, sorry, for any /reasonable interpretation/ of what you're writing.
File names on NTFS are not necessarily representable in the ANSI code
page. A program that uses narrow strings in the ANSI code page to
represent paths will not necessarily be able to open all files on the
system.
Right, that's one reason why modern Windows programs are best written 
wchar_t-based. Other reasons include efficiency (avoiding conversions) 
and simple convenience. Some API functions do not have narrow wrappers.
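To see concretely why the narrowing step can lose data, here is a minimal, portable C++ sketch. It is not the Windows implementation. It assumes, purely for illustration, a single-byte "ANSI" code page that maps U+0000..U+00FF one-to-one onto byte values (ISO-8859-1, which is close to but not identical to Windows-1252). Anything outside that range is replaced with '?', in the spirit of the default-character substitution that WideCharToMultiByte performs. The function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of a wide -> "ANSI" conversion. ASSUMPTION: the code page
// is ISO-8859-1 (U+0000..U+00FF map to bytes 0x00..0xFF); real Windows code
// pages differ, and Windows itself would call WideCharToMultiByte.
std::string narrow_to_latin1(const std::u16string& wide, char replacement = '?')
{
    std::string narrow;
    narrow.reserve(wide.size());
    for (std::size_t i = 0; i < wide.size(); ++i) {
        const char16_t unit = wide[i];
        if (unit <= 0xFF) {
            narrow.push_back(static_cast<char>(unit));  // representable: keep it
            continue;
        }
        // Not representable: a surrogate pair (one character outside the BMP)
        // is consumed as a whole and becomes a single replacement character.
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < wide.size()
            && wide[i + 1] >= 0xDC00 && wide[i + 1] <= 0xDFFF) {
            ++i;
        }
        narrow.push_back(replacement);
    }
    return narrow;
}

// Widen back using the same assumed one-to-one mapping.
std::u16string widen_from_latin1(const std::string& narrow)
{
    std::u16string wide;
    wide.reserve(narrow.size());
    for (char c : narrow) {
        wide.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return wide;
}

// True if a file name survives a wide -> narrow -> wide round trip.
bool survives_round_trip(const std::u16string& name)
{
    return widen_from_latin1(narrow_to_latin1(name)) == name;
}
```

With this model a name such as café.txt survives the round trip, but 日本.txt narrows to "??.txt". A program that holds only the narrow string can then no longer name the original file.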

However, a default assumption of UTF-8 encoding for narrow strings, as 
in Boost.Locale, seems to me to clash with most uses of narrow strings.

For example, if you output UTF-8 on standard output, and then try to 
pipe that through `more` in Windows' [cmd.exe], you get this:

<example>
d:\dave> chcp 65001
Active code page: 65001

d:\dave> echo "imagine this is utf8" | more
Not enough memory.

d:\dave> _
</example>

So UTF-8 is, to put it mildly, not very practical as a 
general narrow-character encoding in Windows.

The example that I gave at the top of the thread was passing a `main` 
argument on to other code, when using Boost.Locale. It causes trouble 
because in Windows `main` arguments are by convention encoded as ANSI, 
while Boost.Locale has UTF-8 as its default. Treating ANSI-encoded text 
as UTF-8 generally yields gobbledygook, except for the pure ASCII common 
subset.
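Here is a small illustration of the ANSI-as-UTF-8 problem, written in C++ and independent of Boost. The hypothetical function is_valid_utf8 below checks whether a byte string is well-formed UTF-8 (per RFC 3629). Pure ASCII passes. An accented name in an ANSI code page is rejected; the code page is assumed here to be Windows-1252, where é is the single byte 0xE9. That byte announces a three-byte UTF-8 sequence whose continuation bytes are missing.

```cpp
#include <cstddef>
#include <string>

// Returns true if `bytes` is well-formed UTF-8: every multi-byte sequence must
// be complete, not overlong, not a UTF-16 surrogate, and not above U+10FFFF.
bool is_valid_utf8(const std::string& bytes)
{
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80) {                 // ASCII byte: same meaning in UTF-8
            ++i;
            continue;
        } else if ((lead & 0xE0) == 0xC0) {
            len = 2; cp = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            len = 3; cp = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            len = 4; cp = lead & 0x07;
        } else {
            return false;                  // stray continuation byte or invalid lead byte
        }
        if (i + len > n) {
            return false;                  // sequence truncated by end of string
        }
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char next = static_cast<unsigned char>(bytes[i + k]);
            if ((next & 0xC0) != 0x80) {
                return false;              // expected a continuation byte 10xxxxxx
            }
            cp = (cp << 6) | (next & 0x3F);
        }
        if (cp < min_for_len[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            return false;                  // overlong form, surrogate, or out of range
        }
        i += len;
    }
    return true;
}
```

A check like this only catches ANSI strings that happen to be malformed UTF-8. Some are well-formed. The Windows-1252 text "Ã©" is the byte pair C3 A9, which a UTF-8 decoder silently reads as "é". So validation alone does not make the UTF-8 assumption safe.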

But with ANSI as the Boost.Locale default, the more reasonable choice 
of default, the imbue call would not cause trouble, but would instead 
help to avoid trouble -- which is surely the original intention.

Cheers & hth.,

- Alf