
On 26.10.2011 22:13, Beman Dawes wrote:
On Tue, Oct 25, 2011 at 9:41 AM, Alf P. Steinbach <alf.p.steinbach+usenet@gmail.com> wrote:
IMHO access to files is a crucial part of Boost.Filesystem. However, with Boost 1.47, and using g++ 4.4.1 in Windows 7, boost::filesystem::ifstream etc. fail to open or create files with non-ANSI characters. It works fine with Visual C++; it FAILS with g++ 4.4.1, which is the one bundled with the Code::Blocks IDE.
Yes, although it is actually characters that are not covered by the current file system codepage rather than non-ANSI characters, IIRC. Surprisingly, no one has opened a ticket yet.
Until someone does open a ticket and the problem gets fixed, there are a couple of workarounds:
(1) Use V2. Its fstream.hpp uses an implementation hack that works as long as 8.3 filenames are enabled.
I think this is good. :-) It's what I, unaware of the history, proposed.
(Some Windows users disable 8.3 filenames as an optimization.)
The capability to disable them is there, but I don't think anyone is actually doing that, because Windows uses 8.3 filenames in the registry, reportedly the Microsoft Installer uses and requires them, and so on.
(2) V3 may work OK with the Microsoft 65001 UTF-8 codepage, although I've never used it myself and you would have to pass in a UTF-8 encoded narrow character name.
I'm not sure exactly what you're thinking of here, but I suspect that it's due to some technical misunderstanding. Narrow character Windows paths need to be encoded as ANSI, which is not one specific codepage but whichever codepage the GetACP function reports (codepage 1252 on Western European installations). This codepage is independent of the active codepage in a console; the default codepage for a console is called the "OEM" codepage. Changing the default ANSI or OEM codepage can be done via an undocumented registry key, followed by a reboot. However, while I regularly recommend changing the OEM codepage (from 437 to e.g. 1252), changing the ANSI codepage to something like UTF-8 could conceivably wreak a lot of havoc with applications that assume that the ANSI codepage is a single byte per char encoding.
The failure probably has nothing to do with the g++ version: it's due to g++ not offering the Visual C++ wchar_t oriented extensions to the standard iostreams (Boost.Filesystem uses these Visual C++ extensions).
Right. libstdc++ doesn't provide the wchar_t overloads.
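To make the missing-overload point concrete, here is a minimal sketch of a compile-time check for whether the standard library in use accepts a wide file name directly. It is not the C++98-compatible detection from my document; it assumes C++11 and <type_traits>, and the names opens_with and has_wide_open are made up for illustration. The result for const wchar_t* depends on the library and on the language standard selected, so treat it as a probe rather than a guarantee.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <type_traits>

// Hypothetical trait: true when std::ifstream can be constructed
// directly from a file name of type Name.
template <typename Name>
struct opens_with : std::is_constructible<std::ifstream, Name> {};

// Narrow names are accepted by every conforming C++11 library.
static_assert(opens_with<const char*>::value,
              "std::ifstream must accept a narrow file name");

// Hypothetical flag: true with libraries that offer a wchar_t constructor
// (for example the Visual C++ extension), false where only narrow names
// are accepted.
constexpr bool has_wide_open = opens_with<const wchar_t*>::value;
```

Code that wraps the stream classes could branch on has_wide_open and fall back to a narrow name only where it is false.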
I stumbled onto this while I was writing about using Unicode in C++ programming in Windows.
I wrote up a technical solution in section 5, starting on page 16, of that work-in-progress document, available on Google Docs at:
Essentially, the fix I ended up with (full source code is given in the above doc) uses Windows short file names if (1) there is no wide character support and (2) the filename can't be translated to ANSI without loss. The C++ implementation's support for wide chars is automatically detected using C++98-compatible code.
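A rough sketch of that decision, for illustration only; it is not the code from the document. The helper names to_ansi_exact and narrow_name_for are hypothetical. The Windows branch uses WideCharToMultiByte and GetShortPathNameW; the non-Windows branch is just an ASCII-only stand-in so that the sketch compiles elsewhere. The sketch uses a C++11 range-for. Note that GetShortPathNameW only works for a file that already exists, and only when 8.3 names are enabled on the volume.

```cpp
#include <cassert>
#include <string>
#ifdef _WIN32
#  include <windows.h>
#endif

// Hypothetical helper: converts `wide` to the ANSI codepage and returns
// true only if no character had to be replaced or approximated.
bool to_ansi_exact(const std::wstring& wide, std::string& narrow)
{
    narrow.clear();
    if (wide.empty()) return true;
#ifdef _WIN32
    const int length = static_cast<int>(wide.size());
    // First call: ask for the required buffer size in bytes.
    const int needed = WideCharToMultiByte(
        CP_ACP, WC_NO_BEST_FIT_CHARS, wide.data(), length, NULL, 0, NULL, NULL);
    if (needed <= 0) return false;
    narrow.resize(needed);
    BOOL used_default = FALSE;
    // Second call: convert, and record whether a default char was substituted.
    const int written = WideCharToMultiByte(
        CP_ACP, WC_NO_BEST_FIT_CHARS, wide.data(), length,
        &narrow[0], needed, NULL, &used_default);
    if (written != needed || used_default) { narrow.clear(); return false; }
    return true;
#else
    // Stand-in for non-Windows builds (an assumption of this sketch):
    // only plain ASCII counts as exactly representable.
    for (wchar_t c : wide) {
        if (static_cast<unsigned long>(c) > 0x7F) { narrow.clear(); return false; }
        narrow.push_back(static_cast<char>(c));
    }
    return true;
#endif
}

// Hypothetical helper: returns a narrow name usable with char-only stream
// constructors, or an empty string if none could be found.
std::string narrow_name_for(const std::wstring& wide)
{
    std::string narrow;
    if (to_ansi_exact(wide, narrow)) return narrow;   // ANSI is good enough
#ifdef _WIN32
    // Fall back to the 8.3 short name; the file must already exist.
    wchar_t buffer[MAX_PATH];
    const DWORD len = GetShortPathNameW(wide.c_str(), buffer, MAX_PATH);
    if (len > 0 && len < MAX_PATH &&
        to_ansi_exact(std::wstring(buffer, len), narrow)) {
        return narrow;
    }
#endif
    return std::string();
}
```

A caller without wide stream support would pass the result of narrow_name_for to std::ifstream, and treat an empty result as "cannot open via a narrow name".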
I do not know what to do with this.
If you care enough to open a ticket on the Boost bug tracker, I'll move the V2 code to V3. But there is a big backlog of tickets, so no guarantees as to when that will happen.
Thank you, done. <url: https://svn.boost.org/trac/boost/ticket/6065>
Another possibility is to try to talk the libstdc++ folks into supporting the Dinkumware wchar_t extension. They will presumably want to do that anyhow to support TR2 (or whatever it is going to be called).
Luc Danton, over at SO, pointed me to some earlier discussion of extending libstdc++ with Unicode path support, in June this year, at <url: http://gcc.gnu.org/ml/libstdc++/2011-06/msg00066.html>. Maybe that can be useful?

Cheers,

- Alf