Making Boost.Filesystem work with GENERAL filenames with g++ in Windows (a solution)

First, apologies if I'm posting to the wrong group/list; if so then please redirect me.
IMHO access to files is a crucial part of Boost.Filesystem. However, with Boost 1.47, and using g++ 4.4.1 in Windows 7, boost::filesystem::ifstream etc. fail to open or create files with non-ANSI characters. It works fine with Visual C++; it FAILS with g++ 4.4.1, which is the one bundled with the Code::Blocks IDE.
The failure probably has nothing to do with the g++ version: it's due to g++ not offering the Visual C++ wchar_t oriented extensions to the standard iostreams (Boost.Filesystem uses these Visual C++ extensions).
I stumbled onto this while I was writing about using Unicode in C++ programming in Windows.
I wrote up a technical solution in section 5, starting on page 16, of that work-in-progress document, available on Google Docs at:
https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B2oiI2reHOh4ZjdkNmUyNDctNzI1Yi00NmJjLThiMzgtYmI3OGE0ZmE5MDg2&hl=en
Essentially, the fix I ended up with, full source code given in the above doc, uses Windows short file names if (1) there is no wide character support and if furthermore (2) the filename can't be perfectly translated to ANSI. The C++ implementation's support for wide chars is automatically detected using C++98-compatible code.
I do not know what to do with this.
But considering that Boost.Filesystem is slated for later inclusion in the C++ standard library (or at least into TR2), I think it would be nice if it is able to give access to all accessible files in Windows, also with g++, so that we don't end up with a file handling part of the standard library that can't handle files in general; hence this posting and plea for advice -- what more should I do, if anything?
Cheers,
- Alf

----- Original Message -----
From: Alf P. Steinbach <alf.p.steinbach+usenet@gmail.com>
First, apologies if I'm posting to the wrong group/list; if so then please redirect me.
IMHO access to files is a crucial part of Boost.Filesystem. However, with Boost 1.47, and using g++ 4.4.1 in Windows 7, boost::filesystem::ifstream etc. fail to open or create files with non-ANSI characters. It works fine with Visual C++; it FAILS with g++ 4.4.1, which is the one bundled with the Code::Blocks IDE.
The failure probably has nothing to do with the g++ version: it's due to g++ not offering the Visual C++ wchar_t oriented extensions to the standard iostreams (Boost.Filesystem uses these Visual C++ extensions).
I stumbled onto this while I was writing about using Unicode in C++ programming in Windows.
Why not just implement boost::filesystem::fstream over _wfopen and a custom streambuf implementation? It is relatively simple. I have implemented such a thing under the booster namespace as part of the CppCMS project:
http://cppcms.svn.sourceforge.net/viewvc/cppcms/framework/trunk/booster/boos...
http://cppcms.svn.sourceforge.net/viewvc/cppcms/framework/trunk/booster/lib/...
http://cppcms.svn.sourceforge.net/viewvc/cppcms/framework/trunk/booster/booster/nowide/fstream.h?revision=1967&view=markup
I wrote up a technical solution in section 5, starting on page 16, of that work-in-progress document, available on Google Docs at:
Essentially, the fix I ended up with, full source code given in the above doc, uses Windows short file names if (1) there is no wide character support and if furthermore (2) the filename can't be perfectly translated to ANSI. The C++ implementation's support for wide chars is automatically detected using C++98-compatible code.
I do not know what to do with this.
That is what is very good about stream buffers... You can implement anything you need.
Using short file names is a no-go for two reasons:
1. It works only when the file exists (can't create a new file)
2. It is quite deprecated
But considering that Boost.Filesystem is slated for later inclusion in the C++ standard library (or at least into TR2), I think it would be nice if it is able to give access to all accessible files in Windows, also with g++,
This is a problem that can be fixed easily in Boost.Filesystem.
so that we don't end up with a file handling part of the standard library that can't handle files in general;
This is a "bug" in the Windows operating system... but that is another story.
hence this posting and plea for advice -- what more should I do, if anything?
Write a patch that implements stream buffer over stdio and _wfopen?
Cheers,
- Alf
Artyom Beilis
--------------
CppCMS - C++ Web Framework: http://cppcms.sf.net/
CppDB - C++ SQL Connectivity: http://cppcms.sf.net/sql/cppdb/
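
The stream-buffer suggestion above can be sketched roughly as follows. This is only a minimal, read-only illustration written in C++98 style, not the Booster/CppCMS code and not a Boost.Filesystem patch. The names stdio_inbuf and open_wide_for_read are made up for this example. The non-Windows branch is an assumption added so the sketch compiles anywhere, and whether _wfopen is visible may depend on the MinGW headers and compiler flags in use.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical helper (not a Boost or Booster API): open a file for reading
// given a wide-character name.
inline std::FILE* open_wide_for_read(const std::wstring& name)
{
#ifdef _WIN32
    // _wfopen is the Microsoft C runtime's wide-name counterpart of fopen.
    return _wfopen(name.c_str(), L"rb");
#else
    // Assumption for other systems: narrow the name using the current C locale.
    std::string narrow(name.size() * MB_CUR_MAX + 1, '\0');
    const std::size_t n = std::wcstombs(&narrow[0], name.c_str(), narrow.size());
    if (n == static_cast<std::size_t>(-1)) return NULL;  // name not representable
    return std::fopen(narrow.c_str(), "rb");
#endif
}

// Read-only stream buffer that pulls its bytes from a C stdio FILE*.
// It does not own the FILE*; the caller closes it.
class stdio_inbuf : public std::streambuf
{
public:
    explicit stdio_inbuf(std::FILE* file) : file_(file)
    {
        // Start with an empty get area so the first read calls underflow().
        setg(buffer_, buffer_, buffer_);
    }

protected:
    virtual int_type underflow()
    {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        if (file_ == NULL) return traits_type::eof();
        const std::size_t n = std::fread(buffer_, 1, sizeof buffer_, file_);
        if (n == 0) return traits_type::eof();          // end of file or error
        setg(buffer_, buffer_, buffer_ + n);            // expose the new bytes
        return traits_type::to_int_type(*gptr());
    }

private:
    stdio_inbuf(const stdio_inbuf&);                    // non-copyable
    stdio_inbuf& operator=(const stdio_inbuf&);

    std::FILE* file_;
    char buffer_[4096];
};
```

A std::istream constructed on a stdio_inbuf then behaves like any other input stream (std::getline, operator>> and so on). A complete replacement for fstream would also need output via overflow() and sync(), seeking, and ownership of the FILE*, which is where most of the real work lies.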

On 26.10.2011 11:47, Artyom Beilis wrote:
----- Original Message -----
From: Alf P. Steinbach<alf.p.steinbach+usenet@gmail.com>
First, apologies if I'm posting to the wrong group/list; if so then please redirect me.
IMHO access to files is a crucial part of Boost.Filesystem. However, with Boost 1.47, and using g++ 4.4.1 in Windows 7, boost::filesystem::ifstream etc. fail to open or create files with non-ANSI characters. It works fine with Visual C++; it FAILS with g++ 4.4.1, which is the one bundled with the Code::Blocks IDE.
The failure probably has nothing to do with the g++ version: it's due to g++ not offering the Visual C++ wchar_t oriented extensions to the standard iostreams (Boost.Filesystem uses these Visual C++ extensions).
I stumbled onto this while I was writing about using Unicode in C++ programming in Windows.
Why not just implement boost::filesystem::fstream over _wfopen and a custom streambuf implementation?
It is relatively simple.
I have implemented such a thing under the booster namespace as part of the CppCMS project
http://cppcms.svn.sourceforge.net/viewvc/cppcms/framework/trunk/booster/boos... http://cppcms.svn.sourceforge.net/viewvc/cppcms/framework/trunk/booster/lib/...
I wrote up a technical solution in section 5, starting on page 16, of that work-in-progress document, available on Google Docs at:
Essentially, the fix I ended up with, full source code given in the above doc, uses Windows short file names if (1) there is no wide character support and if furthermore (2) the filename can't be perfectly translated to ANSI. The C++ implementation's support for wide chars is automatically detected using C++98-compatible code.
I do not know what to do with this.
That is what is very good about stream buffers... You can implement anything you need.
If the authors of Boost.Filesystem are happy to reimplement stream buffers from scratch, all the way, then that would indeed be good news. But do you *really* think that is realistic?
Using short file names is a no-go for two reasons:
1. It works only when the file exists (can't create a new file)
2. It is quite deprecated
Both points are incorrect. Regarding (1), did you notice that I wrote "full source code provided"...? I think when something demonstrably works, it's rather dumb to assert that it can't work, sorry.
But considering that Boost.Filesystem is slated for later inclusion in the C++ standard library (or at least into TR2), I think it would be nice if it is able to give access to all accessible files in Windows, also with g++,
This is a problem that can be fixed easily in Boost.Filesystem.
Yes, I think so, for Windows. However, more important, the existence of an actual fix (which I linked to) shows that the present interface does not prevent a fix for Windows. For some other OS-es it may not necessarily be easy to fix, however, and thus, for inclusion of Boost.Filesystem in the standard library I think that should be thoroughly investigated.
so that we don't end up with a file handling part of the standard library that can't handle files in general;
This is a "bug" in the Windows operating system... but that is another story.
Sorry, that's bullshit: there is no such bug in the OS.
hence this posting and plea for advice -- what more should I do, if anything?
Write a patch that implements stream buffer over stdio and _wfopen?
If the authors of Boost.Filesystem are happy to reimplement stream buffers from scratch, all the way, then that would indeed be good news. But do you *really* think that is realistic? Cheers & hth., - Alf

On Wed, Oct 26, 2011 at 11:56, Alf P. Steinbach < alf.p.steinbach+usenet@gmail.com> wrote:
On 26.10.2011 11:47, Artyom Beilis wrote:
----- Original Message -----
From: Alf P. Steinbach <alf.p.steinbach+usenet@gmail.com>
First, apologies if I'm posting to the wrong group/list; if so then please redirect me.
IMHO access to files is a crucial part of Boost.Filesystem. However, with Boost 1.47, and using g++ 4.4.1 in Windows 7, boost::filesystem::ifstream etc. fail to open or create files with non-ANSI characters. It works fine with Visual C++; it FAILS with g++ 4.4.1, which is the one bundled with the Code::Blocks IDE.
The failure probably has nothing to do with the g++ version: it's due to g++ not offering the Visual C++ wchar_t oriented extensions to the standard iostreams (Boost.Filesystem uses these Visual C++ extensions).
I stumbled onto this while I was writing about using Unicode in C++ programming in Windows.
Why not just implement boost::filesystem::fstream over _wfopen and a custom streambuf implementation?
It is relatively simple.
I have implemented such a thing under the booster namespace as part of the CppCMS project
http://cppcms.svn.sourceforge.net/viewvc/cppcms/framework/trunk/booster/booster/nowide/
http://cppcms.svn.sourceforge.net/viewvc/cppcms/framework/trunk/booster/lib/nowide/src/
http://cppcms.svn.sourceforge.net/viewvc/cppcms/framework/trunk/booster/booster/nowide/fstream.h?revision=1967&view=markup
I wrote up a technical solution in section 5, starting on page 16, of that work-in-progress document, available on Google Docs at:
https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B2oiI2reHOh4ZjdkNmUyNDctNzI1Yi00NmJjLThiMzgtYmI3OGE0ZmE5MDg2&hl=en
Essentially, the fix I ended up with, full source code given in the above doc, uses Windows short file names if (1) there is no wide character support and if furthermore (2) the filename can't be perfectly translated to ANSI. The C++ implementation's support for wide chars is automatically detected using C++98-compatible code.
I do not know what to do with this.
That is what is very good about stream buffers... You can implement anything you need.
If the authors of Boost.Filesystem are happy to reimplement stream buffers from scratch, all the way, then that would indeed be good news.
But do you *really* think that is realistic?
I think when someone gives you a working implementation, it's rather dumb to ask if it's realistic or not.
Using short file names is a no-go for two reasons:
1. It works only when the file exists (can't create a new file)
2. It is quite deprecated
Both points are incorrect.
Regarding (1), did you notice that I wrote "full source code provided"...?
Yes, you provide a hackish workaround, that is vulnerable to a race-condition btw. An even more serious problem is that it doesn't work with paths longer than MAX_PATH.
I think when something demonstrably works, it's rather dumb to assert that it can't work, sorry.
But considering that Boost.Filesystem is slated for later inclusion in the
C++ standard library (or at least into TR2), I think it would be nice if it is able to give access to all accessible files in Windows, also with g++,
This is a problem that can be fixed easily in Boost.Filesystem.
Yes, I think so, for Windows.
However, more important, the existence of an actual fix (which I linked to) shows that the present interface does not prevent a fix for Windows.
For some other OS-es it may not necessarily be easy to fix, however, and thus, for inclusion of Boost.Filesystem in the standard library I think that should be thoroughly investigated.
Personally I think that boost::filesystem::paths are a sad joke, it's a pity they're heading to the standard. Although the OS-part is definitely good, the way the path class is designed isn't suitable for paths outside the unix world. Even if you fix the Unicode problems, you still cannot use long paths on Windows (longer than MAX_PATH), although they are supported by the OS. Moreover, judging by the last fixes to the library, it looks like Beman wants to shift the burden of this on the user of the library, instead of implementing something that works transparently.
so that we
don't end up with a file handling part of the standard library that can't handle files in general;
This is a "bug" in the Windows operating system... but that is another story.
Sorry, that's bullshit: there is no such bug in the OS.
I think what Artyom means is that the "bug" in windows is the whole UTF-16 crap. -- Yakov

On 26.10.2011 12:24, Yakov Galka wrote:
On Wed, Oct 26, 2011 at 11:56, Alf P. Steinbach< alf.p.steinbach+usenet@gmail.com> wrote:
On 26.10.2011 11:47, Artyom Beilis wrote:
That is what is very good about stream buffers... You can implement anything you need.
If the authors of Boost.Filesystem are happy to reimplement stream buffers from scratch, all the way, then that would indeed be good news.
But do you *really* think that is realistic?
I think when someone gives you a working implementation, it's rather dumb to ask if it's realistic or not.
Let me try to clarify what I meant, then. Artyom's suggestion, as I understand it, is to forsake each compiler's streambuf implementation and use a Boost-provided general one that extends the standard's interface. With that Boost implementation building on some non-standard extension such as _wfopen, or implementing buffers from scratch, all the way. Which in the general case will be necessary, and thus, no point in also using extensions. Now is that realistic?
Using short file names is a no-go for two reasons:
1. It works only when the file exists (can't create a new file)
2. It is quite deprecated
Both points are incorrect.
Regarding (1), did you notice that I wrote "full source code provided"...?
Yes, you provide a hackish workaround, that is vulnerable to a race-condition btw.
No, AFAIK there is no extra realistic vulnerability. There is a vulnerability, yes, but it's there anyway. And there is a non-realistic (extremely low probability) vulnerability that process A attempts to create a non-existing file F, while process B tries to delete F. In this case creating F might fail /also with the fix/. But arguing that the fix is "hackish" because it fails to fix such an esoteric case, is IMO not serious.
An even more serious problem is that it doesn't work with paths longer than MAX_PATH.
The vague impression you have in that direction is in a sense valid, that there is something fishy there, because with the fix one may conceivably have a problem that a path that is longer than MAX_PATH works, when by rights it should not (I have not tested this though).
However,
1 It is trivial to restrict narrow character path lengths to MAX_PATH so that that problem does not occur.
2 It is only where the existing implementation fails to access a too long path, that the fix may fail to erroneously fix it.
3 It is not a good idea to use Windows paths longer than MAX_PATH since Windows Explorer and many other tools do not support them. Unless you want your application to create unusable files.
In short your "serious problem" has it backwards as to what the problems are, and the fix does not introduce any restriction.
I think when something demonstrably works, it's rather dumb to assert that it can't work, sorry.
But considering that Boost.Filesystem is slated for later inclusion in the
C++ standard library (or at least into TR2), I think it would be nice if it is able to give access to all accessible files in Windows, also with g++,
This is a problem that can be fixed easily in Boost.Filesystem.
Yes, I think so, for Windows.
However, more important, the existence of an actual fix (which I linked to) shows that the present interface does not prevent a fix for Windows.
For some other OS-es it may not necessarily be easy to fix, however, and thus, for inclusion of Boost.Filesystem in the standard library I think that should be thoroughly investigated.
Personally I think that boost::filesystem::paths are a sad joke, it's a pity they're heading to the standard. Although the OS-part is definitely good, the way the path class is designed isn't suitable for paths outside the unix world. Even if you fix the Unicode problems, you still cannot use long paths on Windows (longer than MAX_PATH), although they are supported by the OS. Moreover, judging by the last fixes to the library, it looks like Beman wants to shift the burden of this on the user of the library, instead of implementing something that works transparently.
Hm, I haven't used the path class. However the MAX_PATH issue is not really an issue in practice. The *only* uses I have seen of paths longer than MAX_PATH, have been silly script-kiddies trying to create problems for people running FTP servers. And that's because the ordinary tools can't handle them, thus, difficult to remove the script-kiddie's nested folders. Conversely, as a serious software developer one should stay well away from such paths. Cheers & hth., - Alf

On Wed, Oct 26, 2011 at 12:59, Alf P. Steinbach < alf.p.steinbach+usenet@gmail.com> wrote:
On 26.10.2011 12:24, Yakov Galka wrote:
On Wed, Oct 26, 2011 at 11:56, Alf P. Steinbach <alf.p.steinbach+usenet@gmail.com> wrote:
On 26.10.2011 11:47, Artyom Beilis wrote:
That is what is very good about stream buffers... You can implement anything you need.
If the authors of Boost.Filesystem are happy to reimplement stream buffers from scratch, all the way, then that would indeed be good news.
But do you *really* think that is realistic?
I think when someone gives you a working implementation, it's rather
dumb to ask if it's realistic or not.
Let me try to clarify what I meant, then.
Artyom's suggestion, as I understand it, is to forsake each compiler's streambuf implementation and use a Boost-provided general one that extends the standard's interface. With that Boost implementation building on some non-standard extension such as _wfopen, or implementing buffers from scratch, all the way. Which in the general case will be necessary, and thus, no point in also using extensions.
Now is that realistic?
Artyom's suggestion, if I understand (if not then assume it's my suggestion), is to use the native streambuf whenever possible, but use Artyom's implementation of streambuf on windows when compiling with non-Dinkumware standard library. It's definitely realistic.
Using short file names is a no-go for two reasons:
1. It works only when the file exists (can't create a new file)
2. It is quite deprecated
Both points are incorrect.
Regarding (1), did you notice that I wrote "full source code provided"...?
Yes, you provide a hackish workaround, that is vulnerable to a race-condition btw.
No, AFAIK there is no extra realistic vulnerability.
There is a vulnerability, yes, but it's there anyway.
And there is a non-realistic (extremely low probability) vulnerability that process A attempts to create a non-existing file F, while process B tries to delete F. In this case creating F might fail /also with the fix/. But arguing that the fix is "hackish" because it fails to fix such an esoteric case, is IMO not serious.
It's extremely low until someone finds a way to increase it... It's unfair for you as a library developer to introduce a security hole or a bug to the user's code if you're aware of it. For me the sequence CreateFile() -> CloseHandle() -> Process B -> fopen/GetShortPathName is suspicious. Process B can do anything, including renaming files, deleting, creating or calling SetFileShortName(). If you can't prove that it's secure, then it's not. I admit that I intentionally exaggerate this point, but such 'small' things make systems robust.
Still, you ignored Artyom's no. (2). You know that short file names exist only for backward compatibility and can be disabled on newer versions of Windows. I would say they're unofficially deprecated.
An even more serious problem is that it doesn't work with paths longer than MAX_PATH.
The vague impression you have in that direction is in a sense valid, that there is something fishy there, because with the fix one may conceivably have a problem that a path that is longer than MAX_PATH works, when by rights it should not (I have not tested this though).
However,
1 It is trivial to restrict narrow character path lengths to MAX_PATH so that that problem does not occur.
I prefer a solution that handles them rather than restricts my software.
2 It is only where the existing implementation fails to access a too long path, that the fix may fail to erroneously fix it.
What?
3 It is not a good idea to use Windows paths longer than MAX_PATH since Windows Explorer and many other tools do not support them. Unless you want your application to create unusable files.
If someone doesn't support them, it's not an excuse for limiting the software that needs to. On the contrary, add the support to the low-level libraries so that the software that builds on those libraries will transparently benefit from the upgrade.
In short your "serious problem" has it backwards as to what the problems are, and the fix does not introduce any restriction.
I think when something demonstrably works, it's rather dumb to assert that
it can't work, sorry.
But considering that Boost.Filesystem is slated for later inclusion in the C++ standard library (or at least into TR2), I think it would be nice if it is able to give access to all accessible files in Windows, also with g++,
This is a problem that can be fixed easily in Boost.Filesystem.
Yes, I think so, for Windows.
However, more important, the existence of an actual fix (which I linked to) shows that the present interface does not prevent a fix for Windows.
For some other OS-es it may not necessarily be easy to fix, however, and thus, for inclusion of Boost.Filesystem in the standard library I think that should be thoroughly investigated.
Personally I think that boost::filesystem::paths are a sad joke, it's a pity they're heading to the standard. Although the OS-part is definitely good, the way the path class is designed isn't suitable for paths outside the unix world. Even if you fix the Unicode problems, you still cannot use long paths on Windows (longer than MAX_PATH), although they are supported by the OS. Moreover, judging by the last fixes to the library, it looks like Beman wants to shift the burden of this on the user of the library, instead of implementing something that works transparently.
Hm, I haven't used the path class.
However the MAX_PATH issue is not really an issue in practice.
The *only* uses I have seen of paths longer than MAX_PATH, have been silly script-kiddies trying to create problems for people running FTP servers. And that's because the ordinary tools can't handle them, thus, difficult to remove the script-kiddie's nested folders. Conversely, as a serious software developer one should stay well away from such paths.
This offends me. I'm facing the MAX_PATH limitation at my work. If you develop nothing more than desktop apps that interact directly with the user, then please don't infer from this that others don't need it either. MAX_PATH is a problem in large-scale systems where you have enormous amounts of data. In particular this is why we don't switch to boost::filesystem (since it neither works around the problem nor works well when given a long path with \\?\).
-- Yakov

On 26.10.2011 14:17, Yakov Galka wrote: [snip argumentative noise]
On Wed, Oct 26, 2011 at 12:59, Alf P. Steinbach<
The *only* uses I have seen of paths longer than MAX_PATH, have been silly script-kiddies trying to create problems for people running FTP servers. And that's because the ordinary tools can't handle them, thus, difficult to remove the script-kiddie's nested folders. Conversely, as a serious software developer one should stay well away from such paths.
This offends me. I'm facing the MAX_PATH limitation at my work. If you develop nothing more than desktop apps that interact directly with the user, then please don't infer from this that others don't need it either. MAX_PATH is a problem in a large scale systems where you have enormous amounts of data.
You don't need such long paths for any size of data.
In particular this is why we don't switch to boost::filesystem (since it neither workarounds the problem nor works well when given a long path with \\?\).
Those long paths can't be handled by Windows Explorer or most other software. Therefore, adding an ability to use them inadvertently, is at best strongly misguided. It would encourage novice developers to create files that ordinary users can't delete, move or rename.
The world has 26 years of Windows usage without such long paths, even after they were introduced in 1993.
So with 26 years of not needing them going on strong, and 18 years of not being used (by anyone other than script kiddies) despite being there, it is an established fact that any competent developer doesn't need them and won't use them. And also, it is an established fact that using them creates trouble for the users. Don't even think about it -- and here I'm not talking about Boost, but about your own efforts.
Cheers & hth.,
- Alf

Alf P. Steinbach wrote:
The *only* uses I have seen of paths longer than MAX_PATH, have been silly script-kiddies trying to create problems for people running FTP servers. And
These are other uses I have seen:
1. Malware authors giving malware long names
2. Long path created within VS database projects as a consequence of large index or key names
3. Long paths created as a consequence of moving a folder structure to a new destination with a larger path
4. Malware collections with sha1 hashes as filenames organized in folders with partial hashes as folder names
that's because the ordinary tools can't handle them, thus, difficult to remove the script-kiddie's nested folders. Conversely, as a serious software developer one should stay well away from such paths.
I believe the situation is quite the contrary. As a serious software developer you should be aware that long paths are possible and make your software compatible with them. In the worst case, make it fail gracefully. I agree that you should try to avoid _creating_ long paths, precisely because other software developers chose not to support them in their applications.
You don't need such long paths for any size of data.
I don't agree with this. Of course, that depends on your definitions of "need". But there are times where long paths are very convenient.
Those long paths can't be handled by Windows Explorer or most other software. Therefore, adding an ability to use them inadvertently, is at
The ability is not added, it is already there. You can create a long path with Explorer. For instance, share a folder with a 200 character path, map it to a network disk and copy to the mapped device a folder structure with paths larger than 60 characters. Another possibility is moving (rather than copying) folder structures within another large folder. It is _your_ software that needs to handle these cases gracefully. The Explorer problems must not prevent you from solving similar problems in your software.
best strongly misguided. It would encourage novice developers to create files that ordinary users can't delete, move or rename.
I agree with this, but it applies only to _creation_. You should be able to handle long paths created by others.
The world has 26 years of Windows usage without such long paths, even after they were introduced in 1993. So with 26 years of not needing them going on strong, and 18 years of not being used (by anyone other than script kiddies) despite being there, it is an established fact that any competent developer doesn't need them and won't use them. And also, it is an established fact that using them
I believe this is wrong, examples were provided above.
creates trouble for the users. Don't even think about it -- and here I'm not talking about Boost, but about your own efforts.
I believe it would be nice if boost::filesystem supported long paths in Windows. In fact, to a great extent it already does. What I think would be needed is tests with filenames starting with \\?. Best regards Jorge
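
For readers unfamiliar with the \\?\ form discussed in this thread, here is a rough sketch of how a library or application might convert an over-long absolute path to the extended-length form before calling the wide Windows API. It is pure string manipulation so it compiles anywhere. The names to_extended_length and has_prefix are made up for this example, and it is not how Boost.Filesystem behaves. It assumes the input is already absolute and uses backslashes, since Windows does not normalize \\?\ paths (no '/' conversion, no "." or ".." handling).

```cpp
#include <cstddef>
#include <string>

// Value of the Windows MAX_PATH macro, repeated here so that the sketch
// compiles on any platform.
const std::size_t max_path = 260;

// True if s begins with prefix.
inline bool has_prefix(const std::wstring& s, const std::wstring& prefix)
{
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical helper: return a form of `path` that the wide Windows API
// accepts even when the path is too long for the classic MAX_PATH limit.
inline std::wstring to_extended_length(const std::wstring& path)
{
    if (path.size() < max_path) return path;            // short enough as-is

    // Already in the \\?\ or \\.\ namespace: leave untouched.
    if (has_prefix(path, L"\\\\?\\") || has_prefix(path, L"\\\\.\\")) return path;

    // UNC path \\server\share\... becomes \\?\UNC\server\share\...
    if (has_prefix(path, L"\\\\")) return L"\\\\?\\UNC\\" + path.substr(2);

    // Drive-absolute path X:\... becomes \\?\X:\...
    if (path.size() >= 3 && path[1] == L':' && path[2] == L'\\')
        return L"\\\\?\\" + path;

    // Relative paths cannot be prefixed; the caller must make them absolute.
    return path;
}
```

The result is only meaningful for the wide-character (W) functions of the Windows API. A narrow fopen-based stream cannot use it, which is one reason the long-path and wide-name questions in this thread are connected.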

on Wed Oct 26 2011, "Alf P. Steinbach" <alf.p.steinbach+usenet-AT-gmail.com> wrote:
Those long paths can't be handled by Windows Explorer or most other software. Therefore, adding an ability to use them inadvertently, is at best strongly misguided. It would encourage novice developers to create files that ordinary users can't delete, move or rename.
The world has 26 years of Windows usage without such long paths, even after they were introduced in 1993.
So with 26 years of not needing them going on strong, and 18 years of not being used (by anyone other than script kiddies) despite being there, it is an established fact that any competent developer doesn't need them and won't use them. And also, it is an established fact that using them creates trouble for the users. Don't even think about it -- and here I'm not talking about Boost, but about your own efforts.
Suppose you want to write a utility to clean up such paths in the filesystem? If they can't be represented, there's a problem, neh? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On 26.10.2011 16:57, Dave Abrahams wrote:
on Wed Oct 26 2011, "Alf P. Steinbach"<alf.p.steinbach+usenet-AT-gmail.com> wrote:
Those long paths can't be handled by Windows Explorer or most other software. Therefore, adding an ability to use them inadvertently, is at best strongly misguided. It would encourage novice developers to create files that ordinary users can't delete, move or rename.
The world has 26 years of Windows usage without such long paths, even after they were introduced in 1993.
So with 26 years of not needing them going on strong, and 18 years of not being used (by anyone other than script kiddies) despite being there, it is an established fact that any competent developer doesn't need them and won't use them. And also, it is an established fact that using them creates trouble for the users. Don't even think about it -- and here I'm not talking about Boost, but about your own efforts.
Suppose you want to write a utility to clean up such paths in the filesystem? If they can't be represented, there's a problem, neh?
No problem: it would be a Windows-specific program, and also a rather small one, so use of the Windows API would be OK.
I think what you're saying is that ideally Boost should support the full functionality of the native system. But Boost isn't doing that. And when the full functionality is not generally supported even by the vendor (Microsoft), it is a bit dangerous to support it. Because other software will not be able to handle it, and that includes e.g. the standard library of g++.
...
Also, when I have cleaned up such files and folders in Windows, I have just used the command interpreter. E.g., "con" is a reserved device name (the console) and cannot usually be used to name a file or folder. But \\.\ or \\?\, I can't remember which, is like a "raw path". So,
    d:\> md dave
    d:\> cd dave
    d:\dave> md \\.\d:\dave\con
    d:\dave> dir
     Volume in drive D is DATA
     Volume Serial Number is A875-F9FD

     Directory of d:\dave

    26.10.2011  17:24    <DIR>          .
    26.10.2011  17:24    <DIR>          ..
    26.10.2011  17:24    <DIR>          con
                   0 File(s)              0 bytes
                   3 Dir(s)  233 918 234 624 bytes free
    d:\dave> rd con
    The system cannot find the file specified.
    d:\dave> rd \\.\d:\dave\con
    d:\dave> _
Of course I shouldn't really have shown such a trick here, where I have the impression that perhaps script kiddies are lurking, but hey, it's my birthday. :-)
Anyway, other tricks to handle troublesome paths include conversion to shortname form (the fix that I'm proposing to let Boost.Filesystem work with g++), which you can easily do using the 'for' command, and using e.g. the 'subst' command to define a logical drive for a directory in the path.
Cheers,
- Alf
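
As a side note to the "con" example, a program can check whether a path component is one of the classic reserved DOS device names before trying to create it. This is a minimal sketch only. The function name is_reserved_device_name is made up for this example, and the list covers only the commonly documented names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9), so treat it as a simplification rather than the exact rule Windows applies.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if `component` (one element of a path, without
// any directory part) is a classic reserved DOS device name.
inline bool is_reserved_device_name(const std::string& component)
{
    // Only the part before the first dot counts here: "con.txt" is treated
    // as reserved too.
    std::string base = component.substr(0, component.find('.'));

    // The comparison is case-insensitive, so normalize to upper case.
    for (std::size_t i = 0; i < base.size(); ++i)
        base[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(base[i])));

    if (base == "CON" || base == "PRN" || base == "AUX" || base == "NUL")
        return true;

    // COM1..COM9 and LPT1..LPT9.
    if (base.size() == 4 &&
        (base.compare(0, 3, "COM") == 0 || base.compare(0, 3, "LPT") == 0))
        return base[3] >= '1' && base[3] <= '9';

    return false;
}
```

A component flagged by this check generally needs a prefixed path of the kind shown in the session above to be created or removed.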

On Wed, Oct 26, 2011 at 15:05, Alf P. Steinbach < alf.p.steinbach+usenet@gmail.com> wrote:
On 26.10.2011 14:17, Yakov Galka wrote: [snip argumentative noise]
On Wed, Oct 26, 2011 at 12:59, Alf P. Steinbach<
The *only* uses I have seen of paths longer than MAX_PATH, have been silly script-kiddies trying to create problems for people running FTP servers. And that's because the ordinary tools can't handle them, thus, difficult to remove the script-kiddie's nested folders. Conversely, as a serious software developer one should stay well away from such paths.
This offends me. I'm facing the MAX_PATH limitation at my work. If you develop nothing more than desktop apps that interact directly with the user, then please don't infer from this that others don't need it either. MAX_PATH is a problem in large-scale systems where you have enormous amounts of data.
You don't need such long paths for any size of data.
So what are the alternatives? Let's see...:
1) Hack the software to use shorter non-descriptive names.
2) Maintain a database of long paths -> short unique ids
3) Buy some commercial database system...
All of these increase software complexity on the high-level, and aren't applicable if some of the software is out of my control. Furthermore, why should I do these if the native filesystem suits my needs just fine? And in the context of boost.filesystem, if I write a cross-platform app that uses long paths on linux, I expect the filesystem library to handle them on windows correctly.
On Wed, Oct 26, 2011 at 17:18, Alf P. Steinbach < alf.p.steinbach+usenet@gmail.com> wrote:
d:\> cscript /nologo x.js
long name is 167 characters.
short name is 94 characters.
Your script must be wrong, it didn't count the backslashes. It's 178 characters.
On Wed, Oct 26, 2011 at 17:30, Alf P. Steinbach < alf.p.steinbach+usenet@gmail.com> wrote:
E.g., "con" is a reserved device name (the console) and cannot usually be used to name a file or folder. But \\.\ or \\?\, I can't remember which, is like a "raw path". So,
It's \\?\.
Of course I shouldn't really have shown such trick here, where I have impression that perhaps script kiddies lurking, but hey, it's my birthday. :-)
Happy birthday then :). -- Yakov
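The two character counts quoted above (167 and 178) can be reconciled with a small check. The sketch below assumes a C++11 compiler (for the raw string literal); the names long_name, count_backslashes and check_quoted_counts are made up for illustration. It counts the path once with its backslashes and once without them. The second figure matches the script output, presumably because the script embedded the path in a string literal where each backslash acted as an escape character and so was not counted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// The build path quoted in the thread, written as a raw string literal so
// that every backslash is kept verbatim.
const std::string long_name =
    R"(bin.v2\libs\program_options\build\msvc-10.0\release\address-model-64\architecture-x86\link-static\runtime-link-static\threading-multi\libboost_program_options-vc100-mt-s-1_47.lib)";

// Number of backslash separators in a path string.
std::size_t count_backslashes(const std::string& s) {
    return static_cast<std::size_t>(std::count(s.begin(), s.end(), '\\'));
}

// Self-check of the two figures that appear in the thread.
void check_quoted_counts() {
    // Full length, separators included.
    assert(long_name.size() == 178);
    // Length with the 11 separators left out.
    assert(long_name.size() - count_backslashes(long_name) == 167);
}
```

Either way the relative path stays below MAX_PATH; what matters for the discussion is how much room is left for the directory it sits in.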

On Oct 26, 2011, at 6:05 AM, Alf P. Steinbach wrote:
On 26.10.2011 14:17, Yakov Galka wrote: [snip argumentative noise]
This offends me. I'm facing the MAX_PATH limitation at my work. If you develop nothing more than desktop apps that interact directly with the user, then please don't infer from this that others don't need it either. MAX_PATH is a problem in large-scale systems where you have enormous amounts of data.
You don't need such long paths for any size of data.
Technically, you don't need Windows either. Josh

On Wed, Oct 26, 2011 at 12:59 PM, Alf P. Steinbach <alf.p.steinbach+usenet@gmail.com> wrote:
On 26.10.2011 12:24, Yakov Galka wrote:
[...] you still cannot use long paths on windows (longer than MAX_PATH), although they are supported by the OS. Moreover, judging by the last fixes to the library, it looks like Beman wants to shift the burden of this on the user of the library, instead of implementing something that works transparently.
[...] However the MAX_PATH issue is not really an issue in practice.
The *only* uses I have seen of paths longer than MAX_PATH, have been silly script-kiddies trying to create problems for people running FTP servers. And that's because the ordinary tools can't handle them, thus, difficult to remove the script-kiddie's nested folders. Conversely, as a serious software developer one should stay well away from such paths.
It's a big world out there Alf, and paths longer than MAX_PATH are routine in at least one commercial app here that I know of, with Unix / Linux roots, that's been working fine for years, and that's creating issues for its Windows port ATM, since indeed many Windows APIs are MAX_PATH limited (despite the weird filename handling to support MAX_PATH), and many scripting engines (e.g. TCL) are thus limited (those are used as utilities to manage the app's file-based "databases"). I'm not saying Boost.Filesystem *must* transparently support long paths, although that'd be terrific, but I'm pointing out that your assumption that long paths are only used by "silly" teenagers is a bit narrow-minded. --DD

On 26.10.2011 15:27, Dominique Devienne wrote:
On Wed, Oct 26, 2011 at 12:59 PM, Alf P. Steinbach <alf.p.steinbach+usenet@gmail.com> wrote:
On 26.10.2011 12:24, Yakov Galka wrote:
[...] you still cannot use long paths on windows (longer than MAX_PATH), although they are supported by the OS. Moreover, judging by the last fixes to the library, it looks like Beman wants to shift the burden of this on the user of the library, instead of implementing something that works transparently.
[...] However the MAX_PATH issue is not really an issue in practice.
The *only* uses I have seen of paths longer than MAX_PATH, have been silly script-kiddies trying to create problems for people running FTP servers. And that's because the ordinary tools can't handle them, thus, difficult to remove the script-kiddie's nested folders. Conversely, as a serious software developer one should stay well away from such paths.
It's a big world out there Alf, and paths longer than MAX_PATH are routine in at least one commercial app here that I know of, with Unix / Linux roots, that's been working fine for years, and that's creating issues for its Windows port ATM, since indeed many Windows APIs are MAX_PATH limited (despite the weird filename handling to support MAX_PATH), and many scripting engines (e.g. TCL) are thus limited (those are used as utilities to manage the app's file-based "databases"). I'm not saying Boost.Filesystem *must* transparently support long paths, although that'd be terrific, but I'm pointing out that your assumption that long paths are only used by "silly" teenagers is a bit narrow-minded.
In Windows. Don't lose that context. In a system that fully supports feature X, feature X can be reasonably used even if it's of no particular benefit. In a system where use of feature X creates trouble, and where 26 years of worldwide experience says that it's not necessary, it's not a good idea to encourage its use. Cheers & hth., - Alf

On Wed, Oct 26, 2011 at 3:47 PM, Alf P. Steinbach <alf.p.steinbach+usenet@gmail.com> wrote:
On 26.10.2011 15:27, Dominique Devienne wrote: In a system where use of feature X creates trouble, and where 26 years of worldwide experience says that it's not necessary, it's not a good idea to encourage its use.
The fact remains that NTFS does support such paths (and it's not 26 years old), and there are (convoluted) ways to support them on Windows (by some APIs at least) and some apps *do* need that support (on Windows, yes). Whether Windows Explorer can see those files or not doesn't matter, as long as the app is functional. And if Boost.Filesystem supported those long paths on Windows, all the better IMHO. I don't see supporting a given feature as encouraging its use either. All I'm saying is that a general purpose API shouldn't put arbitrary limitations when the underlying platform(s) supports a feature, for "philosophical" reasons. Just let the platform error-out when a given limit is exceeded. --DD

26.10.2011 14:59, Alf P. Steinbach пишет:
However the MAX_PATH issue is not really an issue in practice.
The *only* uses I have seen of paths longer than MAX_PATH, have been silly script-kiddies trying to create problems for people running FTP servers. And that's because the ordinary tools can't handle them, thus, difficult to remove the script-kiddie's nested folders. Conversely, as a serious software developer one should stay well away from such paths.
Please note that the following filename is very close to the limit:
bin.v2\libs\program_options\build\msvc-10.0\release\address-model-64\architecture-x86\link-static\runtime-link-static\threading-multi\libboost_program_options-vc100-mt-s-1_47.lib
I even had real problems with building Boost from a not-so-long subfolder of my home folder. -- Sergey Cheban

On 26.10.2011 16:19, Sergey Cheban wrote:
26.10.2011 14:59, Alf P. Steinbach пишет:
However the MAX_PATH issue is not really an issue in practice.
The *only* uses I have seen of paths longer than MAX_PATH, have been silly script-kiddies trying to create problems for people running FTP servers. And that's because the ordinary tools can't handle them, thus, difficult to remove the script-kiddie's nested folders. Conversely, as a serious software developer one should stay well away from such paths.
Please note that the following filename is very close to the limit: bin.v2\libs\program_options\build\msvc-10.0\release\address-model-64\architecture-x86\link-static\runtime-link-static\threading-multi\libboost_program_options-vc100-mt-s-1_47.lib
d:\> set LONGNAME=bin.v2\libs\program_options\build\msvc-10.0\release\address-model-64\architecture-x86\link-static\runtime-link-static\threading-multi\libboost_program_options-vc100-mt-s-1_47.lib
d:\> for %f in (%LONGNAME%) do @set SHORTNAME=%~sf & echo %~sf
d:\bin.v2\libs\PROGRA~1\build\msvc-10.0\release\ADDRES~1\ARCHIT~1\LINK-S~1\RUNTIM~1\THREAD~1\LIBBOO~1.LIB
d:\> echo function print(s) { WScript.StdOut.WriteLine( s ); } >x.js
d:\> echo print( "long name is " + "%LONGNAME%".length + " characters." ) >>x.js
d:\> echo print( "short name is " + "%SHORTNAME%".length + " characters." ) >>x.js
d:\> cscript /nologo x.js
long name is 167 characters.
short name is 94 characters.
d:\> _
It's about 2/3 of the way (as I recall MAX_PATH is 260 or thereabouts). The short name is about 1/3 of the way.
I even had real problems with building Boost from a not-so-long subfolder of my home folder.
Yes, evidently those who thought it was a good idea to encode file properties as nested folders, were used to a system supporting arbitrarily long paths as well as symlinks. I.e. *nix. While Windows, in spite of early Posix 0.9 support, didn't support symlinks until the latest version, Windows 7. Perhaps, in addition to adding g++ support in Boost.Filesystem, it might be a good idea if Someone(TM) told whoever is responsible for that enthusiastic folder nesting, that some other scheme might fit better in the non-*nix world? Or even in *nix... ;-) <g> Cheers & hth., - Alf

On 26/10/2011 12:24, Yakov Galka wrote:
Personally I think that boost::filesystem::paths are a sad joke, it's a pity they're heading to the standard. Although the OS-part is definitely good, the way the path class is designed isn't suitable for paths outside the unix world. Even if you fix the Unicode problems, you still cannot use long paths on windows (longer than MAX_PATH), although they are supported by the OS. Moreover, judging by the last fixes to the library, it looks like Beman wants to shift the burden of this on the user of the library, instead of implementing something that works transparently.
I suggest you forward your comments to the LWG or whoever is in charge of evaluating things for TR2.

On Wed, Oct 26, 2011 at 6:24 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
Personally I think that boost::filesystem::paths are a sad joke, it's a pity they're heading to the standard. Although the OS-part is definitely good, the way the path class is designed isn't suitable for paths outside the unix world.
Could you explain that a bit further? Since class path is used all the time for paths outside the Unix world, I'm curious to know what your concerns are.
Even if you fix the Unicode problems,
What Unicode problems are you running into? Although there are some locale related tickets outstanding, I'm not aware of any Unicode issues.
you still cannot use long paths on windows (longer than MAX_PATH), although they are supported by the OS.
There is one ticket outstanding, http://svn.boost.org/trac/boost/ticket/5448, that is somewhat related to MAX_PATH limitations. The objective is to support any path that is acceptable to the operating system, and that includes better support for long paths.
Moreover, judging by the last fixes to the library, it looks like Beman wants to shift the burden of this on the user of the library, instead of implementing something that works transparently.
Which fixes are bothering you:-? --Beman

On Wed, Oct 26, 2011 at 22:47, Beman Dawes <bdawes@acm.org> wrote:
On Wed, Oct 26, 2011 at 6:24 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
Personally I think that boost::filesystem::paths are a sad joke, it's a pity they're heading to the standard. Although the OS-part is definitely good, the way the path class is designed isn't suitable for paths outside the unix world.
Could you explain that a bit further? Since class path is used all the time for paths outside the Unix world, I'm curious to know what your concerns are.
Give me some time to write a constructive criticism. Some issues I raise below.
you still cannot use long paths on windows (longer than MAX_PATH), although they are supported by the OS.
There is one ticket outstanding, http://svn.boost.org/trac/boost/ticket/5448, that is somewhat related to MAX_PATH limitations. The objective is to support any path that is acceptable to the operating system, and that includes better support for long paths.
I know this ticket. Your objective is useless. Seems Bjarne is right about the composition of the committee. Have you done a survey among the users of the library? What I do expect is that calling path.native() will return a string ready to be passed to CreateFileW. No worries about long paths, slashes or backslash, relative paths or other per-platform quirks. Currently I must write something like this:
    path p = get_some_path();
    p = system_complete(p); // according to msdn not guaranteed to work for long paths. Fortunately it does in practice.
    std::wstring q = p.native();
    if(starts_with(q, L"\\\\"))
        q = L"\\\\?\\UNC\\" + q.substr(2);
    else
        q = L"\\\\?\\" + q;
    // doesn't handle \\.\....
    CreateFileW(q.c_str(), ...);
The problem is more serious when we observe that the \\?\ and \\.\ syntax is a detail of implementation. For example we DON'T want the user to be able to input a \\.\ path. We also prefer that the user not see \\?\ at all. In fact there are two path syntaxes on windows:
1) User paths: can have slashes and backslashes. They start with "\\x", "\x", "x:" or "x" (more or less).
2) System paths: only backslashes supported in general, can't be relative, may start with "\\?\" and "\\.\".
Moreover, judging by the last fixes to the library, it looks like Beman wants to shift the burden of this on the user of the library, instead of implementing something that works transparently.
Which fixes are bothering you:-?
I was talking about revision 71157. -- Yakov
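The prefixing step described in this message can be isolated as a pure string function. What follows is only a sketch under stated assumptions: to_extended_length is a hypothetical name, not a Boost.Filesystem function, and the input is assumed to be an absolute path that already uses backslashes only (the "\\?\" form is passed to the system without the usual normalization). Unlike the snippet above, it leaves "\\.\" device paths untouched instead of mangling them.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (an assumption for illustration, not library API):
// turn an absolute Windows path into its extended-length form.
// Precondition: the path is absolute and contains no forward slashes.
std::wstring to_extended_length(const std::wstring& abs) {
    assert(abs.find(L'/') == std::wstring::npos);

    const std::wstring ext = L"\\\\?\\";  // extended-length prefix
    const std::wstring dev = L"\\\\.\\";  // device namespace prefix
    const std::wstring unc = L"\\\\";     // start of a UNC path

    if (abs.compare(0, ext.size(), ext) == 0) return abs;  // already prefixed
    if (abs.compare(0, dev.size(), dev) == 0) return abs;  // device path, leave as-is
    if (abs.compare(0, unc.size(), unc) == 0) {
        // "\\server\share\x" becomes "\\?\UNC\server\share\x"
        return L"\\\\?\\UNC\\" + abs.substr(2);
    }
    return ext + abs;  // drive-letter path such as "C:\dir\file"
}
```

The result would then be handed to a wide-character API such as CreateFileW. Whether a step like this belongs inside native() or in user code is exactly the design question debated in this thread.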

On Thu, Oct 27, 2011 at 3:04 PM, Yakov Galka <ybungalobill@gmail.com> wrote:
On Wed, Oct 26, 2011 at 22:47, Beman Dawes <bdawes@acm.org> wrote:
On Wed, Oct 26, 2011 at 6:24 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
Personally I think that boost::filesystem::paths are a sad joke, it's a pity they're heading to the standard. Although the OS-part is definitely good, the way the path class is designed isn't suitable for paths outside the unix world.
Could you explain that a bit further? Since class path is used all the time for paths outside the Unix world, I'm curious to know what your concerns are.
Give me some time to write a constructive criticism. Some issues I raise below.
you still cannot use long paths on windows (longer than MAX_PATH), although they are supported by the OS.
There is one ticket outstanding, http://svn.boost.org/trac/boost/ticket/5448, that is somewhat related to MAX_PATH limitations. The objective is to support any path that is acceptable to the operating system, and that includes better support for long paths.
I know this ticket. Your objective is useless. Seems Bjarne is right about the composition of the committee. Have you done a survey among the users of the library?
I hear from users of the library regularly. They mirror the rest of the Boost population; very roughly one third use Unix-like systems and two thirds use Windows. Once Microsoft ships the Dinkumware implementation of the library (based on V2) with VC++ 211, the percentage of Windows users will presumably increase.
What I do expect is that calling path.native() will return a string ready to be passed to CreateFileW.
Yes, that's what it does. Regardless of the operating system, the contents of path.native(), or more to the point, path.c_str(), is exactly as was passed into the path originally. That's important in case it is one of the implementation defined strings - it isn't the job of path to adjust the string.
No worries about long paths, slashes or backslash, relative paths or other per-platform quirks.
Right.
Currently I must write something like this:
    path p = get_some_path();
    p = system_complete(p); // according to msdn not guaranteed to work for long paths. Fortunately it does in practice.
    std::wstring q = p.native();
    if(starts_with(q, L"\\\\"))
        q = L"\\\\?\\UNC\\" + q.substr(2);
    else
        q = L"\\\\?\\" + q;
    // doesn't handle \\.\....
    CreateFileW(q.c_str(), ...);
If you have to write any of that, it is a bug in the library implementation. That's the point of http://svn.boost.org/trac/boost/ticket/5448 - to be sure that the odd-ball, implementation defined syntaxes work. Not just for Windows, but for POSIX too.
The problem is more serious when we observe that the \\?\ and \\.\ syntax is a detail of implementation. For example we DON'T want the user to be able to input a \\.\ path. We also prefer that the user not see \\?\ at all.
Who is the "we" here? There was a time very early on in V1 where I thought the Filesystem library path should only accept "approved" forms. Users, and a lot of them at that, let me know loud and clear that they didn't want to be nannied. If they passed in a given native string, that's what they wanted to get passed to the operating system. If they passed in a given generic string, that's what they wanted to get passed to the operating system, modulo only any changes absolutely required by the O/S (which means none on either POSIX or Windows).
In fact there are two path syntaxes on windows: 1) User paths: can have slashes and backslashes. They start with "\\x", "\x", "x:" or "x" (more or less). 2) System paths: only backslashes supported in general, can't be relative, may start with "\\?\" and "\\.\".
Yes, and there is no intention for the Filesystem library to intervene if the user gets it wrong, such as by exceeding a maximum length mandated by the operating system or file system.
Moreover, judging by the last fixes to the library, it looks like Beman wants to shift the burden of this on the user of the library, instead of implementing something that works transparently.
Which fixes are bothering you:-?
I was talking about revision 71157.
Ah! So you want to hide the various implementation defined formats? --Beman

On Thu, Oct 27, 2011 at 22:26, Beman Dawes <bdawes@acm.org> wrote:
On Thu, Oct 27, 2011 at 3:04 PM, Yakov Galka <ybungalobill@gmail.com> wrote:
On Wed, Oct 26, 2011 at 22:47, Beman Dawes <bdawes@acm.org> wrote:
On Wed, Oct 26, 2011 at 6:24 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
Personally I think that boost::filesystem::paths are a sad joke, it's a pity they're heading to the standard. Although the OS-part is definitely good, the way the path class is designed isn't suitable for paths outside the unix world.
Could you explain that a bit further? Since class path is used all the time for paths outside the Unix world, I'm curious to know what your concerns are.
Give me some time to write a constructive criticism. Some issues I raise below.
you still cannot use long paths on windows (longer than MAX_PATH), although they are supported by the OS.
There is one ticket outstanding, http://svn.boost.org/trac/boost/ticket/5448, that is somewhat related to MAX_PATH limitations. The objective is to support any path that is acceptable to the operating system, and that includes better support for long paths.
I know this ticket. Your objective is useless. Seems Bjarne is right about the composition of the committee. Have you done a survey among the users of the library?
I hear from users of the library regularly. They mirror the rest of the Boost population; very roughly one third use Unix-like systems and two thirds use Windows. Once Microsoft ships the Dinkumware implementation of the library (based on V2) with VC++ 211, the percentage of Windows users will presumably increase.
... and one tenth of them write portable code, 1/200th are aware of the MAX_PATH limitation, and 1/4000th really care about it. Btw, V2 was somewhat better in Unicode handling, since theoretically I could use the narrow strings for UTF-8 (see [1] why not). Given the current specification of the library it seems impossible that Microsoft will work around the MAX_PATH limitation.
What I do expect is that calling path.native() will return a string ready to be passed to CreateFileW.
Yes, that's what it does.
Excuse me, but it's not:
    path p = "C:\\"; // assume this is a root of some project
    path q;
    for(int i = 0; i < 100; i++)
        q /= "a\\..";
    q /= "text.txt"; // and this is a long relative path for a file within the project
    CreateFileW((p/q).c_str(), ...); // fails....
Does path.native() always return a string that can be passed to CreateFileW? NO!
Regardless of the operating system, the contents of path.native(), or more to the point, path.c_str(), is exactly as was passed into the path originally.
We don't need this requirement. Just drop it. Even if you want the path class to store the original string, it doesn't rule out returning an adjusted copy for native().
That's important in case it is one of the implementation defined strings
I don't understand. Can you give an example please?
- it isn't the job of path to adjust the string.
Wrong. If taking this as an axiom we get:
1) path is a "smart" string that converts "transparently" between narrow and wide chars.
2) path is a "smart" string that adds '/' or '\\' on concatenation (when needed).
3) path has a set of convenient observers (iterate through components, extension(), stem() blah blah).
First, observe that 1 is orthogonal to 2 and 3. Furthermore, let's compare them with what we have now (in a world without boost::path):
1) Isn't fundamentally bad, but we can argue against this just as against any string class that converts between different encodings transparently. Anyway it's not the job of the path class. In any case standardizing on UTF-8 eliminates the need for any conversions in the interface of boost::path (see [1] for details); it's the job of native() to return the native string from the given path. Currently I can use:
    std::string myPath = get_utf_8_path();
    CreateFileW(native(myPath), ...); // as simple as with boost::path
2) Nice. But this feature is not the one on which I would decide whether I want to use the lib or not. Solved easily by following a convention that all directories end with '/' or '\\':
    std::string myPath = get_user_home_dir() + "Documents\\";
    CreateFileW(native(myPath + "a.txt"), ...); // simpler than in boost::path: no need to worry about MAX_PATH 'overflow', native takes care of this.
3) Good, this is the no. 2 reason people use the library (no. 1 is the "Operational Functions" which can be used without boost::path class in user-code).
No worries about long paths, slashes or backslash, relative paths or other per-platform quirks.
Right.
What's right? boost::path doesn't work around this at all! Each platform quirk bubbles up to the interface.
Currently I must write something like this:
    path p = get_some_path();
    p = system_complete(p); // according to msdn not guaranteed to work for long paths. Fortunately it does in practice.
    std::wstring q = p.native();
    if(starts_with(q, L"\\\\"))
        q = L"\\\\?\\UNC\\" + q.substr(2);
    else
        q = L"\\\\?\\" + q;
    // doesn't handle \\.\....
    CreateFileW(q.c_str(), ...);
If you have to write any of that, it is a bug in the library implementation. That's the point of http://svn.boost.org/trac/boost/ticket/5448 - to be sure that the odd-ball, implementation defined syntaxes work. Not just for Windows, but for POSIX too.
Please, I sincerely entreat you to write an example of how you envision this code will look, given the following constraints:
===================
    path p = get_some_path_1();
    path q = get_some_path_2();
    path r = p/q;
    // MAGIC
    #ifdef WINDOWS
    CreateFileW(r.c_str(), ...);
    #else
    open(r.c_str(), ...);
    #endif
1) get_some_path_1/2() might be arbitrary valid paths, possibly read from different places in the configuration/user input, therefore although each of them is valid (< MAX_PATH) their concatenation might exceed MAX_PATH.
2) The code creates a file at this path using CreateFileW on windows and e.g. open() on Linux.
3) If there is a way to create the file at the specified path, it must do it.
4) It's portable: no #ifdefs or whatever except for the CreateFile/open() part.
===================
The problem is more serious when we observe that the \\?\ and \\.\ syntax is a detail of implementation. For example we DON'T want the user to be able to input a \\.\ path. We also prefer that the user not see \\?\ at all.
Who is the "we" here? There was a time very early on in V1 where I thought the Filesystem library path should only accept "approved" forms. Users, and a lot of them at that, let me know loud and clear that they didn't want to be nannied.
"Users" here were the end-users who use the software that builds on top of boost::paths. And any user of the app who's not a programmer, even if she has technical background, should not see \\?\ as output or need to prepend \\?\ in the input to use a long path. If they passed in a given native
string, that's what they wanted to get passed to the operating system. If they passed in a given generic string, that's what they wanted to get passed to the operating system, modulo only any changes absolutely required by the O/S (which means none on either POSIX or Windows).
Again this is a false assumption. You just refuse to accept that prepending \\?\ and converting '/' to '\\' is absolutely needed in some cases. I expect that your answer to the above small 'exercise' will be a proof for why I'm wrong.
In fact there are two path syntaxes on windows:
1) User paths: can have slashes and backslashes. They start with "\\x", "\x", "x:" or "x" (more or less).
2) System paths: only backslashes supported in general, can't be relative, may start with "\\?\" and "\\.\".
Yes, and there is no intention for the Filesystem library to intervene if the user gets it wrong, such as by exceeding a maximum length mandated by the operating system or file system.
The user in this context is the end-user. And you're again shifting the burden on the application writer.
Moreover, judging by the last fixes to the library, it looks like Beman wants to shift the burden of this on the user of the library, instead of implementing something that works transparently.
Which fixes are bothering you:-?
I was talking about revision 71157.
Ah! So you want to hide the various implementation defined formats?
Not exactly. I think that a design based on various path parsers and formatters is more appropriate. native() will always return a system syntax. Other functions will return the user-syntax. What syntax is used by the path class internally is a detail of implementation. The path is constructed by default from user-syntax. It can be constructed from system syntax by giving the system parser to the constructor. Such an approach will make it possible to even work with windows paths on linux, or extend the concepts to work with non-fs paths.
[1] -- http://permalink.gmane.org/gmane.comp.lib.boost.devel/225036
Sincerely, -- Yakov
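To make the "C:\" plus one hundred "a\.." example from this message concrete, here is a rough sketch of the kind of adjustment a system-syntax formatter might perform before handing a string to the OS. Everything in it is an assumption for illustration: normalize_drive_path is a hypothetical name, it only handles drive-absolute paths such as "C:\...", and it collapses "." and ".." purely lexically, which can change the meaning of a path when a component is a symlink or junction.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: convert '/' to '\' and lexically collapse "." and
// "name\.." components of a drive-absolute path such as "C:\a\..\b".
std::wstring normalize_drive_path(std::wstring path) {
    for (wchar_t& c : path)
        if (c == L'/') c = L'\\';

    // Precondition: the path starts with a drive root like "C:\".
    assert(path.size() >= 3 && path[1] == L':' && path[2] == L'\\');
    const std::wstring root = path.substr(0, 3);

    std::vector<std::wstring> parts;
    std::size_t pos = 3;
    while (pos <= path.size()) {
        std::size_t end = path.find(L'\\', pos);
        if (end == std::wstring::npos) end = path.size();
        const std::wstring part = path.substr(pos, end - pos);
        if (part == L"..") {
            if (!parts.empty()) parts.pop_back();  // ".." at the root is dropped
        } else if (!part.empty() && part != L".") {
            parts.push_back(part);
        }
        pos = end + 1;
    }

    std::wstring result = root;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) result += L'\\';
        result += parts[i];
    }
    return result;
}
```

With this, the 511-character path built in the example shrinks to "C:\text.txt". A path that is still too long after such a step would additionally need the "\\?\" prefix.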

On Wed, Oct 26, 2011 at 5:47 AM, Artyom Beilis <artyomtnk@yahoo.com> wrote:
----- Original Message -----
From: Alf P. Steinbach <alf.p.steinbach+usenet@gmail.com>
First, apologies if I'm posting to the wrong group/list; if so then please redirect me.
IMHO access to files is a crucial part of Boost.Filesystem. However, with Boost 1.47, and using g++ 4.4.1 in Windows 7, boost::filesystem::ifstream etc. fail to open or create files with non-ANSI characters. It works fine with Visual C++; it FAILS with g++ 4.4.1, which is the one bundled with the Code::Blocks IDE.
The failure probably has nothing to do with the g++ version: it's due to g++ not offering the Visual C++ wchar_t oriented extensions to the standard iostreams (Boost.Filesystem uses these Visual C++ extensions).
I stumbled onto this while I was writing about using Unicode in C++ programming in Windows.
Why not just implement boost::filesystem::fstream over _wfopen and a custom streambuf implementation?
It is relatively simple.
It always seemed to me that a custom streambuf implementation would be hard to support in that users would expect it to exactly mimic the native library's implementation, bugs and all.
Using short file names is no go for two reasons:
1. It works only when file exists (can't create new file)
The Filesystem V2 hack used the Windows wide API for file create, so the fstream implementation only ever saw an existing file.
2. It is quite deprecated
8.3 support can be turned off, so is problematic. A patch to libstdc++ would be much cleaner, and would benefit users beyond just the Boost community, IMO. --Beman
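For readers unfamiliar with the custom streambuf idea discussed above, a minimal read-only version over a C FILE* might look like the sketch below. It is an illustration under stated assumptions, not the Boost.Filesystem or libstdc++ implementation: stdio_inbuf is a made-up name, there is no seeking, no output and no locale handling, and the FILE* is assumed to have been opened elsewhere (on Windows that could be via _wfopen, as suggested in the quoted message). The gaps it leaves are a fair picture of the maintenance concern raised in the reply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <istream>
#include <streambuf>

// Minimal read-only stream buffer over an already opened FILE*.
// The FILE* is not owned; the caller closes it.
class stdio_inbuf : public std::streambuf {
public:
    explicit stdio_inbuf(std::FILE* f) : file_(f) {
        assert(f != nullptr);
        // Start with an empty get area so the first read calls underflow().
        setg(buffer_, buffer_, buffer_);
    }

protected:
    // Refill the get area from the file when it has been consumed.
    int_type underflow() override {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        const std::size_t n = std::fread(buffer_, 1, sizeof buffer_, file_);
        if (n == 0)
            return traits_type::eof();
        setg(buffer_, buffer_, buffer_ + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::FILE* file_;
    char buffer_[4096];
};
```

Usage would be along the lines of constructing stdio_inbuf buf(f); and then std::istream in(&buf); after which in behaves like any other input stream for formatted reads.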

On Tue, Oct 25, 2011 at 9:41 AM, Alf P. Steinbach <alf.p.steinbach+usenet@gmail.com> wrote:
First, apologies if I'm posting to the wrong group/list; if so then please redirect me.
This is the best place to post, although opening a ticket is best for problems that aren't trivial to fix.
IMHO access to files is a crucial part of Boost.Filesystem. However, with Boost 1.47, and using g++ 4.4.1 in Windows 7, boost::filesystem::ifstream etc. fail to open or create files with non-ANSI characters. It works fine with Visual C++; it FAILS with g++ 4.4.1, which is the one bundled with the Code::Blocks IDE.
Yes, although it is actually characters that are not covered by the current file system codepage rather than non-ANSI characters, IIRC. Surprisingly, no one has opened a ticket yet. Until someone does open a ticket and the problem gets fixed, there are a couple of workarounds: (1) Use V2. Its fstream.hpp uses an implementation hack that works as long as 8.3 filenames are enabled. (Some Windows users disable 8.3 filenames as an optimization.) (2) V3 may work OK with the Microsoft 65001 UTF-8 codepage, although I've never used it myself and you would have to pass in a UTF-8 encoded narrow character name.
The failure probably has nothing to do with the g++ version: it's due to g++ not offering the Visual C++ wchar_t oriented extensions to the standard iostreams (Boost.Filesystem uses these Visual C++ extensions).
Right. libstdc++ doesn't provide the wchar_t overloads.
I stumbled onto this while I was writing about using Unicode in C++ programming in Windows.
I wrote up a technical solution in section 5, starting on page 16, of that work-in-progress document, available on Google Docs at:
Essentially, the fix I ended up with, full source code given in the above doc, uses Windows short file names if (1) there is no wide character support and if furthermore (2) the filename can't be perfectly translated to ANSI. The C++ implementation's support for wide chars is automatically detected using C++98-compatible code.
I do not know what to do with this.
If you care enough to open a ticket on the Boost bug tracker, I'll move the V2 code to V3. But there is a big backlog of tickets, so no guarantees as to when that will happen. Another possibility is to try to talk the libstdc++ folks into supporting the Dinkumware wchar_t extension. They will presumably want to do that anyhow to support TR2 (or whatever it is going to be called).
But considering that Boost.Filesystem is slated for later inclusion in the C++ standard library (or at least into TR2), I think it would be nice if it is able to give access to all accessible files in Windows, also with g++, so that we don't end up with a file handling part of the standard library that can't handle files in general; hence this posting and plea for advice -- what more should I do, if anything?
The libstdc++ folks are strong supporters of the standard library and its TRs, so I suspect they will faithfully implement whatever comes out of the standardization process.
Thanks,
--Beman

On 26.10.2011 22:13, Beman Dawes wrote:
On Tue, Oct 25, 2011 at 9:41 AM, Alf P. Steinbach <alf.p.steinbach+usenet@gmail.com> wrote:
IMHO access to files is a crucial part of Boost.Filesystem. However, with Boost 1.47, and using g++ 4.4.1 in Windows 7, boost::filesystem::ifstream etc. fail to open or create files with non-ANSI characters. It works fine with Visual C++; it FAILS with g++ 4.4.1, which is the one bundled with the Code::Blocks IDE.
Yes, although it is actually characters that are not covered by the current file system codepage rather than non-ANSI characters, IIRC. Surprisingly, no one has opened a ticket yet.
Until someone does open a ticket and the problem gets fixed, there are a couple of workarounds:
(1) Use V2. Its fstream.hpp uses an implementation hack that works as long as 8.3 filenames are enabled.
I think this is good. :-) It's what I, unaware of the history, proposed.
(Some Windows users disable 8.3 filenames as an optimization.)
The capability to disable them is there, but I don't think anyone is actually doing that. Because: Windows uses 8.3 filenames in the registry, and reportedly the Microsoft Installer uses and requires them, and so on.
(2) V3 may work OK with the Microsoft 65001 UTF-8 codepage, although I've never used it myself and you would have to pass in a UTF-8 encoded narrow character name.
I'm not sure exactly what you're thinking of here, but I suspect that it's due to some technical misunderstanding.
Narrow character Windows paths need to be encoded as ANSI, which is not one specific codepage but whichever codepage the GetACP function reports (codepage 1252 on Western installations). This codepage is independent of the active codepage in a console; the default codepage for a console is called the "OEM" codepage.
Changing the default ANSI or OEM codepage can be done via an undocumented registry key, followed by a reboot. However, while I regularly recommend changing the OEM codepage (from 437 to e.g. 1252), changing the ANSI codepage to something non-ANSI could conceivably wreak a lot of havoc with applications that assume that the ANSI codepage is, like ANSI, a single byte per char encoding.
The failure probably has nothing to do with the g++ version: it's due to g++ not offering the Visual C++ wchar_t oriented extensions to the standard iostreams (Boost.Filesystem uses these Visual C++ extensions).
Right. libstdc++ doesn't provide the wchar_t overloads.
I stumbled onto this while I was writing about using Unicode in C++ programming in Windows.
I wrote up a technical solution in section 5, starting on page 16, of that work-in-progress document, available on Google Docs at:
Essentially, the fix I ended up with, full source code given in the above doc, uses Windows short file names if (1) there is no wide character support and if furthermore (2) the filename can't be perfectly translated to ANSI. The C++ implementation's support for wide chars is automatically detected using C++98-compatible code.
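The decision logic described above can be sketched roughly as follows. This is not the code from the linked document: the two hooks are hypothetical stand-ins, and on Windows one would presumably implement them with WideCharToMultiByte (checking its used-default-char output) and GetShortPathNameW. It only applies when the library has no wide-character open, i.e. condition (1).

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical platform hooks, kept abstract so the policy itself is portable.
struct NarrowNameHooks {
    // Returns true and fills `out` only if `wide` converts to the ANSI
    // codepage without any loss; otherwise returns false.
    std::function<bool(const std::wstring&, std::string&)> to_ansi_lossless;
    // Returns the 8.3 short form of an existing path, or an empty string
    // if no short name is available.
    std::function<std::wstring(const std::wstring&)> short_name;
};

// Pick a narrow name that a char-only stream can open.
// Returns an empty string when no usable narrow name exists.
inline std::string narrow_name_for_open(const std::wstring& wide,
                                        const NarrowNameHooks& hooks) {
    assert(hooks.to_ansi_lossless && hooks.short_name);  // both hooks must be set
    std::string narrow;
    // The name translates perfectly to ANSI: use it directly.
    if (hooks.to_ansi_lossless(wide, narrow)) return narrow;
    // Condition (2): no perfect translation, so try the short (8.3) name.
    const std::wstring shortened = hooks.short_name(wide);
    if (!shortened.empty() && hooks.to_ansi_lossless(shortened, narrow)) {
        return narrow;
    }
    return std::string();
}
```

One caveat with this approach: a short name only exists for a file that already exists, so creating a new file with a non-ANSI name would need an extra step (for example creating it through a wide-character API first), and the whole scheme depends on 8.3 names being enabled on the volume.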
I do not know what to do with this.
If you care enough to open a ticket on the Boost bug tracker, I'll move the V2 code to V3. But there is a big backlog of tickets, so no guarantees as to when that will happen.
Thank you, done. <url: https://svn.boost.org/trac/boost/ticket/6065>
Another possibility is to try to talk the libstdc++ folks into supporting the Dinkumware wchar_t extension. They will presumably want to do that anyhow to support TR2 (or whatever it is going to be called).
Luc Danton, over at SO, pointed me to some earlier discussion of extending libstdc++ with Unicode path support, in June this year, at <url: http://gcc.gnu.org/ml/libstdc++/2011-06/msg00066.html>. Maybe that can be useful?
Cheers,
- Alf