[filesystem] Generic path grammar corner cases

I'm working to align the Boost.Filesystem generic path grammar with the POSIX specification. Doing so clarifies the specification of the library, and ensures that all native paths for both POSIX and Windows will work correctly. (Windows follows the POSIX conventions in some of the corner cases involved.)

The particular cases involve extra slashes in paths. For example, "/foo//bar//". I was surprised to find that such a path is well-defined for POSIX and Windows, with a meaning of "/foo/bar/." In other words, multiple slashes treated as a single slash, and paths with a trailing slash treated as if a period was appended. The rules for more than two leading slashes are a bit more complex - for POSIX they are treated as a single slash, for Windows that is an invalid path.

Question: what should path("foo//bar//").string() yield?

1) "foo//bar//"
2) "foo/bar/"
3) "foo/bar/."

(1) follows the rule that the path string is always exactly as input. Desirable in that if a platform actually implements something a bit different from the POSIX specs for multiple slashes, implementations will behave as expected for the platform. Downside is much more complex implementation (because many more functions have to be able to cope with multiple slashes) and more complex testing.

(2) Desirable in that if a platform actually implements something a bit different from the POSIX (and Windows) specs for multiple slashes, implementations will behave portably and correctly in a POSIX (and Windows) sense.

(3) Desirable in that for all three options directory iteration will return three elements - "foo", "bar", "." - so it is a bit counter intuitive for the "." not to appear in the string() results. Not a strong argument.

I'm leaning toward (2). Any comments?

--Beman
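To make option (2) concrete, here is a minimal sketch of the string transformation it implies. The helper name collapse_slashes is hypothetical and is not part of Boost.Filesystem; it works on a plain std::string and deliberately leaves leading slashes untouched, since the message notes that those follow different rules.

```cpp
#include <string>

// Hypothetical helper (not a Boost.Filesystem API): collapse each run of
// '/' into a single '/', as option (2) would. Leading slashes are copied
// unchanged because they need separate handling.
std::string collapse_slashes(const std::string& in)
{
    std::string out;
    out.reserve(in.size());

    // Copy any leading slashes exactly as given.
    std::string::size_type i = 0;
    while (i < in.size() && in[i] == '/')
        out += in[i++];

    // After the leading run, drop a '/' that directly follows another '/'.
    for (; i < in.size(); ++i) {
        if (in[i] == '/' && !out.empty() && out.back() == '/')
            continue;
        out += in[i];
    }
    return out;
}
```

Under these assumptions, collapse_slashes("foo//bar//") returns "foo/bar/", the result listed for option (2).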

Since the slash or slashes in that case are extraneous, I would also say option 2 is the way to go. Remove the trailing slash because it was probably not intended in the first place.

The only question I would have is about the path that is simply "//" or, natively in the Windows world, "\\". In the Windows command prompt, changing to those two directories yields the following results.

    C:\Documents and Settings\blah>cd \\
    '\\'
    CMD does not support UNC paths as current directories.

and

    C:\Documents and Settings\blah>cd \
    C:\>

So if you were to trim the trailing slash in "//", it could potentially have different results in Windows, since it sees the "\\" as the beginning of a UNC path. Or I could just be looking too far into it, and this isn't a problem at all?

- Dave

"Beman Dawes" <bdawes@acm.org> wrote in message news:d7mvgn$jnu$1@sea.gmane.org...

"David Daeschler" <daveregs@rsaisp.com> wrote in message news:d7n1to$rr7$1@sea.gmane.org...
I don't believe removing a trailing slash is a good idea because that isn't how POSIX or Windows treats a trailing slash, and also because some apps may depend on it to distinguish between a directory path and a file path.
Leading double slashes have special meaning in both POSIX and Windows. Oddly, POSIX treats three or more leading slashes as a single slash. Three or more is just plain illegal in Windows. I didn't mention that in the original message to keep it simple, but those cases will have to be handled specially. --Beman
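The leading-slash rules described above can be sketched as follows. All three function names are hypothetical, and the rules encoded are only the ones stated in this message: two leading slashes are kept as special, POSIX collapses three or more to one, and Windows rejects three or more.

```cpp
#include <cstddef>
#include <string>

// Number of '/' characters at the very start of the path.
std::size_t leading_slash_count(const std::string& p)
{
    std::size_t n = 0;
    while (n < p.size() && p[n] == '/')
        ++n;
    return n;
}

// POSIX rule as described in the thread: one or two leading slashes are
// kept as given; three or more are treated as a single slash.
std::string posix_leading(const std::string& p)
{
    const std::size_t n = leading_slash_count(p);
    if (n <= 2)
        return p;
    return "/" + p.substr(n);
}

// Windows rule as described in the thread: three or more leading slashes
// make the path invalid.
bool windows_leading_ok(const std::string& p)
{
    return leading_slash_count(p) <= 2;
}
```

With this sketch, posix_leading("///foo") returns "/foo", posix_leading("//foo") is returned unchanged, and windows_leading_ok("///foo") is false.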

On Thu, Jun 02, 2005 at 10:37:14AM -0400, Beman Dawes wrote:
e.g. a trailing slash changes semantics when dealing with symlinks:

    $ mkdir aaa
    $ touch aaa/monkey
    $ ln -s aaa bbb
    $ ls -l bbb
    lrwxrwxrwx 1 redi redi 3 Jun 2 18:40 bbb -> aaa
    $ ls -l bbb/
    -rw-rw-r-- 1 redi redi 0 Jun 2 18:40 monkey

In answer to your first question, I think I prefer (2) too.

Regarding the escape sequence, do you want to play nicely with null-terminated char strings? If not, '\0' would be my choice, since that and '/' are the only characters POSIX disallows in a filename. But there are probably plenty of ways that would cause otherwise valid programs to fail.

Then again, I'm not sure it's necessary to support slashes in filenames at all.

jon

"Jonathan Wakely" <cow@compsoc.man.ac.uk> wrote in message news:20050602175118.GA34709@compsoc.man.ac.uk...
Interesting.
Yep, I thought of that too. But since we do want to be able to "play nicely with null-terminated char strings", I think it's a non-starter.
Then again, I'm not sure it's necessary to support slashes in filenames at all.
I don't think it is strictly necessary for the Boost implementation, either. But I want to propose Boost.Filesystem for standardization, and need to know what the spec would be for, say OpenVMS. And allowing them does cover the admittedly obscure case of someone who just has to create a path "foo//bar", which they would do via path("foo/\abar"). It is a bit of a kludge, but since the expected usage is very uncommon, I don't see that as a problem. The impact on both specification and implementation is trivial. Thanks, --Beman

Beman Dawes wrote: ...
I personally prefer 2, and already filter these beasties using the string_algo library when I get them from environment variables. Although, I've been told the multiple slashes are required by our legacy nut-cracker laden dlls that front-end a fairly old version of gnumake. I've no idea why they are needed though. Jeff Flinn

Beman Dawes wrote:
Yeah, interesting here's some examples (names changed to protect the not so innocent):

    xxx_DOS_HOME=g:\\release\\winnt\\advdev\\
    xxx_yyy_DOS_ROOT=g:\\release\\winnt\\xxx\\yyy\\v7.0
    xxx_yyy_aa_DOS_ROOT=g:\\release\\winnt\\xxx\\yyy\\v7.0
    xxx_yyy_aa_ROOT=g://release//winnt//xxx////yyy//v7.0
    xxx_yyy_ROOT=g://release//winnt//xxx////yyy//v7.0
    xxx_yyy_bb_DOS_ROOT=g:\\release\\winnt\\xxx\\yyy\\v7.0
    xxx_yyy_bb_ROOT=g://release//winnt//xxx////yyy//v7.0

and my favorite:

    xxx_cc_DOS_HOME=g:\\release\\winnt\\advdev\\//xxx//cc//bin
I'll try to find an answer this afternoon. Jeff Flinn

Jeff Flinn wrote:
...
The master of lore who may have that knowledge is out today, so I'll try tomorrow. The recollection of others here thought it was a NutCracker restriction in the way it processed env vars. I ran a few tests redefining some env vars to have single slashes or back-slashes without error. There were still some multi-slash items that I don't have control over, that still appeared in a generated make file(lib paths). So it could very well be that the multiples are no longer required. Jeff Flinn

Beman Dawes wrote:
I'm leaning toward (2). Any comments?
What happens if the user wants to start concatenating filenames on the end of the path? Ideally they will use library facilities of course <g> But if they insist on doing it the hard way, (2) 'just works', (3) requires additional parsing to remove the trailing '.' and (1) is still the oddball - for both good and ill. For me that makes it a call between (1) and (2), and I am in favour of (2) as I prefer to have a single, well defined portable representation - not something else I will need to parse again with yet another library. I think this means we are in agreement <g> AlisdairM

"Alisdair Meredith" <alisdair.meredith@uk.renaultf1.com> wrote in message news:d7n3g2$1h8$1@sea.gmane.org...
Good. I needed a reality check. I've got my nose so buried in the trees it would be easy to miss the forest.

I guess the question comes down to this: In spite of POSIX specs and Windows experiments to the contrary, are there ever valid paths with multiple slashes (other than at the start)?

That may be the same issue as what to do for an operating system where slash is a valid and useful character in a file or directory name. I think the example someone gave was a filename like "data as of 1/2/2005".

A fix for both issues would be to introduce an escape sequence which means "slash and I really mean it", or an escape path prefix which means "don't modify this path in any way." Both of those seem like ugly warts, and I've been avoiding them until someone comes up with a compelling use case. I would hate to do something ugly that may have no practical use whatsoever.

Hum... Gears clank in brain for awhile...

In the past, whenever I thought about escape sequences, I assumed we would have to invent a new escape sequence, and that would be quite messy. But what about hijacking one of the existing escape sequences? Of all the C++ escape sequences, '\a' stands for BEL, and would seem to have the least probability of ever being a valid character in a path. It isn't valid at all for some OS's.

If Boost.Filesystem hijacked it as an escape sequence meaning "slash and I really mean it", we could use option (2) with all its advantages, yet if someone desperately needs more slashes for whatever reason, they have a way of doing so.

Thoughts?

--Beman
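One way the '\a' idea might combine with option (2) is sketched below. This is only an illustration of the proposal, not something Boost.Filesystem implements: the function name apply_slash_escape is hypothetical, and leading-slash handling is ignored for brevity.

```cpp
#include <string>

// Hypothetical sketch: ordinary '/' characters act as separators and
// runs of them collapse to one, while '\a' (BEL) stands for a slash
// that must be kept literally.
std::string apply_slash_escape(const std::string& in)
{
    std::string out;
    bool prev_was_separator = false;  // true right after an unescaped '/'

    for (char c : in) {
        if (c == '\a') {
            out += '/';               // escaped slash: always emitted
            prev_was_separator = false;
        } else if (c == '/') {
            if (!prev_was_separator)
                out += '/';           // first slash of a run is kept
            prev_was_separator = true;
        } else {
            out += c;
            prev_was_separator = false;
        }
    }
    return out;
}
```

Under these assumptions, apply_slash_escape("foo//bar//") returns "foo/bar/", while apply_slash_escape("foo/\abar") returns "foo//bar".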

Beman Dawes wrote:
I believe it used to be possible to use a Mac to create a file with '/' in the name on a Unix filesystem via NFS, but this could be a real corner case (inside a padded room perhaps!)

Kevin

--
| Kevin Wheatley, Cinesite (Europe) Ltd | Nobody thinks this      |
| Senior Technology                     | My employer for certain |
| And Network Systems Architect         | Not even myself         |

From: "Beman Dawes" <bdawes@acm.org>
I think the correct answer is (1). The reason is that it is what the client gave you. Why modify it? It may be surprising to get something different back.

It is reasonable to provide a normalization function that can make any desired tweak to a path, thus externalizing and making explicit the normalization. This also enables creating several normalization functions rather than hard coding one in path.

OTOH, (path("foo/") / "/bar").string() should yield "foo/bar" since this is a case of the user asking path to do the concatenation.

--
Rob Stewart                           stewart@sig.com
Software Engineer                     http://www.sig.com
Susquehanna International Group, LLP  using std::disclaimer;
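The concatenation behaviour suggested in this message can be sketched on plain strings. The function join is a hypothetical stand-in for what operator/ would do under that suggestion: it tidies only the seam between the two operands and leaves the rest of each operand exactly as given, in keeping with option (1).

```cpp
#include <string>

// Hypothetical sketch: join two path strings with exactly one '/' at the
// seam. Slashes elsewhere in either operand are not modified.
std::string join(const std::string& lhs, const std::string& rhs)
{
    if (lhs.empty()) return rhs;
    if (rhs.empty()) return lhs;

    // Drop trailing slashes from the left operand.
    std::string::size_type end = lhs.size();
    while (end > 0 && lhs[end - 1] == '/')
        --end;

    // Drop leading slashes from the right operand.
    std::string::size_type begin = 0;
    while (begin < rhs.size() && rhs[begin] == '/')
        ++begin;

    return lhs.substr(0, end) + "/" + rhs.substr(begin);
}
```

With this sketch, join("foo/", "/bar") returns "foo/bar", and join("foo//bar", "baz") returns "foo//bar/baz" because the interior of each operand is left alone.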

participants (7)
- Alisdair Meredith
- Beman Dawes
- David Daeschler
- Jeff Flinn
- Jonathan Wakely
- Kevin Wheatley
- Rob Stewart