[filesystem] status() if neither file nor directory?

In http://www.esva.net/~beman/filesystem_operations_predicates.htm, the assumption was that file system entities are either files or directories. But an operating system might support additional file system entities which can only be accessed via a specific API, and can't be accessed via fstreams or directory_iterators. For example, on some operating systems, sockets are file system entities and may be found during directory iterations, but can't be opened as either directories or fstreams.
What to do? Possibilities:
1) Add an other_flag and a matching is_other() function. Change definition of exists() to status() & (directory_flag|file_flag|other_flag)
2) Try to identify specific other types like sockets, and define flags and predicate functions for each. Handle future needs by saying an implementation has to define suitable flags and predicate functions for any additional kinds of files.
3) Treat such entities as files.
4) Treat such entities as errors.
5) Treat such entities as not found.
To me, (1) seems best. (2) is too open ended, and without a matching API to access the additional types, doesn't do much beyond (1). The others are not acceptable because they are misleading and don't match expectations.
Comments? --Beman
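For readers following along, option (1) could be sketched roughly as below. This is only an illustration: the flag values, the error_flag member, and the choice to pass the status bits in as a plain unsigned are assumptions made for the sketch, not the actual Boost.Filesystem declarations.

```cpp
// Hypothetical flag set modeled on the names used in this thread.
// The numeric values and error_flag are assumptions for illustration.
enum status_flag : unsigned {
    error_flag     = 1u << 0,
    not_found_flag = 1u << 1,
    directory_flag = 1u << 2,
    file_flag      = 1u << 3,
    other_flag     = 1u << 4  // proposed: sockets and similar entities
};

// Option (1): something exists if it landed in one of the "found" categories.
constexpr bool exists(unsigned status_bits) {
    return (status_bits & (directory_flag | file_flag | other_flag)) != 0;
}

// Matching predicate for the proposed category.
constexpr bool is_other(unsigned status_bits) {
    return (status_bits & other_flag) != 0;
}
```

Under this sketch a socket reported with other_flag exists but is neither a file nor a directory.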

Beman Dawes wrote:
1) Add an other_flag and a matching is_other() function. Change definition of exists() to status() & (directory_flag|file_flag|other_flag)
An entity that is not file or directory should just return false from is_file and is_directory. There are no operations defined on "other", so the category and the corresponding predicate are useless. It seems to me that the correct definition of exists() in terms of status() is just !(status() & not_found_flag).

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:012701c553ea$c2707610$6401a8c0@pdimov2...
Beman Dawes wrote:
1) Add an other_flag and a matching is_other() function. Change definition of exists() to status() & (directory_flag|file_flag|other_flag)
An entity that is not file or directory should just return false from is_file and is_directory. There are no operations defined on "other", so the category and the corresponding predicate are useless.
It seems to me that the correct definition of exists() in terms of status() is just !(status() & not_found_flag).
The expectation with not_found_flag is that, assuming the branch() if any is found, it should be possible to create a file or directory with that path. Consider:
if (!exists("foo")) create_directory("foo"); // surprise! This may fail: "exists but not a directory"
That seems pretty strange to me. --Beman

Beman Dawes wrote:
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:012701c553ea$c2707610$6401a8c0@pdimov2...
It seems to me that the correct definition of exists() in terms of status() is just !(status() & not_found_flag).
The expectation with not_found_flag is that, assuming the branch() if any is found, it should be possible to create a file or directory with that path. Consider:
if (!exists("foo")) create_directory("foo"); // surprise! This may fail: "exists but not a directory"
!exists("foo") == !!(status("foo") & not_found_flag) == status("foo") & not_found_flag
if( status("foo") & not_found_flag ) { create_directory( "foo" ); }
I don't see your point, but I may be missing something.

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:006301c5540d$c2bfe0b0$6401a8c0@pdimov2...
Beman Dawes wrote:
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:012701c553ea$c2707610$6401a8c0@pdimov2...
It seems to me that the correct definition of exists() in terms of status() is just !(status() & not_found_flag).
The expectation with not_found_flag is that, assuming the branch() if any is found, it should be possible to create a file or directory with that path. Consider:
if (!exists("foo")) create_directory("foo"); // surprise! This may fail: "exists but not a directory"
!exists("foo") == !!(status("foo") & not_found_flag) == status("foo") & not_found_flag
if( status("foo") & not_found_flag ) { create_directory( "foo" ); }
I don't see your point, but I may be missing something.
My point is that if "foo" is a socket or some other non-directory, non-streamable-file, entity, then classifying it as "not_found_flag" is very misleading. Yet status() has to classify it as something. It isn't an error, it isn't not found, it isn't a directory, and it isn't a file (or streamable-file or whatever you call that.) None of the existing categories fit. So a new "other" category would seem indicated. --Beman

Beman Dawes wrote:
My point is that if "foo" is a socket or some other non-directory, non-streamable-file, entity, then classifying it as "not_found_flag" is very misleading.
Of course it won't be classified as not_found. It won't be classified as anything, as it doesn't meet the expectations of any category.
Yet status() has to classify it as something.
Why? A classification only makes sense if it gives us useful information. The "other" category has no other purpose but to support the arbitrary requirement that status() needs to set one and only one bit. It is not unreasonable for some entities to fall in more than one category, or in no category, and the additional requirement that the categories are disjoint and fully cover the domain does not seem to buy us anything.

Beman Dawes wrote:
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:012701c553ea$c2707610$6401a8c0@pdimov2...
Beman Dawes wrote:
1) Add an other_flag and a matching is_other() function. Change definition of exists() to status() & (directory_flag|file_flag|other_flag)
I should also add that, in my opinion, what you call is_file is probably is_stream. Streams can be read with an ifstream, but they aren't necessarily stable, seekable or finite. Take /dev/urandom, for example. A hypothetical OS that allows directories to be opened with an ifstream may reasonably answer true to both is_stream(p) and is_directory(p). Or maybe not. :-)

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:007401c5540f$efec2380$6401a8c0@pdimov2...
Beman Dawes wrote:
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:012701c553ea$c2707610$6401a8c0@pdimov2...
Beman Dawes wrote:
1) Add an other_flag and a matching is_other() function. Change definition of exists() to status() & (directory_flag|file_flag|other_flag)
I should also add that, in my opinion, what you call is_file is probably is_stream. Streams can be read with an ifstream, but they aren't necessarily stable, seekable or finite. Take /dev/urandom, for example.
I agree at least in theory. Although people call these things "files" all the time.
A hypothetical OS that allows directories to be opened with an ifstream may reasonably answer true to both is_stream(p) and is_directory(p). Or maybe not. :-)
I think not. Dividing the world into non-overlapping categories is much easier:-) --Beman

Beman Dawes wrote:
A hypothetical OS that allows directories to be opened with an ifstream may reasonably answer true to both is_stream(p) and is_directory(p). Or maybe not. :-)
I think not. Dividing the world into non-overlapping categories is much easier:-)
It's only easier until an example comes up that doesn't fit the scheme, so you are forced to invent meta-categories (also known as "hacks" in some category theory circles) such as "other". :-) "Capability bits" are cleaner. Is it openable with ifstream? file_bit. Is it iteratable with directory_iterator? directory_bit. Does it exist? !not_found_bit. When a socket is thrown your way, you just run it by the list and conclude that no bit needs to be set.
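The "capability bits" reading might look something like the following sketch. The capabilities struct and the to_bits helper are hypothetical names introduced here for illustration; only the bit names come from the message above.

```cpp
// Bit names follow the message above; the values are assumptions.
enum capability_bit : unsigned {
    not_found_bit = 1u << 0,
    file_bit      = 1u << 1,  // openable with ifstream
    directory_bit = 1u << 2   // iterable with directory_iterator
};

// Hypothetical description of what the platform reports about a path.
struct capabilities {
    bool found;
    bool streamable;
    bool iterable;
};

// Run the entity "by the list" and set one bit per capability it has.
inline unsigned to_bits(capabilities c) {
    if (!c.found) return not_found_bit;
    unsigned bits = 0;
    if (c.streamable) bits |= file_bit;
    if (c.iterable)   bits |= directory_bit;
    return bits;
}

// exists() is simply the absence of not_found_bit.
inline bool exists(unsigned bits) {
    return (bits & not_found_bit) == 0;
}
```

In this scheme a socket yields no bits at all yet still exists, and a directory that can also be opened as a stream yields both file_bit and directory_bit.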

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:00a201c55418$585bb2c0$6401a8c0@pdimov2...
Beman Dawes wrote:
A hypothetical OS that allows directories to be opened with an ifstream may reasonably answer true to both is_stream(p) and is_directory(p). Or maybe not. :-)
I think not. Dividing the world into non-overlapping categories is much easier:-)
It's only easier until an example comes up that doesn't fit the scheme, so you are forced to invent meta-categories (also known as "hacks" in some category theory circles) such as "other". :-)
"Capability bits" are cleaner. Is it openable with ifstream? file_bit. Is it iteratable with directory_iterator? directory_bit. Does it exist? !not_found_bit.
I'm only partially convinced. How useful is knowing that you can ifstream a directory? The only cases I can think of are so system specific that I have trouble seeing them in the context of Boost.Filesystem.
When a socket is thrown your way, you just run it by the list and conclude that no bit needs to be set.
But that is about the same as having an other_flag, except that the name is 0 instead of other_flag. Giving it a name makes it more obvious that a return from status() may not have any of the other bits set --Beman

A hypothetical OS that allows directories to be opened with an ifstream may reasonably answer true to both is_stream(p) and is_directory(p). Or maybe not. :-)
I think not. Dividing the world into non-overlapping categories is much easier:-)
It's only easier until an example comes up that doesn't fit the scheme, so you are forced to invent meta-categories (also known as "hacks" in some category theory circles) such as "other". :-)
"Capability bits" are cleaner. Is it openable with ifstream? file_bit. Is it iteratable with directory_iterator? directory_bit. Does it exist? !not_found_bit.
I'm only partially convinced. How useful is knowing that you can ifstream a directory? The only cases I can think of are so system specific that I have trouble seeing them in the context of Boost.Filesystem.
ReiserFS version 4 supports opening a directory as a file. eg: opendir("/foo") & open("/foo") both succeed. This is so that you can treat a directory as a file, so that you can treat directory entries as 'streams' within the file (cf. win32 named file-streams). Samba 4 is intending on using this facility to map win32 stream-open's to real POSIX file semantics.
When a socket is thrown your way, you just run it by the list and conclude that no bit needs to be set.
But that is about the same as having an other_flag, except that the name is 0 instead of other_flag. Giving it a name makes it more obvious that a return from status() may not have any of the other bits set
However it is implemented, it should be 'named' so that it is obvious to a person reading over the code, that the test is checking for a specific capability. Mathew

On Mon, May 09, 2005 at 12:05:35PM +1000, Mathew Robertson wrote:
I'm only partially convinced. How useful is knowing that you can ifstream a directory? The only cases I can think of are so system specific that I have trouble seeing them in the context of Boost.Filesystem.
ReiserFS version 4 supports opening a directory as a file. eg:
FreeBSD and Solaris both allow opening a directory (read-only) with open(2) and reading from it with read(2) on their default FS types. If you know the format of the directory and want to write a non-portable app that reads it directly you could use an ifstream to do so, in theory. jon

"Jonathan Wakely" <cow@compsoc.man.ac.uk> wrote in message news:20050509082256.GB43951@compsoc.man.ac.uk...
On Mon, May 09, 2005 at 12:05:35PM +1000, Mathew Robertson wrote:
I'm only partially convinced. How useful is knowing that you can ifstream a directory? The only cases I can think of are so system specific that I have trouble seeing them in the context of Boost.Filesystem.
ReiserFS version 4 supports opening a directory as a file. eg:
FreeBSD and Solaris both allow opening a directory (read-only) with open(2) and reading from it with read(2) on their default FS types.
If you know the format of the directory and want to write a non-portable app that reads it directly you could use an ifstream to do so, in theory.
Sure, but it seems like you can only use that facility on a system you already have prior knowledge about. Part of that knowledge could come from ::stat(), either directly or wrapped in a call to boost::filesystem::status(). But if stat() doesn't report both S_ISDIR and S_ISREG true at the same time, there isn't any way to practically implement status(), even if you buy the argument that reporting both would be desirable. --Beman

Beman Dawes wrote:
But if stat() doesn't report both S_ISDIR and S_ISREG true at the same time, there isn't any way to practically implement status(), even if you buy the argument that reporting both would be desirable.
S_ISREG is is_regular_file, not is_stream. /dev/urandom is not a regular file, but it is a stream, IIUC (a "character special file", probably; I don't have a POSIX system handy at the moment). is_file specification aside, an implementation of the standard library is not required to be portable, so it can exploit whatever platform-specific information it wants, unless you want to _specifically disallow_ this in the specification of status().
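To make the S_ISREG distinction concrete, here is a small POSIX-only sketch. describe_mode and describe_path are hypothetical helpers written for this illustration; the macros and ::stat() are the standard POSIX ones from <sys/stat.h>.

```cpp
#include <string>
#include <sys/stat.h>  // POSIX: struct stat, mode_t, S_ISDIR, S_ISREG, S_ISCHR, S_ISSOCK

// Hypothetical helper: name the file type encoded in an st_mode value.
inline std::string describe_mode(mode_t mode) {
    if (S_ISDIR(mode))  return "directory";
    if (S_ISREG(mode))  return "regular file";
    if (S_ISCHR(mode))  return "character special file";
    if (S_ISSOCK(mode)) return "socket";
    return "other";
}

// Hypothetical helper: stat() a path and describe what was found.
inline std::string describe_path(const char* path) {
    struct stat buf;
    if (::stat(path, &buf) != 0) return "not found or error";
    return describe_mode(buf.st_mode);
}
```

On implementations where these macros compare a single type field against distinct constants, S_ISDIR and S_ISREG cannot both be true for the same st_mode, so a status() that reported both bits would need some source of information beyond stat().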

I'm only partially convinced. How useful is knowing that you can ifstream a directory? The only cases I can think of are so system specific that I have trouble seeing them in the context of Boost.Filesystem.
ReiserFS version 4 supports opening a directory as a file. eg:
FreeBSD and Solaris both allow opening a directory (read-only) with open(2) and reading from it with read(2) on their default FS types.
If you know the format of the directory and want to write a non-portable app that reads it directly you could use an ifstream to do so, in theory.
Sure, but it seems like you can only use that facility on a system you already have prior knowledge about. Part of that knowledge could come from ::stat(), either directly or wrapped in a call to boost::filesystem::status().
Yes, that knowledge _could_ come from status(), but for platforms that don't support the named property, the implementation should return something useful as a fallback. However, is_blah() should "do the right thing" on platforms which don't support 'blah'.
But if stat() doesn't report both S_ISDIR and S_ISREG true at the same time, there isn't any way to practically implement status(), even if you buy the argument that reporting both would be desirable.
Sure there is... don't use stat() - use boost::filesystem::status(), then have status() return something that generic code can work with. And the return code from status() doesn't _have to_ be the actual bitmap value returned from S_ISDIR && S_ISREG -> status() can choose to remap this return value into something platform agnostic. If a developer chooses to use stat() - which is not always platform agnostic - that's their own fault. One purpose of using boost::filesystem is to abstract the platform inconsistencies. Mathew

"Mathew Robertson" <mathew.robertson@redsheriff.com> wrote in message news:028101c554fc$b9c0cfc0$a901000a@mat...
Sure, but it seems like you can only use that facility on a system you already have prior knowledge about. Part of that knowledge could come from ::stat(), either directly or wrapped in a call to boost::filesystem::status().
Yes, that knowledge _could_ come from status(), but for platforms that don't support the named property, the implementation should return something useful as a fallback.
However, is_blah() should "do the right thing" on platforms which don't support 'blah'.
And we have done that where the "right thing" is obvious, such as "false" for is_symlink() on Windows. I'm not against adding more such "do the right thing" functions when we can figure out what the "right thing" is.
But if stat() doesn't report both S_ISDIR and S_ISREG true at the same time, there isn't any way to practically implement status(), even if you buy the argument that reporting both would be desirable.
Sure there is... don't use stat() ...
Somehow query for the file system in use, and have a table of what file systems support reading directories? Try to open a known directory for reading? Those don't seem practical to me. But stat() may give the correct results on at least some systems. I'll try Apple OS X to see what happens. --Beman

"Mathew Robertson" <mathew.robertson@redsheriff.com> wrote in message news:066f01c5543b$96a11570$a901000a@mat...
A hypothetical OS that allows directories to be opened with an ifstream may reasonably answer true to both is_stream(p) and is_directory(p). Or maybe not. :-)
I think not. Dividing the world into non-overlapping categories is much easier:-)
It's only easier until an example comes up that doesn't fit the scheme, so you are forced to invent meta-categories (also known as "hacks" in some category theory circles) such as "other". :-)
"Capability bits" are cleaner. Is it openable with ifstream? file_bit. Is it iteratable with directory_iterator? directory_bit. Does it exist? !not_found_bit.
I'm only partially convinced. How useful is knowing that you can ifstream a directory? The only cases I can think of are so system specific that I have trouble seeing them in the context of Boost.Filesystem.
ReiserFS version 4 supports opening a directory as a file. eg:
opendir("/foo") & open("/foo")
both succeed.
This is so that you can treat a directory as a file, so that you can treat directory entries as 'streams' within the file (cf. win32 named file-streams).
After a stat("/foo", &buf), are S_ISDIR(buf.st_mode) and S_ISREG(buf.st_mode) both true? If so, it would be possible to implement status() to return both directory_flag and file_flag. But if stat("/foo", &buf) reports S_ISREG(buf.st_mode) as false, there isn't any practical way to implement both directory_flag and file_flag set at the same time.
Samba 4 is intending on using this facility to map win32 stream-open's to real POSIX file semantics.
When a socket is thrown your way, you just run it by the list and conclude that no bit needs to be set.
But that is about the same as having an other_flag, except that the name is 0 instead of other_flag. Giving it a name makes it more obvious that a return from status() may not have any of the other bits set
However it is implemented, it should be 'named' so that it is obvious to a person reading over the code, that the test is checking for a specific capability.
Agreed. That's part of the point I'm trying to make with Peter. Thanks for the report! I would really appreciate it if you could test how stat() is behaving in the case above. --Beman

Beman Dawes wrote:
"Mathew Robertson" <mathew.robertson@redsheriff.com> wrote in message news:066f01c5543b$96a11570$a901000a@mat...
However it is implemented, it should be 'named' so that it is obvious to a person reading over the code, that the test is checking for a specific capability.
Agreed. That's part of the point I'm trying to make with Peter.
The is_other test _isn't checking_ for a specific capability! That's the whole point!

However it is implemented, it should be 'named' so that it is obvious to a person reading over the code, that the test is checking for a specific capability.
Agreed. That's part of the point I'm trying to make with Peter.
The is_other test _isn't checking_ for a specific capability! That's the whole point!
Apologies... when I said "... the test is checking for a specific capability." it came across wrong. What I meant was, if we have code like:
if (is_file("/some/path/")) { ... }
[ where "/some/path/" is a directory in the traditional sense ]
The 'checking' is not "does the capability exist" -> rather "is the capability true", where false would be a result of the platform not supporting a valid implementation of is_file() or it isn't an actual file.
That said, these is_blah()'s are really tri-state values - not boolean. Should their specification be:
int is_file(const char *)
where the return values can be:
-1 error, and errno is set with the error value
0 false
1 true
?
Mathew

"Mathew Robertson" <mathew.robertson@redsheriff.com> wrote in message news:029001c554fe$1664d360$a901000a@mat...
However it is implemented, it should be 'named' so that it is obvious to a person reading over the code, that the test is checking for a specific capability.
Agreed. That's part of the point I'm trying to make with Peter.
The is_other test _isn't checking_ for a specific capability! That's the whole point!
Apologies... when I said "... the test is checking for a specific capability." it came across wrong.
What I meant was, if we have code like:
if (is_file("/some/path/")) { ... }
[ where "/some/path/" is a directory in the traditional sense ]
The 'checking' is not "does the capability exist" -> rather "is the capability true", where false would be a result of the platform not supporting a valid implementation of is_file() or it isn't an actual file.
That said, these is_blah()'s are really tri-state values - not boolean. Should their specification be:
int is_file(const char *)
where the return values can be:
-1 error, and errno is set with the error value
0 false
1 true
Well, that is what is intended, although for the is_blah() functions the error is communicated by throwing an exception. status() provides the functionality for users who want an explicit error return (with error code) rather than an exception. --Beman
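The split described here, a predicate that throws versus a status value that carries the error, could be sketched as follows. The flag values and the choice to have is_file take already-computed status bits are assumptions made to keep the example self-contained; they are not the library's signatures.

```cpp
#include <stdexcept>

// Hypothetical flag values; only the names follow this thread.
enum status_flag : unsigned {
    error_flag     = 1u << 0,
    not_found_flag = 1u << 1,
    directory_flag = 1u << 2,
    file_flag      = 1u << 3
};

// Predicate style: the third "error" state surfaces as an exception,
// so the return value can stay a plain bool.
inline bool is_file(unsigned status_bits) {
    if (status_bits & error_flag)
        throw std::runtime_error("status() reported an error");
    return (status_bits & file_flag) != 0;
}
```

A caller who prefers an explicit error return would inspect the status bits directly and test error_flag instead of calling the predicate.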

Beman Dawes wrote:
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:00a201c55418$585bb2c0$6401a8c0@pdimov2...
When a socket is thrown your way, you just run it by the list and conclude that no bit needs to be set.
But that is about the same as having an other_flag, except that the name is 0 instead of other_flag.
You don't need a name because the user will never test for this category.
if( status() & other_flag )
{
    // OK, so what do I do here?
}
The other categories have expectations attached, so testing for them makes sense.
Giving it a name makes it more obvious that a return from status() may not have any of the other bits set.
I, as a user, wouldn't rely on the "one bit rule" anyway. Idiomatic code should work with any result from status(), as long as it makes sense. It is reasonable to expect a future status() to return readable_flag and writable_flag in addition to exists_flag (currently named !not_found_flag to preserve the one bit rule.) As others have pointed out, a bitmask type is not used if one and only one bit will be set. :-)

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:005f01c5547e$3e22aec0$6401a8c0@pdimov2...
Beman Dawes wrote:
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:00a201c55418$585bb2c0$6401a8c0@pdimov2...
When a socket is thrown your way, you just run it by the list and conclude that no bit needs to be set.
But that is about the same as having an other_flag, except that the name is 0 instead of other_flag.
You don't need a name because the user will never test for this category.
if( status() & other_flag )
{
    // OK, so what do I do here?
}
The other categories have expectations attached, so testing for them makes sense.
What you might do in the code above is report the fact that an "other" has been discovered, or execute a fallback procedure, or whatever.
Giving it a name makes it more obvious that a return from status() may not have any of the other bits set.
I, as a user, wouldn't rely on the "one bit rule" anyway. Idiomatic code should work with any result from status(), as long as it makes sense. It is reasonable to expect a future status() to return readable_flag and writable_flag in addition to exists_flag (currently named !not_found_flag to preserve the one bit rule.)
It is fine with me if status() returns multiple bits set, such as your writeable/readable example. But I prefer each of the file type outcomes to have a specific name. --Beman

Beman Dawes wrote:
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:005f01c5547e$3e22aec0$6401a8c0@pdimov2...
You don't need a name because the user will never test for this category.
if( status() & other_flag )
{
    // OK, so what do I do here?
}
The other categories have expectations attached, so testing for them makes sense.
What you might do in the code above is report the fact that an "other" has been discovered, or execute a fallback procedure, or whatever.
A fallback only makes sense if a test fails, not when it succeeds.
int r = status();
if( r & directory_flag )
{
    // directory
}
else if( r & file_flag )
{
    // implementation defined, but in general file-like
}
else
{
    // fallback
}
Note that the code is resilient to future changes introducing other categories.
if( r & other_flag )
{
    // fallback?
}
When the "other" category is split to "device" and "other other", the meaning of the above code changes. It no longer falls back when it encounters a device, but it did before. Explicit tests for "other" or "unknown" are evil and will not pass a sensible code review anyway.
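The resilience argument can be tried out with a small sketch. Everything here is illustrative: device_flag stands in for a hypothetical future category split out of "other", and the two dispatch functions are names invented for this example.

```cpp
#include <string>

// Hypothetical flags; device_flag models a category added in a later release.
enum status_flag : unsigned {
    not_found_flag = 1u << 0,
    directory_flag = 1u << 1,
    file_flag      = 1u << 2,
    other_flag     = 1u << 3,
    device_flag    = 1u << 4
};

// Style argued for above: test the categories you can act on, else fall back.
inline std::string dispatch_with_else(unsigned r) {
    if (r & directory_flag) return "directory";
    if (r & file_flag)      return "file";
    return "fallback";
}

// Style argued against: the fallback is tied to an explicit "other" test.
inline std::string dispatch_on_other(unsigned r) {
    if (r & directory_flag) return "directory";
    if (r & file_flag)      return "file";
    if (r & other_flag)     return "fallback";
    return "unhandled";
}
```

If a device that used to be reported as other_flag is later reported as device_flag, the first function still falls back, while the second silently stops doing so.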
participants (4)
- Beman Dawes
- Jonathan Wakely
- Mathew Robertson
- Peter Dimov