filesystem exists/is_directory suggestion

The filesystem status operations exists & is_directory will both throw if the status of the file can't be determined (access problems or race conditions). I think this makes the operations overly complicated to use since you need to enclose them in try catch blocks. You normally guard operations that use files in some way but it is easy to forget try/catch around exists / is_directory. (Noticed that exists has been updated in CVS to return true on error which is probably ok but I think is_directory need to be updated as well.) I think a good solution would be to have a non-throwing overload where the return value in case of error is specified. bool is_directory(const tdFS::path& aPath, bool aErrorRet) { try { return tdFS::is_directory(aPath); } catch (...) { return aErrorRet; } } Another way could be to have a "is_accessable" operation that determines if the status of the entry can be determined but it will not solve the racing problem. usage: for (directory_iterator itr(mypath); itr != directory_iterator(); ++itr) if (is_directory(*itr, true)) continue; // skip directories or for (directory_iterator itr(mypath); itr != directory_iterator(); ++itr) if (!is_accessable(*itr) || is_directory(*itr)) continue; instead of for (directory_iterator itr(mypath); itr != directory_iterator(); ++itr) try { if (is_directory(*itr)) continue; } catch (...) { continue; }

At 05:12 AM 9/9/2004, Martin wrote:
The filesystem status operations exists & is_directory will both throw if
the status of the file can't be determined (access problems or race conditions).
CVS for exists() was changed in March to always determine existence or no, and not to throw.
I think this makes the operations overly complicated to use since you need to enclose them in try catch blocks. You normally guard operations that use files in some way but it is easy to forget try/catch around exists / is_directory.
That isn't at all clear to me. I've rarely if ever seen people put try/catch around is_directory(). The thinking is that throws from is_directory all represent real errors, so should be caught higher up in the call tree along with all other general filesystem errors.
(Noticed that exists has been updated in CVS to return true on error
is probably ok but I think is_directory need to be updated as well.)
I think a good solution would be to have a non-throwing overload where
which the
return value in case of error is specified.
bool is_directory(const tdFS::path& aPath, bool aErrorRet) { try { return tdFS::is_directory(aPath); } catch (...) { return aErrorRet; } }
Another way could be to have a "is_accessable" operation that determines if the status of the entry can be determined but it will not solve the racing problem.
Yes, is_accessable() might be useful. But the use cases I can think of are a bit theoretical. Anyone else have an opinion? I think I'd like to hear about so real-life needs (rather than theoretical or hypothetical needs) before deciding. Thanks, --Beman

On Thu, Sep 09, 2004 at 03:32:28PM -0400, Beman Dawes wrote:
Yes, is_accessable() might be useful. But the use cases I can think of are a bit theoretical. Anyone else have an opinion? I think I'd like to hear about so real-life needs (rather than theoretical or hypothetical needs) before deciding.
It makes sense. The only reasonable result for is_directory() when a parent directory in the path is not reachable, is to return false or throwing. But you can't return 'false' because it is possible that the path really is a directory OR it doesn't exist even; and the policy is to thrwo when it doesn't exist. You also can't throw because, like you said, that should be a real error; I don't think that inaccessibility for a mere query should be considered a serious error of this type. My conclusion is that throwing is the only option, but that the coder should get a fair chance of avoiding the need to catch that exception by providing the additional test is_accessable. is_accessable then should return false when it is not possible to determine the existance and/or type of the path passed to it. I understand that the original proposal also tried to solve a possible race condition. For example, code now will look like: if (is_accessable(p) && is_directory(p)) in an attempt to avoid exceptions. However, it is possible that right after the call to is_accessable - a parent directory is made in-accessable; in which case you still get an exception; that doesn't seem right I have to admit. Not that I'd care ;) Thanks -- Carlo Wood <carlo@alinoe.com>

Yes, is_accessable() might be useful. But the use cases I can think of are a bit theoretical. Anyone else have an opinion? I think I'd like to hear about so real-life needs (rather than theoretical or hypothetical needs) before deciding.
My real life need is that in Windows there are some files in the root directory that you can't query the attributes for. Haven't checked the exact conditions but is_directory will throw for me when iterating over c:\. My concern is that others might do the same mistake as I did and forget to specificly put try/catch around is_directory when iterating a directory.

At 10:52 AM 9/10/2004, Martin wrote:
My real life need is that in Windows there are some files in the root directory that you can't query the attributes for. Haven't checked the exact conditions but is_directory will throw for me when iterating over c:\.
pagefile.sys is one case. There may be others. That does seem like a practical example; I know I've run into it. OK, assume we want to add is_accessible(). What are the specifications? Perhaps: Returns: true if exists(ph) and the attributes of ph can successfully be queried. Note: Certain files or directories exist, but the user may not have the proper permissions to access them. For example, on Windows, ordinary users do not have permission to access the attributes of pagefile.sys. Thus in the directory containing pagefile.sys, exists("pagefile.sys") is true but is_accessible("pagefile.sys") is false. An attempt to query the attributes, such as is_directory("pagefile.sys") would throw an exception, which can be avoided by checking is_accessible("pagefile.sys"). The code might look something like: for (directory_iterator itr(mypath); itr != directory_iterator(); ++itr) { // ignore directories if (!is_accessable(*itr) || is_directory(*itr)) continue; ... Comments? --Beman

At 02:19 PM 9/10/2004, Peter Dimov wrote:
Beman Dawes wrote:
An attempt to query the attributes, such as is_directory("pagefile.sys") would throw an exception, which can be avoided by checking is_accessible("pagefile.sys").
I'm not sure why would anyone want to omit the is_accessible check before
calling is_directory.
In software which is specified to cope with even rare eventualities without recourse to a general error catch, it probably shouldn't be omitted. In applications which run in a very controlled and tightly specified environment, anything not being accessible is virtually certain to be an error condition, so just letting is_directory() throw is best for those apps. Another question is which other query functions also need to be protected by an is_accessible() test in some applications? --Beman

Beman Dawes wrote:
At 02:19 PM 9/10/2004, Peter Dimov wrote:
Beman Dawes wrote:
An attempt to query the attributes, such as is_directory("pagefile.sys") would throw an exception, which can be avoided by checking is_accessible("pagefile.sys").
I'm not sure why would anyone want to omit the is_accessible check before calling is_directory.
In software which is specified to cope with even rare eventualities without recourse to a general error catch, it probably shouldn't be omitted. In applications which run in a very controlled and tightly specified environment, anything not being accessible is virtually certain to be an error condition, so just letting is_directory() throw is best for those apps.
This sounds convincing. However if we get back to our real-world example, what do I gain from is_directory("pagefile.sys") throwing an exception? We know that this will happen. We also know that pagefile.sys is not a directory. A related issue is that if we had a recursive (or a richer) iterator, the problem wouldn't occur because iteration with FindNextFile will give us the attributes (0x00000026 for pagefile.sys on my machine.) It's also possible to use FindFirstFile to query the attributes, of course.
participants (4)
-
Beman Dawes
-
Carlo Wood
-
Martin
-
Peter Dimov