[Filesystem] i18n

Martin

30 Mar 2005

6:18 p.m.

Didn't find any comments on the new filesystem implementation, so here are mine.

- Would it be possible to store the new files in the vault? I find it inconvenient to check out a special branch just to read the documentation.

- I haven't compiled the new files, but I have had a good look at the documentation and the source. My impression is that it is a very nice implementation. (All the #ifdefs make the source hard to follow, but I'm not the maintainer.)

- I especially like the switch from default portable syntax to default native syntax. It means that I no longer need to change the default name checker in each application and that the "native_" prefix is gone from the access functions. (Maybe the name checker could be a template parameter (with default "none") for those who still want to use it. It would complicate the implementation of operations, though.)

The comments I have on the new implementation are:

- Operations are limited to std::string (and std::wstring on win32) as external type. Why not allow any range of characters? The current implementation will not work with const_string, flex_string, or a basic_string with a custom allocator.

- Why not have a static locale member in the class instead of using the utf-8 facet? The locale could then be used both for internal/external conversion via the codecvt facet and to get operator< to work with a custom locale. (For posix the locale would default to locale::global() with the utf-8 facet added.)

I also had hopes for some new things that didn't appear in the new implementation.

- Non-throwing is_* functions. I find it very inconvenient and error-prone to put try/catch around each is_directory call. Calling is_accessible before every other is_* function is an option, but it is not much easier and still leaves a risk of an exception if another process is accessing the file. The throwing is_* functions also make it impossible(?) to use the directory iterator with algorithms like

    remove_copy_if(directory_iterator(path), directory_iterator(), back_inserter(filelist), is_directory)

  My suggestion is to add non-throwing overloads where a second parameter tells what the function should return in case of error, e.g. is_directory(path, true).

- The directory iterator is still very limited since you can't specify a filter (e.g. "*.txt"). If used on a win32 system with a mounted posix disk, or on a posix system with a CD, problems might arise. (There is no portable way to tell if "FILE.TXT" matches "*.txt".)

- Why doesn't last_write_time return a boost::ptime?


Beman Dawes

30 Mar

8:16 p.m.

"Martin" <adrianm@touchdown.se> wrote in message news:loom.20050330T110611-38@post.gmane.org...

...

Didn't find any comments on the new filesystem implementation so here are mine

- Would it be possible to store the new files in the vault? I find it inconvenient to check out a special branch just to read the documentation.

I'll do that after the next round of updates.

...

- I haven't compiled the new files but I have had a good look at the documentation and the source. My impression is that it is a very nice implementation. (All the #ifdefs make the source hard to follow but I'm not the maintainer)

Thanks. The #ifdef situation has improved since the last release in that there are fewer #ifdefs covering bigger blocks of code, and in some cases overloading has replaced #ifdefs. The only other thing I can think of would be to have separate POSIX and Windows implementation files. But that brings up code duplication issues. Changes to allow the code to work with broken compilers are also likely to introduce more #ifdefs. It's messy, that's for sure.

...

- I especially like the switch from default portable syntax to default native syntax. It means that I no longer need to change the default name checker in each application and that the "native_" prefix is gone from the access functions. (Maybe the name checker could be a template parameter (with default "none") for those who still want to use it. It would complicate the implementation of operations, though.)

I thought of that, but decided the simplification from removing the name checking from the basic_path class was more important.

...

The comments I have on the new implementation are: - Operations are limited to std::string (and std::wstring on win32) as external type. Why not allow any range of characters? The current implementation will not work with const_string, flex_string, or a basic_string with a custom allocator.

That isn't correct. See lpath.hpp and wide_test.cpp in the test directory for an example of using a basic_string<long> as the underlying string type. The docs need more work to make clearer exactly what is required of the String argument to the basic_path template. A programmer wishing to use basic_path with something other than std::string or std::wstring has to do more work, but it isn't particularly difficult, as the lpath example shows.

...

- Why not have a static locale member in the class instead of using the utf-8 facet. The locale could then be used both for internal/external conversion via the codecvt facet and get operator< to work with a custom locale. (for posix the locale would default to locale::global() with the utf-8 facet added)

I'll give that some thought.

...

I also had hopes for some new things that didn't appear in the new implementation. - Non-throwing is_* functions. I find it very inconvenient and error-prone to put try/catch around each is_directory call. Calling is_accessible before every other is_* function is an option, but it is not much easier and still leaves a risk of an exception if another process is accessing the file.

If the is_* functions return false, rather than throwing, then testing for the negative of a is_* function becomes unreliable. Does !is_directory( "foo" ) mean that "foo" is a file, or is it just missing? The programmer would have to write exists("foo") && is_accessible("foo") && !is_directory("foo").

That could be simplified by providing an is_file() function (as suggested by Jeff Garland recently), but then we have to define exactly what a file is. Is a socket a file? Is a device a file? Or what POSIX calls a "regular file"? How does that translate to Windows?

I'm not saying that approach would be a bad thing. In fact, I was just working out examples of how it might work this morning. But it is a considerable change. Those changes would have to be worked out in detail so we could evaluate them compared to the status quo.

...

The throwing is_* functions also makes it impossible(?) to use the directory iterator with algorithms like remove_copy_if(directory_iterator(path), directory_iterator(), back_inserter(filelist), is_directory)

While that is an interesting example, note that a user provided function would work instead.

...

My suggestion is to add non-throwing overloads where a second parameter tells what the function should return in case of error. e.g. is_directory(path, true).

Interesting. That would cover my example above of wanting to detect a file. The expressions would be written !is_directory("foo",true). But that seems a bit obscure to me, and likely to be a cause of coding errors. Hum... it isn't even correct in the case a directory exists but is not accessible. An alternative would be named functions. These would be the so called "compound" functions which were discussed at one time, but that gets really messy. Where do you stop?

...

- The directory iterator is still very limited since you can't specify a filter (e.g. "*.txt"). If used on a win32 system with a mounted posix disk or a posix system with a CD problems might arise. (There is no portable way to tell if "FILE.TXT" matches "*.txt").

There was discussion of a glob directory iterator at one time, but no one ever came up with a final design. It is certainly a real need.

...

- Why doesn't last_write_time return a boost::ptime

That was a conscious decision to limit coupling. That is, to avoid Boost libraries which are not part of TR1. I'd like the Boost version to be identical to what gets proposed to the standards committee, and unless Boost Date-Time gets proposed and accepted, that means it can't be used.

Thanks for your comments. Because of the upcoming standards committee meeting, I'll be pretty distracted for the next three weeks, but will try to give some more thought at odd moments to the issues you've raised.

--Beman

Martin

31 Mar

7:23 a.m.

...

...
The comments I have on the new implementation are: - Operations are limited to std::string (and std::wstring on win32) as external type. Why not allow any range of characters? The current implementation will not work with const_string, flex_string, or a basic_string with a custom allocator.

That isn't correct. See lpath.hpp and wide_test.cpp in the test directory for an example of using a basic_string<long> as the underlying string type.

Yes, you can use any string type as the _internal_ type, but since the external type must be std::string you have to supply a conversion function to std::string in the traits (like in your lpath_traits). If the external type were a character range, the same traits could be used for many string-like types and there would be no performance hit when converting.

It would also be nice if the path constructor and the "/" operator worked with ranges, so you could mix all string types with the same underlying character type.

...

...
My suggestion is to add non-throwing overloads where a second parameter tells what the function should return in case of error. e.g. is_directory(path, true).

Interesting. That would cover my example above of wanting to detect a file. The expressions would be written !is_directory("foo",true). But that seems a bit obscure to me, and likely to be a cause of coding errors. Hum... it isn't even correct in the case a directory exists but is not accessible. An alternative would be named functions. These would be the so called "compound" functions which were discussed at one time, but that gets really messy. Where do you stop?

I assume that the most common use of the is_* function is as a filter. With an extra argument you specify if errors mean that the path should be excluded. For clarity you could use an enum like "treat_error_as_true" instead of the bool value.
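A minimal sketch of the enum variant, again using C++17 std::filesystem as a stand-in for the draft library. The names on_error, directory_or and is_directory_or are hypothetical. One assumption is labeled in the code: only a status whose type could not be determined at all (file_type::none) is treated as an error, while a path that is simply missing is a definite "not a directory".

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical policy enum, in the spirit of "treat_error_as_true".
enum class on_error { treat_as_false, treat_as_true };

// Pure decision step, separated out so it can be examined without touching a disk.
// Assumption: file_type::none means "an error prevented any answer".
inline bool directory_or(const fs::file_status& st, on_error policy) noexcept {
    if (st.type() == fs::file_type::none)
        return policy == on_error::treat_as_true;
    return fs::is_directory(st);  // not_found, regular, socket, ... are all false
}

// Non-throwing query: the caller decides what an error means.
inline bool is_directory_or(const fs::path& p, on_error policy) noexcept {
    std::error_code ec;
    return directory_or(fs::status(p, ec), policy);
}
```

Used as a filter, is_directory_or(p, on_error::treat_as_true) excludes entries whose status is unknown from a "files only" list, and the enum makes that choice visible at the call site where a bare bool would not.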

...

...
- The directory iterator is still very limited since you can't specify a filter (e.g. "*.txt"). If used on a win32 system with a mounted posix disk or a posix system with a CD problems might arise. (There is no portable way to tell if "FILE.TXT" matches "*.txt").

There was discussion of a glob directory iterator at one time, but no one ever came up with a final design. It is certainly a real need.

All the suggestions for a glob directory iterator that I have seen work only on the path. They ignore the fact that files can be both case-sensitive and case-insensitive on the same system.
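To illustrate why the pattern alone is not enough, here is a small sketch of a wildcard matcher in which case handling is an explicit argument. The names case_policy and glob_match are hypothetical, only '*' and '?' are supported, and how a caller would discover the right policy for a given volume is deliberately left open, since that is the unsolved part.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical: the caller must say how the underlying volume compares names.
enum class case_policy { sensitive, insensitive };

inline bool same_char(char a, char b, case_policy cp) {
    if (cp == case_policy::sensitive) return a == b;
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// Iterative matcher: '*' matches any run of characters, '?' any single one.
inline bool glob_match(const std::string& pattern, const std::string& name,
                       case_policy cp) {
    std::size_t p = 0, n = 0;
    std::size_t star = std::string::npos;  // position of the last '*' seen
    std::size_t mark = 0;                  // where in `name` that '*' started
    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;
            mark = n;
        } else if (p < pattern.size() &&
                   (pattern[p] == '?' || same_char(pattern[p], name[n], cp))) {
            ++p;
            ++n;
        } else if (star != std::string::npos) {
            p = star + 1;  // backtrack: let the last '*' absorb one more character
            n = ++mark;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars match nothing
    return p == pattern.size();
}
```

With this, glob_match("*.txt", "FILE.TXT", cp) gives different answers for the two policies, which is the ambiguity a filtering directory iterator would have to resolve per file system rather than per platform.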

Jeff Garland

11:46 a.m.

On Wed, 30 Mar 2005 15:16:07 -0500, Beman Dawes wrote

...

"Martin" <adrianm@touchdown.se> wrote in message

Couple thoughts below....

...

...
I also had hopes for some new things that didn't appear in the new implementation. - Non-throwing is_* functions. I find it very inconvenient and error-prone to put try/catch around each is_directory call. Calling is_accessible before every other is_* function is an option, but it is not much easier and still leaves a risk of an exception if another process is accessing the file.

If the is_* functions return false, rather than throwing, then testing for the negative of a is_* function becomes unreliable. Does !is_directory( "foo" ) mean that "foo" is a file, or is it just missing? The programmer would have to write exists("foo") && is_accessible("foo") && !is_directory("foo").

Why not:

    enum exists_check { CHECK, NO_CHECK };
    bool is_directory( path p, exists_check = CHECK );

Always returns false if directory doesn't exist...

...

That could be simplified by providing an is_file() function (as suggested by Jeff Garland recently), but then we have to define exactly what a file is. Is a socket a file? Is a device a file? Or what POSIX calls a "regular file"? How does that translate to Windows?

I think I already said this, but just to be sure:

...

Is a socket a file?

...

Is a device a file?

...

Or what POSIX calls a "regular file"?

Yes

...

How does that translate to Windows?

Since there are no symlinks, is_file() == !is_directory().

...

...
- Why doesn't last_write_time return a boost::ptime

That was a conscious decision to limit coupling. That is, to avoid Boost libraries which are not part of TR1. I'd like the Boost version to be identical to what gets proposed to the standards committee, and unless Boost Date-Time gets proposed and accepted, that means it can't be used.

This can be trivially implemented in convenience functions since date-time has added a from_time_t function for posix_time. Basically you just need a thin veneer over the current functions:

    #include "boost/date_time/posix_time/conversion.hpp"
    #include "boost/filesystem/operations.hpp"

    // sorry can't think of a good function name....
    inline boost::posix_time::ptime
    last_write_time_as_ptime( const boost::filesystem::path & ph )
    {
      // from_time_t converts the std::time_t result to a posix_time::ptime
      return boost::posix_time::from_time_t( boost::filesystem::last_write_time( ph ) );
    }

I'm not suggesting Beman add this to filesystem core -- just pointing out that it can be done trivially with an add-on.

Jeff

Peter Dimov

1:10 p.m.

Beman Dawes wrote:

...

If the is_* functions return false, rather than throwing, then testing for the negative of a is_* function becomes unreliable.

On the contrary, it becomes reliable. It's currently undefined.

...

Does !is_directory( "foo" ) mean that "foo" is a file, or is it just missing?

!is_directory( "foo" ) means that "foo" is not a directory, i.e. directory operations on "foo" will fail. It cannot be used as a substitute for "is a file", for any reasonable definition of "is a file".

...

The programmer would have to write exists("foo") && is_accessible("foo") && !is_directory("foo"). That could be simplified by providing an is_file() function (as suggested by Jeff Garland recently), but then we have to define exactly what a file is. Is a socket a file? Is a device a file? Or what POSIX calls a "regular file"? How does that translate to Windows?

It depends on your definition of "file". In general, is_X( "foo" ) should return true when "foo" can be treated as a X, that is, you can call X-related functions on "foo". One example could be is_data_stream( "foo" ), which would mean that you can fopen( "foo" ) and then fread it. Or is_seekable_stream, which also includes fseek. Note that on platforms where a directory can be opened with fopen, is_directory and is_data_stream will both return true.

The translation of is_regular_file to Windows is trivial, if a bit time-consuming: enumerate all properties of a POSIX regular file (that are observable to us); everything that matches this description is a "regular file", not only on Windows, but on an arbitrary file system.

Rob Stewart

2:56 p.m.

From: "Beman Dawes" <bdawes@acm.org>

...

"Martin" <adrianm@touchdown.se> wrote in message news:loom.20050330T110611-38@post.gmane.org...

...
I also had hopes for some new things that didn't appear in the new implementation. - Non-throwing is_* functions. I find it very inconvenient and error-prone to put try/catch around each is_directory call. Calling is_accessible before every other is_* function is an option, but it is not much easier and still leaves a risk of an exception if another process is accessing the file.

If the is_* functions return false, rather than throwing, then testing for the negative of a is_* function becomes unreliable. Does !is_directory( "foo" ) mean that "foo" is a file, or is it just missing? The programmer

Why should it answer both questions? It means it isn't a directory. If you want to know more, ask other questions like is_socket("foo"), exists("foo"), etc.

...

...
My suggestion is to add non-throwing overloads where a second parameter tells what the function should return in case of error. e.g. is_directory(path, true).

Interesting. That would cover my example above of wanting to detect a file. The expressions would be written !is_directory("foo",true). But that seems a bit obscure to me, and likely to be a cause of coding errors. Hum... it

It's also unnecessary. To detect a file, you'd write is_file("foo"). (How you implement is_file() is another matter.)

...

isn't even correct in the case a directory exists but is not accessible. An alternative would be named functions. These would be the so called "compound" functions which were discussed at one time, but that gets really messy. Where do you stop?

Compound functions are appropriate when you throw away information that you otherwise get in one or two OS calls, such that the query becomes atomic. However, you don't have to have separate functions for each combination. Instead, you could define a series of properties that can be queried -- most likely as an enumerated type with binary values -- that can be bitwise OR'ed to get the set of properties the caller wants you to query. If all such properties are set, then the function returns true. Such an approach means many current functions might be implemented in terms of the omnibus function, and it means that library users can define domain-specific, useful queries into named functions of their choice. You don't have to supply everything that way, but you supply the means to do so. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;
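The omnibus-query idea can be sketched as follows. This is an illustration under stated assumptions, not a proposed interface: the property names, the choice of bits, and the helper names properties_of, has_all and is_file are all hypothetical, and C++17 std::filesystem stands in for the library under discussion. A single status call yields a bit set, and a query succeeds only if every requested bit is set.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical property bits that can be OR'ed together.
enum file_property : unsigned {
    prop_exists    = 1u << 0,
    prop_directory = 1u << 1,
    prop_regular   = 1u << 2,
    prop_other     = 1u << 3  // sockets, devices, FIFOs, ...
};

// Translate one status result into the full property set.
inline unsigned properties_of(const fs::file_status& st) noexcept {
    unsigned props = 0;
    if (fs::exists(st))          props |= prop_exists;
    if (fs::is_directory(st))    props |= prop_directory;
    if (fs::is_regular_file(st)) props |= prop_regular;
    if (fs::is_other(st))        props |= prop_other;
    return props;
}

// Omnibus query: true only if every requested property is set.
inline bool has_all(const fs::path& p, unsigned wanted) noexcept {
    std::error_code ec;
    return (properties_of(fs::status(p, ec)) & wanted) == wanted;
}

// A domain-specific named query built by a library user on top of the omnibus function.
inline bool is_file(const fs::path& p) noexcept {
    return has_all(p, prop_exists | prop_regular);
}
```

In this sketch errors simply produce an empty property set, so has_all returns false for them; a fuller design would need a way to report "could not be determined" separately.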

Beman Dawes

3:57 p.m.

At 09:56 AM 3/31/2005, Rob Stewart wrote:

...

Compound functions are appropriate when you throw away information that you otherwise get in one or two OS calls, such that the query becomes atomic. However, you don't have to have separate functions for each combination. Instead, you could define a series of properties that can be queried -- most likely as an enumerated type with binary values -- that can be bitwise OR'ed to get the set of properties the caller wants you to query. If all such properties are set, then the function returns true.

Such an approach means many current functions might be implemented in terms of the omnibus function, and it means that library users can define domain-specific, useful queries into named functions of their choice. You don't have to supply everything that way, but you supply the means to do so.

That works well for or'ed queries, and is pretty good for more complex queries. It can be inefficient if multiple operating system calls are required to build up the full set of attributes, but maybe that could be contained by careful attribute selection. I'll give it some thought. Thanks for the suggestion, --Beman

Peter Dimov

30 Mar

10:28 p.m.

Martin wrote:

...

- Non-throwing is_* functions. I find it very inconvenient and error-prone to put try/catch around each is_directory call. Calling is_accessible before every other is_* function is an option, but it is not much easier and still leaves a risk of an exception if another process is accessing the file.

Did is_accessible get added, after all? That's bad. Let's remove it before it gets standardized and it's too late. :-)

Beman Dawes

31 Mar

2:53 a.m.

At 05:28 PM 3/30/2005, Peter Dimov wrote:

...

Martin wrote:

...
- Non-throwing is_* functions. I find it very inconvenient and error-prone to put try/catch around each is_directory call. Calling is_accessible before every other is_* function is an option, but it is not much easier and still leaves a risk of an exception if another process is accessing the file.

Did is_accessible get added, after all?

Yes.

...

That's bad. Let's remove it before it gets standardized and it's too late. :-)

? --Beman

Peter Dimov

11:02 a.m.

Beman Dawes wrote:

...

At 05:28 PM 3/30/2005, Peter Dimov wrote:

...
Martin wrote:

...
- Non-throwing is_* functions. I find it very inconvenient and error-prone to put try/catch around each is_directory call. Calling is_accessible before every other is_* function is an option, but it is not much easier and still leaves a risk of an exception if another process is accessing the file.

Did is_accessible get added, after all?

Yes.

...
That's bad. Let's remove it before it gets standardized and it's too late. :-)

?

It is a hack. Only "useful" on Windows, and only because of implementation details. It is not clear what is_accessible actually does, or why it should be used. It is not referenced anywhere else. The implementation does not reflect the description; on Windows it maps to GetFileAttributes and returns false if that fails, but for some "inaccessible" files it is possible to obtain the attributes using FindFirstFile.

The latter is particularly relevant in directory iteration loops; every item in such an iteration is "accessible" because the act of iteration gives you the attributes on our supported platforms. Yet is_accessible returns false for some of the items.

I may be wrong, of course.

Beman Dawes

3:44 p.m.

At 06:02 AM 3/31/2005, Peter Dimov wrote:

...

...

...
...
Did is_accessible get added, after all?

Yes.

...
That's bad. Let's remove it before it gets standardized and it's too late. :-)

?

It is a hack. Only "useful" on Windows, and only because of implementation details. It is not clear what is_accessible actually does, or why it should be used. It is not referenced anywhere else. The implementation does not reflect the description; on Windows it maps to GetFileAttributes and returns false if that fails, but for some "inaccessible" files it is possible to obtain the attributes using FindFirstFile.

The latter is particularly relevant in directory iteration loops; every item in such an iteration is "accessible" because the act of iteration gives you the attributes on our supported platforms. Yet is_accessible returns false for some of the items.

I may be wrong, of course.

Well, you are certainly right that there are some issues that need to be addressed.

Because is_accessible() is an attempt to deal with certain problems of exists(), let's go back to the original problem with exists(). Some errors clearly indicate the path is not present, so exists() needs to return false. On POSIX, those would be:

[ENOENT] A component of path does not name an existing file or path is an empty string.

[ENOTDIR] A component of the path prefix is not a directory.

One POSIX error would indicate the path does exist:

[EOVERFLOW] The file size in bytes or the number of blocks allocated to the file or the file serial number cannot be represented correctly in the structure pointed to by buf.

But with the following errors I can't see any way to know if the path actually exists or not:

[EACCES] Search permission is denied for a component of the path prefix.

[EIO] An error occurred while reading from the file system.

[ELOOP] A loop exists in symbolic links encountered during resolution of the path argument.

[ENAMETOOLONG] The length of the path argument exceeds {PATH_MAX} or a pathname component is longer than {NAME_MAX}.

If you agree with that analysis, then it looks like exists() should be changed to throw on those last four errors. is_accessible() would have a role in that it would return false on those same four errors because regardless of whether or not the path exists, it clearly isn't accessible. And each of the places exists() is used in the operations docs for other functions needs to be examined to see if is_accessible() is really the requirement.

On Windows the analysis is more difficult because the exact set of possible errors is not specified, and because of the FindFirstFile backdoor. Let's get a reaction to my POSIX analysis above before worrying about Windows.

--Beman
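The POSIX analysis above amounts to a three-way classification of stat() failures, which can be written down as a small sketch. The enum existence and the functions classify_stat_errno and probe are hypothetical names. One assumption goes beyond the analysis: any errno value not listed there is also treated as indeterminate. What exists() should then do with an indeterminate answer (throw, or return false) is the open question of the thread and is not decided here.

```cpp
#include <cerrno>
#include <sys/stat.h>

// Hypothetical three-way answer to "does this path exist?".
enum class existence { absent, present, indeterminate };

// Map an errno value reported by stat() onto the three cases.
inline existence classify_stat_errno(int err) noexcept {
    switch (err) {
    case 0:          // stat() succeeded
    case EOVERFLOW:  // the entry is there, but its attributes do not fit in struct stat
        return existence::present;
    case ENOENT:     // a component does not name an existing file, or the path is empty
    case ENOTDIR:    // a component of the path prefix is not a directory
        return existence::absent;
    default:         // EACCES, EIO, ELOOP, ENAMETOOLONG; unlisted errors by assumption
        return existence::indeterminate;
    }
}

// POSIX-only probe: call stat() and classify the outcome.
inline existence probe(const char* path) noexcept {
    struct stat buf;
    return classify_stat_errno(::stat(path, &buf) == 0 ? 0 : errno);
}
```

A throwing exists() would raise an exception for existence::indeterminate, while a lenient one would fold it into false; the classification step is the same in both designs.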

Peter Dimov

4:17 p.m.

Beman Dawes wrote:

...

Well, you are certainly right that there are some issues that need to be addressed.

Because is_accessible() is an attempt to deal with certain problems of exists(), let's go back to the original problem with exists(). Some errors clearly indicate the path is not present, so exists() needs to return false. On POSIX, those would be:

[ENOENT] A component of path does not name an existing file or path is an empty string.

[ENOTDIR] A component of the path prefix is not a directory.

One POSIX error would indicate the path does exist:

[EOVERFLOW] The file size in bytes or the number of blocks allocated to the file or the file serial number cannot be represented correctly in the structure pointed to by buf.

But with the following errors I can't see any way to know if the path actually exists or not:

[EACCES] Search permission is denied for a component of the path prefix.

[EIO] An error occurred while reading from the file system.

[ELOOP] A loop exists in symbolic links encountered during resolution of the path argument.

[ENAMETOOLONG] The length of the path argument exceeds {PATH_MAX} or a pathname component is longer than {NAME_MAX}.

If you agree with that analysis, then it looks like exists() should be changed to throw on those last four errors.

Before I agree with your analysis, I need to know what the definition of "exists" is with respect to the observable behavior of the filesystem:: and std:: functions that can operate on the argument.

This is not a theoretical issue. The meaning of exists( p ) is only relevant in this context. As a user of the library, I need to know what 'true' and 'false' mean with respect to the program logic, which consists of calls to the filesystem:: and std:: functions. Otherwise, the return value of exists( p ) won't be of any use (except that it can be printed to the screen.)

Alf P. Steinbach

4:35 p.m.

Beman Dawes wrote:

...

... with the following errors I can't see any way to know if the path actually exists or not:

[EACCES] Search permission is denied for a component of the path prefix.

[EIO] An error occurred while reading from the file system.

[ELOOP] A loop exists in symbolic links encountered during resolution of the path argument.

[ENAMETOOLONG] The length of the path argument exceeds {PATH_MAX} or a pathname component is longer than {NAME_MAX}.

If you agree with that analysis, then it looks like exists() should be changed to throw on those last four errors.

Looks to me that 'exists()' must yield a tri-state result if it's going to be kept with the existing name. Some questions cannot be answered with only yes or no.

An alternative to tri-state is to split the thing into two or more functions. I vaguely remember seeing some documentation about this where the current interface seemed obscure to me, the client code programmer having to keep in mind a table of possible combinations of results and what they would mean. Sorry I'm too lazy to look into this again, but if that vague recollection is correct, then I suggest that the elements in that table indicate the functions that _should_ be there.

Rob Stewart

6:06 p.m.

From: Beman Dawes <bdawes@acm.org>

...

Because is_accessible() is an attempt to deal with certain problems of exists(), let's go back to the original problem with exists(). Some errors clearly indicate the path is not present, so exists() needs to return false. On POSIX, those would be:

[ENOENT] A component of path does not name an existing file or path is an empty string.

[ENOTDIR] A component of the path prefix is not a directory.

Right.

...

One POSIX error would indicate the path does exist:

[EOVERFLOW] The file size in bytes or the number of blocks allocated to the file or the file serial number cannot be represented correctly in the structure pointed to by buf.

Right.

...

But with the following errors I can't see any way to know if the path actually exists or not:

[EACCES] Search permission is denied for a component of the path prefix.

Peter's query is key here: what does it mean for exists() to return true or false? You could say that returning false here is appropriate because, as far as the current user is concerned, the file doesn't exist.

...

[EIO] An error occurred while reading from the file system.

Is this a permanent failure? If so, it is reasonable to return false: at this time, the file doesn't exist, even if the intention is that it does. If it is a temporary failure, then a retry might succeed in finding the real answer. In that case, it is appropriate that the caller know that a retry is in order.

...

[ELOOP] A loop exists in symbolic links encountered during resolution of the path argument.

While that is an error condition, it doesn't prevent declaring that the file, as given by the supplied pathname, doesn't exist.

...

[ENAMETOOLONG] The length of the path argument exceeds {PATH_MAX} or a pathname component is longer than {NAME_MAX}.

The pathname clearly doesn't exist because it is invalid. Whether the file the caller is interested in actually exists is a separate question. You can only answer whether the supplied pathname refers to an existing file. If the caller wants to know whether a pathname is valid and names an existing file, isn't that a different question (or set of questions?).

...

If you agree with that analysis, then it looks like exists() should be changed to throw on those last four errors. is_accessible() would have a

I disagree. All the caller wants to know is whether the supplied pathname refers to an existing file.

...

role in that it would return false on those same four errors because regardless of whether or not the path exists, it clearly isn't accessible.

Accessibility involves ACLs and permissions as well as existence. It is a more restrictive query, at least in my mind. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Beman Dawes

11:25 p.m.

At 01:06 PM 3/31/2005, Rob Stewart wrote:

...

From: Beman Dawes <bdawes@acm.org>

...
...

...
But with the following errors I can't see any way to know if the path actually exists or not:

[EACCES] Search permission is denied for a component of the path prefix.

Peter's query is key here: what does it mean for exists() to return true or false? You could say that returning false here is appropriate because, as far as the current user is concerned, the file doesn't exist.

I also think that Peter's query is key. To try to answer that, I looked at about 25 uses of exists() in .cpp files in the boost/tools hierarchy.

Perhaps a third of the uses would still make total sense if [EACCES] was treated as false.

Perhaps another third of the uses (like using exists() to see if create_directories() needs to be called) might technically be harmed by returning false, but no harm would actually be done because some other call (create_directories() in the above case) would soon fail.

But perhaps a third of the uses were simply to tell if some file system entry was present as part of determining control flow within the code. There wasn't an immediate use of the path after the exists() test, so it is very hard to tell what the effect of returning false would be.

Even when you can see the immediate effect, it isn't always clear what is best. For example, link_check.cpp uses exists() to tell if an HTML link is broken. Seems to me [EACCES] is an error condition - the link_check is being run with the wrong permissions or something.

...

...
[EIO] An error occurred while reading from the file system.

Is this a permanent failure?

There is no way to tell. POSIX doesn't give any indication. So it seems to me it should be treated as an error causing exception.

...

If so, it is reasonable to return false: at this time, the file doesn't exist, even if the intention is that it does. If it is a temporary failure, then a retry might succeed in finding the real answer. In that case, it is appropriate that the caller know that a retry is in order.

...
[ELOOP] A loop exists in symbolic links encountered during resolution of the path argument.

While that is an error condition, it doesn't prevent declaring that the file, as given by the supplied pathname, doesn't exist.

While it doesn't prevent us declaring it doesn't exist, something is clearly rotten so I'm much more comfortable throwing an exception (or doing something else that doesn't gloss over the fact that a possible error has occurred.)

...

...
[ENAMETOOLONG] The length of the path argument exceeds {PATH_MAX} or a pathname component is longer than {NAME_MAX}.

The pathname clearly doesn't exist because it is invalid.

What if the pathname was too long because of a long prefix of "foo/../foo/../foo/../foo" eventually ending in a valid "/real-file"?

...

Whether the file the caller is interested in actually exists is a separate question. You can only answer whether the supplied pathname refers to an existing file.

If the caller wants to know whether a pathname is valid and names an existing file, isn't that a different question (or set of questions?).

...
If you agree with that analysis, then it looks like exists() should be changed to throw on those last four errors. is_accessible() would have a

I disagree. All the caller wants to know is whether the supplied pathname refers to an existing file.

But if it isn't possible to reliably answer that query, shouldn't it be an error? OTOH, if there is a way (perhaps a status() function returning a bitmap as you suggested in an earlier post) to detect these probably rare error cases, then it would seem more acceptable for exists() to treat these errors as false. An application which had more stringent requirements can avoid exists() and use an explicit test of status() results. --Beman

Peter Dimov

1 Apr 1 Apr

12:17 a.m.

Beman Dawes wrote:

...

At 01:06 PM 3/31/2005, Rob Stewart wrote:

...

...
I disagree. All the caller wants to know is whether the supplied pathname refers to an existing file.

But if it isn't possible to reliably answer that query, shouldn't it be an error?

The answer to this question is probably "no", if by error you mean exception, and here is why. Every exception-throwing function has spawned a predicate that is used to avoid the exception (because exceptions cancel the currently active operation and this is undesirable in some cases). If you make 'exists' throw an exception, you will create a demand for a 'preexists' query that will be used to detect whether the actual 'exists' will throw.

David Abrahams

1:51 a.m.

"Peter Dimov" <pdimov@mmltd.net> writes:

...

Beman Dawes wrote:

...
At 01:06 PM 3/31/2005, Rob Stewart wrote:

...
...
I disagree. All the caller wants to know is whether the supplied pathname refers to an existing file.

But if it isn't possible to reliably answer that query, shouldn't it be an error?

The answer to this question is probably "no", if by error you mean exception, and here is why.

Every exception-throwing function has spawned a predicate that is used to avoid the exception (because exceptions cancel the currently active operation and this is undesirable in some cases).

Separate predicates don't really help you avoid exceptions reliably in filesystems because of race conditions. But y'all know that. Why have we gone down that road, as opposed to having an argument that can be used to say "don't throw?" -- Dave Abrahams Boost Consulting www.boost-consulting.com

Beman Dawes

2:20 a.m.

At 08:51 PM 3/31/2005, David Abrahams wrote:

...

"Peter Dimov" <pdimov@mmltd.net> writes:

...
Beman Dawes wrote:

...
At 01:06 PM 3/31/2005, Rob Stewart wrote:

...
...
I disagree. All the caller wants to know is whether the supplied pathname refers to an existing file.

But if it isn't possible to reliably answer that query, shouldn't it be an error?

The answer to this question is probably "no", if by error you mean exception, and here is why.

Every exception-throwing function has spawned a predicate that is used to

...
avoid the exception (because exceptions cancel the currently active operation and this is undesirable in some cases).

Separate predicates don't really help you avoid exceptions reliably in filesystems because of race conditions. But y'all know that. Why have we gone down that road...

One way of looking at the current discussion is that we are trying to avoid going down that road.

...

... as opposed to having an argument that can be used to say "don't throw?"

The other semantics of throwing and non-throwing functions may be different. The importance of avoiding race conditions may be different. That's why I'm thinking of separate functions distinguished by function name rather than a "don't throw" argument. However, a stat()-like function might have both throwing and non-throwing versions, with no other differences, so that might be a candidate for a throw/don't-throw argument. --Beman

Beman Dawes

2:03 a.m.

At 07:17 PM 3/31/2005, Peter Dimov wrote:

...

Beman Dawes wrote:

...
At 01:06 PM 3/31/2005, Rob Stewart wrote:

...
...
I disagree. All the caller wants to know is whether the supplied pathname refers to an existing file.

But if it isn't possible to reliably answer that query, shouldn't it be an error?

The answer to this question is probably "no", if by error you mean exception, and here is why.

Every exception-throwing function has spawned a predicate that is used to

...

avoid the exception (because exceptions cancel the currently active operation and this is undesirable in some cases).

If you make 'exists' throw an exception, you will create a demand for a 'preexists' query that will be used to detect whether the actual 'exists'

...

will throw.

I agree, but don't think that has to lead to a "no" answer. One plan would be to supply a status query function, perhaps named status(), which would never throw and which would return a bitmask which could be queried as to the exact attributes that apply. Including an attribute that indicates one of these oddball errors has occurred. As with POSIX, there would presumably be an lstat()-like query function that did not resolve symbolic links. Presumably then exists() and the is_x() functions would be specified in terms of status() and lstatus() results. So in effect exists() and the is_x() functions are conveniences, very useful in many but not all cases, and safe to use casually since obscure errors will cause exceptions. --Beman

Rob Stewart

3:13 p.m.

From: Beman Dawes <bdawes@acm.org>

...

At 07:17 PM 3/31/2005, Peter Dimov wrote:

...
If you make 'exists' throw an exception, you will create a demand for a 'preexists' query that will be used to detect whether the actual 'exists' will throw.

I agree, but don't think that has to lead to a "no" answer. One plan would be to supply a status query function, perhaps named status(), which would never throw and which would return a bitmask which could be queried as to the exact attributes that apply. Including an attribute that indicates one of these oddball errors has occurred. As with POSIX, there would presumably be an lstat()-like query function that did not resolve symbolic links.

I think a single, overloaded function can quite easily accommodate everything and still be easy to use:

   struct no_throw_t { };
   static no_throw_t nothrow;
   struct resolve_symlinks_t { };
   static resolve_symlinks_t resolve_symlinks;
   enum attributes { ... };

   attributes stat(path);                                  // !resolves, throws
   attributes stat(path, no_throw_t);                      // !resolves, !throws
   attributes stat(path, resolve_symlinks_t);              // resolves, throws
   attributes stat(path, resolve_symlinks_t, no_throw_t);  // resolves, !throws
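The overload set in Rob's message can be made compilable with tag dispatch. The following is only a sketch under stated assumptions, not anything from Boost.Filesystem: it is POSIX-only, it takes a std::string instead of a path, the function is called stat_path so it does not collide with the C stat(), the tag object is called no_throw, the result is returned as an unsigned bit set, and the attr_* names and the detail helpers are hypothetical. Which errno values count as "does not exist" is also an assumption, since the thread leaves that open.

```cpp
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>
#include <sys/stat.h>

// Tag types select behaviour at overload resolution time (hypothetical names).
struct no_throw_t {};
struct resolve_symlinks_t {};
const no_throw_t no_throw = no_throw_t();
const resolve_symlinks_t resolve_symlinks = resolve_symlinks_t();

enum attributes {
    attr_none      = 0,
    attr_exists    = 1 << 0,
    attr_directory = 1 << 1,
    attr_symlink   = 1 << 2,
    attr_error     = 1 << 3   // query failed for a reason other than "not found"
};

namespace detail {
// Shared worker: 'resolve' picks stat() over lstat(). Never throws.
inline unsigned query(const std::string& p, bool resolve, int& err) {
    struct stat sb;
    err = 0;
    int rc = resolve ? ::stat(p.c_str(), &sb) : ::lstat(p.c_str(), &sb);
    if (rc != 0) {
        err = errno;
        // Assumption: only these two errors mean "does not exist".
        return (err == ENOENT || err == ENOTDIR) ? attr_none : attr_error;
    }
    unsigned a = attr_exists;
    if (S_ISDIR(sb.st_mode)) a |= attr_directory;
    if (S_ISLNK(sb.st_mode)) a |= attr_symlink;  // only possible via lstat()
    return a;
}

// Throwing wrapper: converts the error bit into an exception.
inline unsigned checked(const std::string& p, bool resolve) {
    int err = 0;
    unsigned a = query(p, resolve, err);
    if (a & attr_error)
        throw std::runtime_error(p + ": " + std::strerror(err));
    return a;
}
} // namespace detail

// !resolves, throws
inline unsigned stat_path(const std::string& p) {
    return detail::checked(p, false);
}
// !resolves, !throws
inline unsigned stat_path(const std::string& p, no_throw_t) {
    int err = 0;
    return detail::query(p, false, err);
}
// resolves, throws
inline unsigned stat_path(const std::string& p, resolve_symlinks_t) {
    return detail::checked(p, true);
}
// resolves, !throws
inline unsigned stat_path(const std::string& p, resolve_symlinks_t, no_throw_t) {
    int err = 0;
    return detail::query(p, true, err);
}
```

A caller that wants the quiet behaviour would write stat_path(p, resolve_symlinks, no_throw) and test the attr_error bit itself; a caller that omits the tag gets an exception for anything other than a clean "exists" or "does not exist".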

...

Presumably then exists() and the is_x() functions would be specified in terms of status() and lstatus() results. So in effect exists() and the is_x() functions are conveniences, very useful in many but not all cases, and safe to use casually since obscure errors will cause exceptions.

Sounds great to me! -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Peter Dimov

3:38 p.m.

Rob Stewart wrote:

...

From: Beman Dawes <bdawes@acm.org>

...
Presumably then exists() and the is_x() functions would be specified in terms of status() and lstatus() results. So in effect exists() and the is_x() functions are conveniences, very useful in many but not all cases, and safe to use casually since obscure errors will cause exceptions.

Sounds great to me!

I find it very odd for exists/is_* predicates to throw exceptions... and even odder for this to be described as "safe to use casually" when casual use may lead to aborting an operation when this is not desirable and the exception does not imply failure.

Rob Stewart

4:19 p.m.

From: "Peter Dimov" <pdimov@mmltd.net>

...

Rob Stewart wrote:

...
From: Beman Dawes <bdawes@acm.org>

...
Presumably then exists() and the is_x() functions would be specified in terms of status() and lstatus() results. So in effect exists() and the is_x() functions are conveniences, very useful in many but not all cases, and safe to use casually since obscure errors will cause exceptions.

Sounds great to me!

I find it very odd for exists/is_* predicates to throw exceptions... and even odder for this to be described as "safe to use casually" when casual use may lead to aborting an operation when this is not desirable and the exception does not imply failure.

Actually, I don't agree with the exceptions. I meant to agree with everything up to that point. Sorry for the confusion. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

David Abrahams

5:17 p.m.

"Peter Dimov" <pdimov@mmltd.net> writes:

...

Rob Stewart wrote:

...
From: Beman Dawes <bdawes@acm.org>

...
Presumably then exists() and the is_x() functions would be specified in terms of status() and lstatus() results. So in effect exists() and the is_x() functions are conveniences, very useful in many but not all cases, and safe to use casually since obscure errors will cause exceptions.

Sounds great to me!

I find it very odd for exists/is_* predicates to throw exceptions... and even odder for this to be described as "safe to use casually" when casual use may lead to aborting an operation when this is not desirable and the exception does not imply failure.

I missed a lot of detail, but based on the little I know I have to agree with Peter. -- Dave Abrahams Boost Consulting www.boost-consulting.com

Beman Dawes

4 Apr 4 Apr

1:59 a.m.

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:01cf01c536d0$ea2cf700$6601a8c0@pdimov...

...

Rob Stewart wrote:

...
From: Beman Dawes <bdawes@acm.org>

...
Presumably then exists() and the is_x() functions would be specified in terms of status() and lstatus() results. So in effect exists() and the is_x() functions are conveniences, very useful in many but not all cases, and safe to use casually since obscure errors will cause exceptions.

Sounds great to me!

I find it very odd for exists/is_* predicates to throw exceptions... and even odder for this to be described as "safe to use casually" when casual use may lead to aborting an operation when this is not desirable and the exception does not imply failure.

To me, an io error does imply some kind of failure, and a serious one at that. Remember that errors reported by the system API call (stat() or similar) have been analyzed, and those that clearly indicate existence or not are not treated as errors. What is left are error codes that represent either hard errors, or conditions which are ambiguous as to the true status.

I've been burned badly in the past by trying to continue after an io error, so really prefer errors to be announced noisily and as soon as possible. I'm also inclined to think that reporting a directory as !is_directory() when there is a permissions clash, even though it appears on a "ls" or file system browser as a directory, is likely to raise eyebrows and generate endless mistaken bug reports.

But it looks from others' responses that I'm outvoted. And it isn't just people who have commented on this list. I tested the Python os.path library, and its isdir() function doesn't even throw if the entire file system goes offline. --Beman

David Abrahams

11:04 a.m.

"Beman Dawes" <bdawes@acm.org> writes:

...

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:01cf01c536d0$ea2cf700$6601a8c0@pdimov...

...
Rob Stewart wrote:

...
From: Beman Dawes <bdawes@acm.org>

...
Presumably then exists() and the is_x() functions would be specified in terms of status() and lstatus() results. So in effect exists() and the is_x() functions are conveniences, very useful in many but not all cases, and safe to use casually since obscure errors will cause exceptions.

Sounds great to me!

I find it very odd for exists/is_* predicates to throw exceptions... and even odder for this to be described as "safe to use casually" when casual use may lead to aborting an operation when this is not desirable and the exception does not imply failure.

To me, an io error does imply some kind of failure, and a serious one at that. Remember that errors reported by the system API call (stat() or similar) have been analyzed, and those that clearly indicate existence or not are not treated as errors. What is left are error codes that represent either hard errors, or conditions which are ambiguous as to the true status.

I've been burned badly in the past by trying to continue after an io error, so really prefer errors to be announced noisily and as soon as possible. I'm also inclined to think that reporting a directory as !is_directory() when there is a permissions clash, even though it appears on a "ls" or file system browser as a directory, is likely to raise eyebrows and generate endless mistaken bug reports.

But it looks from others' responses that I'm outvoted.

I wouldn't say that. My opinions were pretty underinformed as I tried to make clear. Your arguments above sound pretty convincing to me.

...

And it isn't just people who have commented on this list. I tested the Python os.path library, and its isdir() function doesn't even throw if the entire file system goes offline.

Rather than accept it at face value, I would check with the maintainers and/or comp.lang.python to see whether that's intentional or incidental behavior. I'm sure you'll find that the behavior in that case isn't well-specified. -- Dave Abrahams Boost Consulting www.boost-consulting.com

Martin

5 Apr 5 Apr

6:47 a.m.

...

To me, an io error does imply some kind of failure, and a serious one at that. Remember that errors reported by the system API call (stat() or similar) have been analyzed, and those that clearly indicate existence or not are not treated as errors. What is left are error codes that represent either hard errors, or conditions which are ambiguous as to the true status.

I agree partly but there are 2 cases here.

1. The is_x functions are used as a filter. Since there is no filtering directory_iterator available you need to use the is_x functions to do the filtering. In many cases you just want to iterate over existing accessible files and/or directories, follow or ignore links. In these cases a throw is an inconvenience since it complicates the code without any gain.

2. The is_x functions are used to verify user input/configuration, or program logic follows different paths depending on the result of the is_x functions. In this context I fully agree that io errors should throw an exception.

One solution is to keep the current throwing is_x functions and add filtering capabilities to directory_iterator (e.g. itr->is_directory() or itr = directory_iterator(path, files_only)). At least on Win32 this should be easy since the status is directly available from the findfirst/findnext functions.
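For case 1, the non-throwing overload proposed at the start of this thread (a second argument giving the value to return on error) might look roughly like the sketch below. It is an illustration only: it is POSIX-only, works on plain strings rather than on a directory_iterator, and is_directory_or, is_directory_pred and files_only are hypothetical names.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>
#include <sys/stat.h>

// Hypothetical non-throwing predicate: 'on_error' is returned whenever the
// query itself fails (missing path, permission problem, I/O error, ...).
inline bool is_directory_or(const std::string& p, bool on_error) {
    struct stat sb;
    if (::stat(p.c_str(), &sb) != 0) return on_error;
    return S_ISDIR(sb.st_mode);
}

// Functor wrapper so the predicate can be handed to standard algorithms.
struct is_directory_pred {
    bool on_error;
    explicit is_directory_pred(bool e) : on_error(e) {}
    bool operator()(const std::string& p) const {
        return is_directory_or(p, on_error);
    }
};

// Copy every entry that is not a directory. Entries that cannot be queried
// are treated as directories here and are therefore skipped, not reported.
inline std::vector<std::string> files_only(const std::vector<std::string>& entries) {
    std::vector<std::string> out;
    std::remove_copy_if(entries.begin(), entries.end(),
                        std::back_inserter(out), is_directory_pred(true));
    return out;
}
```

The choice of on_error decides which way an unreadable entry falls, which is exactly the information case 2 would not want to lose.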

Beman Dawes

7 Apr 7 Apr

12:53 a.m.

"Martin" <adrianm@touchdown.se> wrote in message news:loom.20050405T083422-605@post.gmane.org...

...

...
To me, an io error does imply some kind of failure, and a serious one at that. Remember that errors reported by the system API call (stat() or similar) have been analyzed, and those that clearly indicate existence or not are not treated as errors. What is left are error codes that represent either hard errors, or conditions which are ambiguous as to the true status.

I agree partly but there are 2 cases here.

1. The is_x functions are used as a filter. Since there is no filtering directory_iterator available you need to use the is_x functions to do the filtering. In many cases you just want to iterate over existing accessible files and/or directories, follow or ignore links. In these cases a throw is an inconvenience since it complicates the code without any gain.

I agree to the extent that some of the cases when the current implementation throws can be eliminated. For example, is_directory() can safely be changed to return false rather than throwing in the case of a non-existent path. (And we would add a similar is_path()). But I just can't convince myself that silently swallowing an I/O error would be safe. And given the plan to add a status() function which can provide error swallowing behavior, a programmer can get that behavior if really, really, required.

...

2. The is_x functions are used to verify user input/configuration or program logic follows different paths depending on result of is_x functions. In this context I fully agree that io errors should throw an exception.

One solution is to keep the current throwing is_x functions and add filtering capabilities to directory_iterator (e.g. itr->is_directory() or itr = directory_iterator(path, files_only)). At least on Win32 this should be easy since the status is directly available from the findfirst/findnext functions.

We really do need some form of directory iteration filtering. But I really want to finish the i18n changes (including those mentioned above) before working on that. Thanks, --Beman

Peter Dimov

8:26 a.m.

Beman Dawes wrote:

...

I agree to the extent that some of the cases when the current implementation throws can be eliminated. For example, is_directory() can safely be changed to return false rather than throwing in the case of a non-existent path. (And we would add a similar is_path()). But I just can't convince myself that silently swallowing an I/O error would be safe. And given the plan to add a status() function which can provide error swallowing behavior, a programmer can get that behavior if really, really, required.

I find it much more natural for stat() to throw:

   attributes stat( path const & p );

because on error it can't return a valid value.

If you fix the original defect that directory iteration only gives names without attributes, the use of is_directory in iteration loops will be eliminated. We'll then be left with its other uses, which may be easier to analyze.

Beman Dawes

10:23 p.m.

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:003501c53b4b$8b6a1b60$6501a8c0@pdimov2...

...

Beman Dawes wrote:

...
I agree to the extent that some of the cases when the current implementation throws can be eliminated. For example, is_directory() can safely be changed to return false rather than throwing in the case of a non-existent path. (And we would add a similar is_path()). But I just can't convince myself that silently swallowing an I/O error would be safe. And given the plan to add a status() function which can provide error swallowing behavior, a programmer can get that behavior if really, really, required.

I find it much more natural for stat() to throw:

Rob's suggestion was that the function have an argument enabling or disabling throwing.

...

attributes stat( path const & p );

because on error it can't return a valid value.

Sometimes some of the attributes are known, depending on the exact error. So if the attribute bits include both "exists" and "does not exist" bits, then in the face of an error it is still possible to know if one of those attributes is known. I'll try to get a spec together for comment, but it will be awhile.
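To make the idea of separate "exists" and "does not exist" bits concrete, here is a rough POSIX-only sketch. It is not the promised spec: path_status, the bit names and the errno classification are all assumptions made for illustration. The one case shown where an error still leaves an attribute known is EOVERFLOW, which POSIX stat() reports when the entry is there but one of its values cannot be represented.

```cpp
#include <cerrno>
#include <stdexcept>
#include <string>
#include <sys/stat.h>

// Hypothetical attribute bits. If neither existence bit is set, the
// existence of the entry is simply unknown.
enum status_bits {
    known_exists     = 1 << 0,  // entry is known to exist
    known_missing    = 1 << 1,  // entry is known not to exist
    is_directory_bit = 1 << 2,
    query_error      = 1 << 3   // the underlying query reported an error
};

// Never throws: every outcome is encoded in the returned bits.
inline unsigned path_status(const std::string& p) {
    struct stat sb;
    if (::stat(p.c_str(), &sb) == 0) {
        unsigned bits = known_exists;
        if (S_ISDIR(sb.st_mode)) bits |= is_directory_bit;
        return bits;
    }
    switch (errno) {
    case ENOENT:
    case ENOTDIR:
        return known_missing;               // a definite answer, not an error
    case EOVERFLOW:
        return known_exists | query_error;  // entry is there, details are not
    default:
        return query_error;                 // EACCES, EIO, ELOOP, ...: unknown
    }
}

// Convenience predicate specified in terms of path_status(): it throws
// only when existence could not be determined either way.
inline bool exists(const std::string& p) {
    unsigned s = path_status(p);
    if (s & known_exists) return true;
    if (s & known_missing) return false;
    throw std::runtime_error("cannot determine whether '" + p + "' exists");
}
```

With this split, code that can tolerate ambiguity tests the bits directly, while exists() stays strict about the cases discussed earlier in the thread.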

...

If you fix the original defect that directory iteration only gives names without attributes, the use of is_directory in iteration loops will be eliminated. We'll then be left with its other uses, which may be easier to analyze.

How would you do that? Change the directory_iterator value_type to std::pair< path, attributes >, or leave directory_iterator alone but provide a stat() overload taking a directory_iterator (which presumably has the attributes cached on systems which provide attributes automatically during directory iteration)? --Beman

Peter Dimov

10:44 p.m.

Beman Dawes wrote:

...

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:003501c53b4b$8b6a1b60$6501a8c0@pdimov2...

...
If you fix the original defect that directory iteration only gives names without attributes, the use of is_directory in iteration loops will be eliminated. We'll then be left with its other uses, which may be easier to analyze.

How would you do that? Change the directory_iterator value_type to std::pair< path, attributes >, or leave directory_iterator alone but provide a stat() overload taking a directory_iterator (which presumably has the attributes cached on systems which provide attributes automatically during directory iteration)?

I've given it some thought, but I think that you are in a better position to choose between these alternatives than I am. Although I'd say that the first option should use a struct { name, attributes, [size...] } value type and not a pair with its nondescriptive first/second members and no room for future extensions. An is_directory( i ) overload is certainly less intrusive in that it allows us to keep the current iterator intact. But (a) is_directory( i ) and is_directory( *i ) are too close to each other and the users may mistakenly use the second form when they mean the first, and (b) I'm not sure whether we should strive to keep the current iterator. The reason I don't feel confident in choosing an alternative is that I don't know how important it is for the value_type of the iterator to be a path.
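A struct-valued entry of the kind described above could be sketched as follows. This is an assumption-laden illustration, not a proposal for the real iterator: it is POSIX-only, it fills a vector instead of defining an iterator, and directory_entry, its members and list_directory are hypothetical names. On POSIX the attributes cost an extra lstat() per entry, whereas on systems that report attributes during iteration they would come for free.

```cpp
#include <dirent.h>
#include <string>
#include <sys/stat.h>
#include <vector>

// Hypothetical value type: named members instead of first/second,
// with room for future extensions.
struct directory_entry {
    std::string name;
    bool is_directory;
    bool is_symlink;
    bool status_known;   // false if lstat() failed for this entry
    long long size;
};

// Collect names together with their attributes in a single pass.
// Returns false if the directory itself cannot be opened.
inline bool list_directory(const std::string& dir,
                           std::vector<directory_entry>& out) {
    DIR* d = ::opendir(dir.c_str());
    if (!d) return false;
    while (dirent* e = ::readdir(d)) {
        std::string name = e->d_name;
        if (name == "." || name == "..") continue;

        directory_entry entry;
        entry.name = name;
        entry.is_directory = false;
        entry.is_symlink = false;
        entry.status_known = false;
        entry.size = 0;

        struct stat sb;
        std::string full = dir + "/" + name;
        if (::lstat(full.c_str(), &sb) == 0) {   // do not follow symlinks
            entry.status_known = true;
            entry.is_directory = S_ISDIR(sb.st_mode);
            entry.is_symlink = S_ISLNK(sb.st_mode);
            entry.size = static_cast<long long>(sb.st_size);
        }
        out.push_back(entry);
    }
    ::closedir(d);
    return true;
}
```

A loop over the result can then test entry.is_directory directly, with no per-entry predicate call that might throw.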

Rob Stewart

5 Apr 5 Apr

7:42 p.m.

From: "Beman Dawes" <bdawes@acm.org>

...

To me, an io error does imply some kind of failure, and a serious one at that. Remember that errors reported by the system API call (stat() or similar) have been analyzed, and those that clearly indicate existence or not are not treated as errors. What is left are error codes that represent either hard errors, or conditions which are ambiguous as to the true status.

That's not entirely unreasonable, but it's a question of use cases.

...

I've been burned badly in the past by trying to continue after an io error, so really prefer errors to be announced noisily and as soon as possible. I'm also inclined to think that reporting a directory as !is_directory() when there is a permissions clash, even though it appears on a "ls" or file system browser as a directory, is likely to raise eyebrows and generate endless mistaken bug reports.

Such eyebrow positions and bug reports would indicate a failure to understand permissioning. The key is to understand that is_directory() returns true iff the supplied pathname is a directory to the caller. That means it must exist, be accessible, and be a directory. Anything else isn't a directory, so far as the caller is concerned. Also, don't forget that with other, more detailed means to get information on a pathname, like the stat() (or whatever you choose to call it) function, one can get more precise information if warranted by the code. I recall that your original notion for the Filesystem library was to enable script-like coding in C++. Such code is less rigorous than normal applications. That's not to say that it need be sloppy, but it is often more forgiving. I'll again refer to my years of writing shell scripts on *nix. The -d test never generates a signal (the moral equivalent of an exception in C++ in this case); it just returns true or false. That simplicity makes it easier to write scripts, though I'll grant that it does mean you have to be a little careful to not make too many assumptions about what ! -d means.

...

But it looks from others' responses that I'm outvoted. And it isn't just people who have commented on this list. I tested the Python os.path library, and its isdir() function doesn't even throw if the entire file system goes offline.

The number of votes is still pretty small, I think. I would obviously not complain if you agreed with me, but I'd feel more comfortable if there was more feedback. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Beman Dawes

7 Apr 7 Apr

1:22 a.m.

"Rob Stewart" <stewart@sig.com> wrote in message news:200504051942.j35JgJR05410@vanzandt.balstatdev.susq.com...

...

From: "Beman Dawes" <bdawes@acm.org>

...
... The key is to understand that is_directory() returns true iff the supplied pathname is a directory to the caller. That means it must exist, be accessible, and be a directory. Anything else isn't a directory, so far as the caller is concerned.

If it is a CD-ROM, and a little slow to become ready (as happened to me several times in experimenting with how Python handles these cases), it really strikes me as stretching it to ignore the "Not ready" error. Likewise, when working across a network, and the network goes down, do we really want something that was a directory a second ago to become not a directory?

...

Also, don't forget that with other, more detailed means to get information on a pathname, like the stat() (or whatever you choose to call it) function, one can get more precise information if warranted by the code.

I'd rather require those who want to bypass error checks to use status().

...

I recall that your original notion for the Filesystem library was to enable script-like coding in C++. Such code is less rigorous than normal applications.

I don't buy that. I've seen really seriously flawed data get shipped to customers, doing untold harm to the business, because production scripts ignored errors.

...

That's not to say that it need be sloppy, but it is often more forgiving. I'll again refer to my years of writing shell scripts on *nix. The -d test never generates a signal (the moral equivalent of an exception in C++ in this case); it just returns true or false. That simplicity makes it easier to write scripts, though I'll grant that it does mean you have to be a little careful to not make too many assumptions about what ! -d means.

That's fine, but the default should be to not ignore errors, rather than the other way around, IMO.

...

...
But it looks from others' responses that I'm outvoted. And it isn't just people who have commented on this list. I tested the Python os.path library, and its isdir() function doesn't even throw if the entire file system goes offline.

The number of votes is still pretty small, I think. I would obviously not complain if you agreed with me, but I'd feel more comfortable if there was more feedback.

Following Dave's suggestion, I've posted a query on comp.lang.python to see if I can find the rationale for their treatment of errors on queries. Also, my guess is that it is a little hard for most people to follow the several aspects of this discussion that we think should result in changes to the current library. Thus I may implement status(), recast exists() and the is_x() functions in terms of status(), and try to come up with a firmer (and testable) definition of what the status attributes mean (to answer Peter's very valid concerns about exact meaning.) That will give people something firmer to comment on. But due to other commitments, it may be three or four weeks before I can do that. Thanks, --Beman

Peter Dimov

11:04 a.m.

Beman Dawes wrote:

...

"Rob Stewart" <stewart@sig.com> wrote in message news:200504051942.j35JgJR05410@vanzandt.balstatdev.susq.com...

...
I recall that your original notion for the Filesystem library was to enable script-like coding in C++. Such code is less rigorous than normal applications.

I don't buy that. I've seen really seriously flawed data get shipped to customers, doing untold harm to the business, because production scripts ignored errors.

"Ignore errors" is a broad brush that doesn't apply here.

   bool is_directory( p ) throw();
   // returns: true if p is a directory, false otherwise or on error

does not ignore errors. If you try to come up with an example that does untold harm based on the above is_directory, it won't be easy. That's because every

   void do_something_with( p );

will throw if p is mis-classified (and that's exactly as it should be).

That aside, just to show that general principles don't always apply, here's an example where _not_ ignoring an I/O error does harm to the customer:

   open file f
   read contents in buffer
   close file f // #1

   open file g
   write buffer into g
   close file g

If #1 throws, you've just denied the customer access to the data in f, even though the data has just been read and may not be recoverable from this point onwards if the storage has failed physically at #1.

Beman Dawes

10:54 p.m.

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:007d01c53b61$9a522b70$6601a8c0@pdimov...

...

Beman Dawes wrote:

...

...
I don't buy that. I've seen really seriously flawed data get shipped to customers, doing untold harm to the business, because production scripts ignored errors.

"Ignore errors" is a broad brush that doesn't apply here.

bool is_directory( p ) throw();

// returns: true if p is a directory, false otherwise or on error

does not ignore errors.

If you try to come up with an example that does untold harm based on the above is_directory, it won't be easy. That's because every

void do_something_with( p );

will throw if p is mis-classified (and that's exactly as it should be).

You are assuming that after finding is_directory( p ) is false, the application then calls do_something_with( p ). But lots of applications don't work that way. If the condition isn't met, they go to some fall-back strategy which doesn't involve p at all. There is code like that in the current regression reporting system. I also ran into a real-world problem two years ago, where swallowing an error in a geographic application caused a fall-back which resulted in people (me included) driving to the wrong location in the wrong part of town. An expensive waste of time. Although it was a bit funny; I ended up talking to a homeless man living in a broken-down automobile. It wasn't until he mentioned that a whole series of cars had driven up that morning and stared at him that I realized what had happened.

...

That aside, just to show that general principles don't always apply, here's an example where _not_ ignoring an I/O error does harm to the customer:

open file f
read contents in buffer
close file f // #1

open file g
write buffer into g
close file g

If #1 throws, you've just denied the customer access to the data in f, even though the data has just been read and may not be recoverable from this point onwards if the storage has failed physically at #1.

But the programmer always had the option of either not enabling exceptions for error reporting (assuming the I/O was done with the <fstream>) or catching exceptions and dealing with them. --Beman
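As a sketch of that option with <fstream>: the source stream below is left with exceptions disabled, so a failing close (step #1) is only recorded, while the destination stream has them enabled. The function name copy_keeping_data and the out-parameter are hypothetical, and whether a failed close should be tolerated at all is precisely the judgment call under discussion.

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <stdexcept>
#include <string>

// Copy 'from' to 'to'. A failure while closing the source is reported
// through 'source_close_failed' but does not discard the data already read.
inline void copy_keeping_data(const std::string& from, const std::string& to,
                              bool& source_close_failed) {
    // Exceptions stay disabled on the source stream (the default).
    std::ifstream in(from.c_str(), std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + from);

    // Read the whole file into the buffer.
    std::string buffer((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());

    in.close();                          // #1: sets failbit on failure, no throw
    source_close_failed = in.fail();

    // Errors on the destination are not tolerated: enable exceptions there.
    // If the open already failed, exceptions() itself throws.
    std::ofstream out(to.c_str(), std::ios::binary);
    out.exceptions(std::ios::failbit | std::ios::badbit);
    out.write(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    out.close();
}
```

The caller still learns about the close failure through the flag and can log or escalate it after the data is safely written.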

Rob Stewart

12 Apr 12 Apr

2:05 p.m.

From: "Beman Dawes" <bdawes@acm.org>

...

"Rob Stewart" <stewart@sig.com> wrote in message news:200504051942.j35JgJR05410@vanzandt.balstatdev.susq.com...

...
From: "Beman Dawes" <bdawes@acm.org>

...
... The key is to understand that is_directory() returns true iff the supplied pathname is a directory to the caller. That means it must exist, be accessible, and be a directory. Anything else isn't a directory, so far as the caller is concerned.

If it is a CD-ROM, and a little slow to become ready (as happened to me several times in experimenting with how Python handles these cases), it really strikes me as stretching it to ignore the "Not ready" error. Likewise, when working across a network, and the network goes down, do we really want something that was a directory a second ago to become not a directory?

In the latter case, on NFS, a script is blocked waiting for the NFS server to come back. That doesn't help with other filesystems, of course, but, the idea is that a transient failure could be handled by the library. In the "Not ready" case, the library could retry some number of times or for up to some number of seconds (configurable values, of course). If the retry period is exceeded without resolving the problem, then just failing rather than throwing an exception still works. In the network down case, that's a hard error and, I suppose, merits an exception.
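The retry idea above could look roughly like the following. This is only a sketch: it uses std::filesystem (C++17) as a stand-in for the library under discussion, the name is_directory_with_retry and its default values are hypothetical, and it retries on any error other than "not found" rather than singling out a "Not ready" condition.

```cpp
#include <chrono>
#include <filesystem>
#include <system_error>
#include <thread>

namespace fs = std::filesystem;

// Hypothetical helper: answer "is p a directory?" without throwing,
// retrying a configurable number of times when the status query fails.
bool is_directory_with_retry(const fs::path& p,
                             int max_attempts = 3,
                             std::chrono::milliseconds delay = std::chrono::milliseconds(100))
{
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        std::error_code ec;
        const fs::file_status st = fs::status(p, ec);
        if (!ec)
            return fs::is_directory(st);             // definite answer
        if (st.type() == fs::file_type::not_found)
            return false;                            // nothing there: retrying won't help
        if (attempt + 1 < max_attempts)
            std::this_thread::sleep_for(delay);      // possibly transient: wait and retry
    }
    return false;  // retry budget exhausted: just fail rather than throw
}
```

A caller that considers "network down" a hard error would need a different policy, for example rethrowing after the last attempt.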

...

...
Also, don't forget that with other, more detailed means to get information on a pathname, like the stat() (or whatever you choose to call it) function, one can get more precise information if warranted by the code.

I'd rather require those who want to bypass error checks to use status().

Hmmm. That's a reasonable argument. Since there will be a means to get the information without an exception, that could work.

...

...
I recall that your original notion for the Filesystem library was to enable script-like coding in C++. Such code is less rigorous than normal applications.

I don't buy that. I've seen really seriously flawed data get shipped to customers, doing untold harm to the business, because production scripts ignored errors.

As Peter mentioned, returning false from is_xxx() is not ignoring an error. It is saying that at the moment, given the current permissions and the current user's credentials, the supplied path is not an xxx. Since the filesystem is fluid anyway -- a file/directory can be created, deleted, and modified at any time -- you really can't count on much even with exceptions. IOW, you have to write your code expecting that everything will work as desired and when it doesn't, write sufficiently smart error handling code or rely on the user to decipher the problem with the current state of the filesystem. For example, if I'm expecting to read a file and I cannot, I typically report that fact and leave to the user the job of figuring out what's wrong with the file or path to it. That approach is handled by a non-throwing is_xxx().

...

...
That's not to say that it need be sloppy, but it is often more forgiving. I'll again refer to my years of writing shell scripts on *nix. The -d test never generates a signal (the moral equivalent of an exception in C++ in this case); it just returns true or false. That simplicity makes it easier to write scripts, though I'll grant that it does mean you have to be a little careful to not make too many assumptions about what ! -d means.

That's fine, but the default should be to not ignore errors, rather than the other way around, IMO.

As we've said, this doesn't ignore errors, it just classifies them as cases that mean is_xxx() returns false.

...

Also, my guess is that it is a little hard for most people to follow the several aspects of this discussion that we think should result in changes to the current library. Thus I may implement status(), recast exists() and the is_x() functions in terms of status(), and try to come up with a firmer (and testable) definition of what the status attributes mean (to answer Peter's very valid concerns about exact meaning.)

That sounds like a good approach. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;
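For readers following along, recasting exists() and the is_x() functions in terms of status() might be sketched like this. It uses std::filesystem (C++17) as a stand-in, and the quiet_ names are hypothetical, not part of any proposed interface.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// One query that never throws; every predicate below is derived from its result.
fs::file_status quiet_status(const fs::path& p) noexcept
{
    std::error_code ec;          // error details are discarded on purpose
    return fs::status(p, ec);
}

// Any failure to obtain a status simply yields false.
bool quiet_exists(const fs::path& p) noexcept
{
    return fs::exists(quiet_status(p));
}

bool quiet_is_directory(const fs::path& p) noexcept
{
    return fs::is_directory(quiet_status(p));
}

bool quiet_is_regular_file(const fs::path& p) noexcept
{
    return fs::is_regular_file(quiet_status(p));
}
```

Code that needs to know why a predicate returned false would call status() itself and look at the error code instead of using these wrappers.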

David Abrahams

28 Apr

2:27 a.m.

Rob Stewart <stewart@sig.com> writes:

...

For example, if I'm expecting to read a file and I cannot, I typically report that fact and leave to the user the job of figuring out what's wrong with the file or path to it. That approach is handled by a non-throwing is_xxx().

...
...
That's not to say that it need be sloppy, but it is often more forgiving. I'll again refer to my years of writing shell scripts on *nix. The -d test never generates a signal (the moral equivalent of an exception in C++ in this case); it just returns true or false. That simplicity makes it easier to write scripts, though I'll grant that it does mean you have to be a little careful to not make too many assumptions about what ! -d means.

That's fine, but the default should be to not ignore errors, rather than the other way around, IMO.

As we've said, this doesn't ignore errors, it just classifies them as cases that mean is_xxx() returns false.

Right. Beware interface designs that avoidably classify inputs as abuses. Some people wanted to make a mathematical vector type for which you'd get an exception if you tried to assign a vector of one length into a vector of some other length. I argued with them. There was no reason not to simply resize the target vector. Similarly, there's no reason not to simply report false when is_xxx(p) doesn't find anything at path p. Actually, I'd almost be inclined to drop the is_xxx(p) function because the result becomes meaningless the moment you get it back: the xxx may or may not exist or be of the right type, no matter what the result is. -- Dave Abrahams Boost Consulting www.boost-consulting.com

Beman Dawes

2 May

9:53 p.m.

...

Rob Stewart <stewart@sig.com> writes:

...
For example, if I'm expecting to read a file and I cannot, I typically report that fact and leave to the user the job of figuring out what's wrong with the file or path to it. That approach is handled by a non-throwing is_xxx().

...
...
That's not to say that it need be sloppy, but it is often more forgiving. I'll again refer to my years of writing shell scripts on *nix. The -d test never generates a signal (the moral equivalent of an exception in C++ in this case); it just returns true or false. That simplicity makes it easier to write scripts, though I'll grant that it does mean you have to be a little careful to not make too many assumptions about what ! -d means.

At 10:27 PM 4/27/2005, David Abrahams wrote:

...
That's fine, but the default should be to not ignore errors, rather than the other way around, IMO.

As we've said, this doesn't ignore errors, it just classifies them as cases that mean is_xxx() returns false.

Right. Beware interface designs that avoidably classify inputs as abuses. Some people wanted to make a mathematical vector type for which you'd get an exception if you tried to assign a vector of one length into a vector of some other length. I argued with them. There was no reason not to simply resize the target vector. Similarly, there's no reason not to simply report false when is_xxx(p) doesn't find anything at path p.

That's also the conclusion I came to in the analysis. "not found" isn't an error, it simply returns false. "I/O error", OTOH, is an error, and so the functions should throw an exception for this case.

...

Actually, I'd almost be inclined to drop the is_xxx(p) function because the result becomes meaningless the moment you get it back: the xxx may or may not exist or be of the right type, no matter what the result is.

In theory, that is certainly correct. But in practice a lot of filesystem work is done in environments where race conditions are not present, and these functions are very, very convenient. --Beman
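The rule stated above ("not found" returns false, "I/O error" throws) can be sketched as follows. std::filesystem (C++17) stands in for the library being discussed, and the name is_directory_or_throw is hypothetical.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical predicate implementing the rule: absence is an answer,
// any other failure to determine the status is an error.
bool is_directory_or_throw(const fs::path& p)
{
    std::error_code ec;
    const fs::file_status st = fs::status(p, ec);

    if (st.type() == fs::file_type::not_found)
        return false;                                   // "not found" is not an error
    if (ec)
        throw fs::filesystem_error("is_directory_or_throw", p, ec);  // e.g. an I/O error
    return fs::is_directory(st);
}
```

Which platform errors count as "not found" is decided by the underlying implementation here, so this sketch does not settle the EACCES, ELOOP, or ENAMETOOLONG questions raised elsewhere in the thread.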

Beman Dawes

4 Apr

2:11 a.m.

"Rob Stewart" <stewart@sig.com> wrote in message news:200504011513.j31FDSs10439@vanzandt.balstatdev.susq.com...

...

I think a single, overloaded function can quite easily accommodate everything and still be easy to use:

struct no_throw_t { }; static no_throw_t nothrow;

struct resolve_symlinks_t { }; static resolve_symlinks_t resolve_symlinks;

enum attributes { ... };

attributes stat(path); // !resolves, throws

attributes stat(path, no_throw_t); // !resolves, !throws

attributes stat(path, resolve_symlinks_t); // resolves, throws

attributes stat(path, resolve_symlinks_t, no_throw_t); // resolves, !throws

Yes, that should do the trick. Perhaps the default should be to resolve symlinks, since that is how the POSIX stat() works and is perhaps the more common need. But that isn't a strong argument, and in any case I think the function should be named status() or attributes(). --Beman
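A compilable version of the overload set above might look like this. It is a sketch under several assumptions: the function is named status() as suggested, std::filesystem (C++17) supplies the actual queries, fs::file_status stands in for the elided attributes enum, and everything lives in a hypothetical namespace sketch. The defaults follow the original proposal (do not resolve symlinks unless asked).

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

namespace sketch {

// Tag types used only to select an overload.
struct no_throw_t {};
struct resolve_symlinks_t {};
inline constexpr no_throw_t nothrow{};
inline constexpr resolve_symlinks_t resolve_symlinks{};

// Stand-in for the elided `enum attributes { ... }`.
using attributes = fs::file_status;

// !resolves, throws on error
inline attributes status(const fs::path& p)
{
    return fs::symlink_status(p);
}

// !resolves, !throws
inline attributes status(const fs::path& p, no_throw_t) noexcept
{
    std::error_code ec;
    return fs::symlink_status(p, ec);
}

// resolves, throws on error
inline attributes status(const fs::path& p, resolve_symlinks_t)
{
    return fs::status(p);
}

// resolves, !throws
inline attributes status(const fs::path& p, resolve_symlinks_t, no_throw_t) noexcept
{
    std::error_code ec;
    return fs::status(p, ec);
}

} // namespace sketch
```

A call then reads, for example, sketch::status(p, sketch::resolve_symlinks, sketch::nothrow). Flipping the default to resolve symlinks would mean replacing resolve_symlinks_t with a tag for the opposite choice.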

Rob Stewart

1 Apr

3:07 p.m.

From: Beman Dawes <bdawes@acm.org>

...

At 01:06 PM 3/31/2005, Rob Stewart wrote:

...
From: Beman Dawes <bdawes@acm.org>

...
But with the following errors I can't see any way to know if the path actually exists or not:

[EACCES] Search permission is denied for a component of the path prefix.

Peter's query is key here: what does it mean for exist() to return true or false? You could say that returning false here is appropriate because, as far as the current user is concerned, the file doesn't exist.

I also think that Peter's query is key. To try to answer that, I looked at about 25 uses of exists() in .cpp files in the boost/tools hierarchy.

Perhaps a third of the uses would still make total sense if [EACCES] was treated as false. Perhaps another third of the uses (like using exists() to

Good.

...

see if create_directories() needs to be called) might technically be harmed by returning false, but no harm would actually be done because some other call (create_directories() in the above case) would soon fail.

Right.

...

But perhaps a third of uses were simply to tell if some file system entry was present as part of determining control flow within the code. There wasn't an immediate use of the path after the exists() test, so it is very hard to tell what the effect of returning false would be. Even when you can see the immediate effect, it isn't always clear what is best. For example, link_check.cpp uses exists() to tell if an HTML link is broken. Seems to me [EACCES] is an error condition - the link_check is being run with the wrong permissions or something.

I can tell you that bash, and other *nix shells which follow test(1)'s behavior, don't fail a script when you test with "-e pathname." It simply succeeds or fails. When you need to know more, you can use stat(1). I've never found that behavior problematic in any script I've written (and that's a lot). Based upon that experience, I think exists() should not throw exceptions. There are other tools to determine why. Thus, when it matters to the logic of the program why a file doesn't exist, you won't rely on the simple answer of exists().

...

...
...
[EIO] An error occurred while reading from the file system.

Is this a permanent failure?

There is no way to tell. POSIX doesn't give any indication. So it seems to me it should be treated as an error causing exception.

I think it should return false.

...

...
If so, it is reasonable to return false: at this time, the file doesn't exist, even if the intention is that it does. If it is a temporary failure, then a retry might succeed in finding the real answer. In that case, it is appropriate that the caller know that a retry is in order.

...
[ELOOP] A loop exists in symbolic links encountered during resolution of the path argument.

While that is an error condition, it doesn't prevent declaring that the file, as given by the supplied pathname, doesn't exist.

While it doesn't prevent us declaring it doesn't exist, something is clearly rotten so I'm much more comfortable throwing an exception (or doing something else that doesn't gloss over the fact that a possible error has occurred.)

The question asked of exists() is whether the supplied pathname exists. It doesn't, in this case. The logic error can be determined through other means. For example, when I write a script and -e doesn't find a file I'm expecting, the script fails. When I investigate why it failed, I track down the file it was looking for and figure out why the pathname is bad.

...

...
...
[ENAMETOOLONG] The length of the path argument exceeds {PATH_MAX} or a pathname component is longer than {NAME_MAX}.

The pathname clearly doesn't exist because it is invalid.

What if the pathname was too long because of a long prefix of "foo/../foo/../foo/../foo" eventually ending in a valid "/real-file"?

The *pathname* does not exist. You can't be prescient and figure out what the caller really meant.

...

...
Whether the file the caller is interested in actually exists is a separate question. You can only answer whether the supplied pathname refers to an existing file.

Like I said.

...

...
...
If you agree with that analysis, then it looks like exists() should be changed to throw on those last four errors. is_accessible() would have a

I disagree. All the caller wants to know is whether the supplied pathname refers to an existing file.

But if it isn't possible to reliably answer that query, shouldn't it be an error?

I think it is reliable and reasonable to return false in those error conditions.

...

OTOH, if there is a way (perhaps a status() function returning a bitmap as you suggested in an earlier post) to detect these probably rare error cases, then it would seem more acceptable for exists() to treat these errors as false. An application which had more stringent requirements can avoid exists() and use an explicit test of status() results.

Bingo! -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;
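The split agreed on here (a forgiving exists() for ordinary use, an explicit status() test for applications with more stringent requirements) could be sketched as below. std::filesystem (C++17) is used as a stand-in, the names existence, probe, and exists_lenient are hypothetical, and which errno values end up as "absent" rather than "undetermined" depends on the platform's implementation.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

enum class existence { present, absent, undetermined };

// Strict query: reports whether the answer could be determined at all,
// and leaves the underlying error (if any) in `why` for the caller to inspect.
existence probe(const fs::path& p, std::error_code& why) noexcept
{
    const fs::file_status st = fs::status(p, why);
    if (fs::exists(st))
        return existence::present;
    if (st.type() == fs::file_type::not_found)
        return existence::absent;
    return existence::undetermined;   // e.g. EACCES or EIO; `why` says which
}

// Lenient query in the spirit of test -e: anything but "present" is false.
bool exists_lenient(const fs::path& p) noexcept
{
    std::error_code ignored;
    return probe(p, ignored) == existence::present;
}
```

A tool like link_check could call probe() and treat existence::undetermined as a configuration problem, while script-like code sticks with exists_lenient().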

38 comments

7 participants

participants (7)

Alf P. Steinbach
Beman Dawes
David Abrahams
Jeff Garland
Martin
Peter Dimov
Rob Stewart
