[filesystem] Operations predicates


Beman Dawes

2 May 2005

9:47 p.m.

I've completed an analysis of expectations for the exists() and is_xxx() family of functions, and the previously suggested status() and symlink_status() functions.

See http://www.esva.net/~beman/filesystem_operations_predicates.htm

Thanks to Peter Dimov, Rob Stewart, and Jeff Garland for their suggestions. Errors are mine alone.

Comments?

--Beman


Peter Dimov

3 May

12:29 p.m.

Beman Dawes wrote:

...

I've completed an analysis of expectations for the exists() and is_xxx() family of functions, and the previously suggested status() and symlink_status() functions.

See http://www.esva.net/~beman/filesystem_operations_predicates.htm

Thanks to Peter Dimov, Rob Stewart, and Jeff Garland for their suggestions. Errors are mine alone.

Comments?

One quick comment about status(). I don't like the fact that the user effectively has to read a global variable to obtain the error code. I'd prefer something along the lines of

status_flag status( path const & p, int * error = 0 );

or packing the error code in the return value.

I also don't like the fact that the error codes aren't the standard errno E* constants, but we already had this debate once. :-) (As with threading, it is my opinion that Posix should be acknowledged.)
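For readers who want to see the out-parameter form concretely, here is a minimal sketch of what such a status() might look like on a POSIX system. It is an illustration under stated assumptions, not the library's implementation: the path is a plain `const char*` instead of the library's path class, the flag values are invented, and treating ENOENT/ENOTDIR as "not found" instead of an error is a guess at the intended semantics.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/stat.h>   // POSIX only; a Windows build would need a different query

// Flag values are assumptions made for this sketch.
enum status_flag {
    not_found_flag = 1,
    file_flag      = 2,
    directory_flag = 4,
    error_flag     = 8
};

// Out-parameter form: `error`, if supplied, receives the raw errno value on
// failure and 0 otherwise. Callers that pass nothing simply never see it.
inline status_flag status(const char* p, int* error = 0) {
    assert(p != 0);
    if (error) *error = 0;

    struct stat st;
    if (::stat(p, &st) != 0) {
        const int e = errno;               // capture before anything can overwrite it
        if (e == ENOENT || e == ENOTDIR)   // assumed: "not found" is an answer, not an error
            return not_found_flag;
        if (error) *error = e;
        return error_flag;
    }
    // Simplification: anything that is not a directory is reported as a file.
    return S_ISDIR(st.st_mode) ? directory_flag : file_flag;
}
```

A caller that cares about the error writes `int e; status(p, &e)`; a caller that does not writes `status(p)` and only sees error_flag in the result.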

Rob Stewart

3:04 p.m.

From: "Peter Dimov" <pdimov@mmltd.net>

...

Beman Dawes wrote:

...
I've completed an analysis of expectations for the exists() and is_xxx() family of functions, and the previously suggested status() and symlink_status() functions.

See http://www.esva.net/~beman/filesystem_operations_predicates.htm

One quick comment about status(). I don't like the fact that the user effectively has to read a global variable to obtain the error code. I'd prefer something along the lines of

status_flag status( path const & p, int * error = 0 );

or packing the error code in the return value.

I also thought about that. system_error_code() can certainly be made threadsafe, and all commands can set a thread-specific error code. But it does mean that it is easy to ignore problems in calling status() and that one must read the error code before calling it again. Perhaps the flags could be accumulated with the error code in a class that has a safe-bool conversion? -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Beman Dawes

4 May

12:50 a.m.

"Rob Stewart" <stewart@sig.com> wrote in message news:200505031504.j43F48eL022587@weezy.balstatdev.susq.com...

...

From: "Peter Dimov" <pdimov@mmltd.net>

...
Beman Dawes wrote:

...
I've completed an analysis of expectations for the exists() and is_xxx() family of functions, and the previously suggested status() and symlink_status() functions.

See http://www.esva.net/~beman/filesystem_operations_predicates.htm

One quick comment about status(). I don't like the fact that the user effectively has to read a global variable to obtain the error code. I'd prefer something along the lines of

status_flag status( path const & p, int * error = 0 );

or packing the error code in the return value.

I also thought about that. system_error_code() can certainly be made threadsafe, and all commands can set a thread-specific error code.

The POSIX standard requires errno be thread safe: "For each thread of a process, the value of errno shall not be affected by function calls or assignments to errno by other threads." Ditto, Windows' ::GetLastError() function.

...

But it does mean that it is easy to ignore problems in calling status() ...

Yes, but otherwise identical functionality is available using the is_xxx() and exists() functions, which throw on errors. They would be preferred unless the program specifically needs to treat errors as non-throwing.
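The relationship described here, throwing predicates layered over a non-throwing status(), can be sketched roughly as below. Everything in it is hypothetical: status() is a hard-coded stand-in for the real operating system query so the example is deterministic, `filesystem_error` is a placeholder exception type, and `checked_status` is a helper name invented for the sketch.

```cpp
#include <stdexcept>
#include <string>

enum status_flag { not_found_flag = 1, file_flag = 2, directory_flag = 4, error_flag = 8 };

// Hypothetical stand-in for the real OS query, so the example is deterministic.
inline status_flag status(const std::string& p) {
    if (p == "/tmp")        return directory_flag;
    if (p == "/etc/passwd") return file_flag;
    if (p == "/locked/x")   return error_flag;   // pretend the query itself failed
    return not_found_flag;
}

// Placeholder for whatever exception type the library actually throws.
struct filesystem_error : std::runtime_error {
    explicit filesystem_error(const std::string& what) : std::runtime_error(what) {}
};

// Shared helper (invented name): turn an error_flag result into an exception.
inline status_flag checked_status(const std::string& p, const char* who) {
    const status_flag s = status(p);
    if (s == error_flag) throw filesystem_error(std::string(who) + ": " + p);
    return s;
}

// The predicates never return an "error" answer; they throw instead.
inline bool exists(const std::string& p) {
    return checked_status(p, "exists") != not_found_flag;
}
inline bool is_directory(const std::string& p) {
    return checked_status(p, "is_directory") == directory_flag;
}
inline bool is_file(const std::string& p) {
    return checked_status(p, "is_file") == file_flag;
}
```

With this layering, code that wants exceptions calls the predicates, and code that must not throw calls status() and tests for error_flag itself.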

...

and that one must read the error code before calling it again.

Yes.

...

Perhaps the flags could be accumulated with the error code in a class that has a safe-bool conversion?

That's a possibility, although I think I like Peter's suggestion better just because it is a bit simpler. --Beman

Rob Stewart

2:25 p.m.

From: "Beman Dawes" <bdawes@acm.org>

...

"Rob Stewart" <stewart@sig.com> wrote in message news:200505031504.j43F48eL022587@weezy.balstatdev.susq.com...

...
From: "Peter Dimov" <pdimov@mmltd.net>

...
Beman Dawes wrote:

...
But it does mean that it is easy to ignore problems in calling status() ...

Yes, but otherwise identical functionality is available using the is_xxx() and exists() functions, which throw on errors. They would be preferred unless the program specifically needs to treat errors as non-throwing.

Yes, folks will choose to use status() when they want nonthrowing behavior, but why design an interface that makes it easy to forget about the error code?

...

...
and that one must read the error code before calling it again.

Yes.

...
Perhaps the flags could be accumulated with the error code in a class that has a safe-bool conversion?

That's a possibility, although I think I like Peter's suggestion better just because it is a bit simpler.

I assume you're referring to this:

status_flag status(path const &, int * error = 0);

It is simpler from an implementation standpoint, but I don't think it is simpler from the caller's perspective. Since the argument is defaulted, the compiler will offer no help to ensure that you get and inspect the error number.

Returning a type with the flags and error number together means that you always get the error number and can inspect it. Whether you use any schemes to ensure that the caller inspects a non-zero error number -- such as asserting in the destructor if an "inspected" flag isn't set -- is another matter.

-- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;
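The "inspected flag" scheme mentioned above might look something like the following sketch. The class name `status_result`, its members, and the flag values are all assumptions; the thread does not fix any of them. The check is a debug-only assert, so it vanishes when NDEBUG is defined.

```cpp
#include <cassert>

enum status_flag { not_found_flag = 1, file_flag = 2, directory_flag = 4, error_flag = 8 };

// Hypothetical result type: flags and error number travel together, and a
// non-zero error that nobody looked at trips an assert on destruction.
class status_result {
public:
    status_result(status_flag f, int e) : flags_(f), error_(e), inspected_(false) {}

    // A copy takes over the obligation to inspect; the source is released from it.
    status_result(const status_result& o)
        : flags_(o.flags_), error_(o.error_), inspected_(o.inspected_) {
        o.inspected_ = true;
    }
    status_result& operator=(const status_result&) = delete;  // keeps the sketch small

    ~status_result() {
        // Debug-only check: a non-zero error must have been looked at.
        assert(inspected_ || error_ == 0);
    }

    status_flag flags() const { return flags_; }
    int error() const { inspected_ = true; return error_; }

private:
    status_flag flags_;
    int error_;
    mutable bool inspected_;
};
```

The copy constructor matters more than it looks: without the hand-over, returning the object by value on a compiler that does not elide the copy would trip the assert in the temporary.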

Beman Dawes

5 May

1:16 a.m.

"Rob Stewart" <stewart@sig.com> wrote in message news:200505041425.j44EPnrb027885@weezy.balstatdev.susq.com...

...

status_flag status(path const &, int * error = 0);

It is simpler from an implementation standpoint, but I don't think it is simpler from the caller's perspective. Since the argument is defaulted, the compiler will offer no help to ensure that you get and inspect the error number.

Returning a type with the flags and error number together means that you always get the error number and can inspect it. Whether you use any schemes to ensure that the caller inspects a non-zero error number -- such as asserting in the destructor if an "inspected" flag isn't set -- is another matter.

That seems like a convincing argument. We could use std::pair<> or tuple<>, but I'm inclined to put the two in a struct with meaningful member names. --Beman

Peter Dimov

9:39 a.m.

Beman Dawes wrote:

...

"Rob Stewart" <stewart@sig.com> wrote in message news:200505041425.j44EPnrb027885@weezy.balstatdev.susq.com...

...
status_flag status(path const &, int * error = 0);

It is simpler from an implementation standpoint, but I don't think it is simpler from the caller's perspective. Since the argument is defaulted, the compiler will offer no help to ensure that you get and inspect the error number.

Returning a type with the flags and error number together means that you always get the error number and can inspect it. Whether you use any schemes to ensure that the caller inspects a non-zero error number -- such as asserting in the destructor if an "inspected" flag isn't set -- is another matter.

That seems like a convincing argument.

Typical uses:

if( status( p ) & file_flag ) // ...
if( status( p, &e ) & file_flag ) // ...

vs

if( status( p ).flags & file_flag ) // ...

status_result r = status( p );
if( r.flags & file_flag ) // ...

I don't see much of a difference here.

Rob Stewart

8:39 p.m.

From: "Peter Dimov" <pdimov@mmltd.net>

...

Beman Dawes wrote:

...
"Rob Stewart" <stewart@sig.com> wrote in message news:200505041425.j44EPnrb027885@weezy.balstatdev.susq.com...

...
status_flag status(path const &, int * error = 0);

It is simpler from an implementation standpoint, but I don't think it is simpler from the caller's perspective. Since the argument is defaulted, the compiler will offer no help to ensure that you get and inspect the error number.

Returning a type with the flags and error number together means that you always get the error number and can inspect it. Whether you use any schemes to ensure that the caller inspects a non-zero error number -- such as asserting in the destructor if an "inspected" flag isn't set -- is another matter.

That seems like a convincing argument.

Typical uses:

if( status( p ) & file_flag ) // ... if( status( p, &e ) & file_flag ) // ...

vs

if( status( p ).flags & file_flag ) // ...

status_result r = status( p ); if( r.flags & file_flag ) // ...

I don't see much of a difference here.

I'm assuming an operator & for the UDT status_result:

if (status(p) & file_flag) ...

vs

if (status(p) & file_flag) ...

If you actually use the error code:

status_error e;
status_result r(status(p, &e));
if (!e && (r & file_flag)) ...

vs

status_result r(status(p));
if (r && (r & file_flag)) ...

Another advantage to returning a UDT is that the bitwise OR and (in)equality operators can throw an exception on error, effectively forcing the above usage to avoid an exception.

-- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;
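One possible reading of this proposal is sketched below: a result type whose bool conversion reports "no error" and whose flag-testing operator throws if the error was never ruled out. The member names, the choice of `operator&` as the throwing operator, and the use of std::runtime_error are assumptions made for illustration only.

```cpp
#include <stdexcept>

enum status_flag { not_found_flag = 1, file_flag = 2, directory_flag = 4, error_flag = 8 };

// Hypothetical UDT: converts to true when the query succeeded, and refuses
// to answer flag tests when it did not.
class status_result {
public:
    status_result(status_flag f, int e) : flags_(f), error_(e) {}

    // True means "the query worked", not "the path exists".
    explicit operator bool() const { return error_ == 0; }

    int error() const { return error_; }

    // Flag test; throws if the result carries an error.
    bool operator&(status_flag f) const {
        if (error_ != 0)
            throw std::runtime_error("status_result: flags tested on a failed query");
        return (flags_ & f) != 0;
    }

private:
    status_flag flags_;
    int error_;
};
```

Under this sketch, `if (r && (r & file_flag))` never throws because `&&` short-circuits on a failed query, while a bare `if (r & file_flag)` throws when the query failed.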

Peter Dimov

9:07 p.m.

Rob Stewart wrote:

...

If you actually use the error code:

status_error e; status_result r(status(p, &e)); if (!e && (r & file_flag)) ...

No, what I wrote above:

...

...
if( status( p, &e ) & file_flag ) // ...

is correct. file_flag is never set on error.

Rob Stewart

9:46 p.m.

From: "Peter Dimov" <pdimov@mmltd.net>

...

Rob Stewart wrote:

...
If you actually use the error code:

status_error e; status_result r(status(p, &e)); if (!e && (r & file_flag)) ...

No, what I wrote above:

...
...
if( status( p, &e ) & file_flag ) // ...

is correct. file_flag is never set on error.

Quite right. Sorry. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Beman Dawes

4 May

12:30 a.m.

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:000e01c54fdb$be752eb0$6401a8c0@pdimov2...

...

Beman Dawes wrote:

...
I've completed an analysis of expectations for the exists() and is_xxx() family of functions, and the previously suggested status() and symlink_status() functions.

See http://www.esva.net/~beman/filesystem_operations_predicates.htm

Thanks to Peter Dimov, Rob Stewart, and Jeff Garland for their suggestions. Errors are mine alone.

Comments?

One quick comment about status(). I don't like the fact that the user effectively has to read a global variable to obtain the error code. I'd prefer something along the lines of

status_flag status( path const & p, int * error = 0 );

or packing the error code in the return value.

I share your concern. Let me think about it a bit more.

...

I also don't like the fact that the error codes aren't the standard errno E* constants, but we already had this debate once. :-) (As with threading, it is my opinion that Posix should be acknowledged.)

The intent _is_ to supply the actual system error code (errno for POSIX). There is another function available to convert to a portable code if desired. Thanks, --Beman

Peter Dimov

10:19 a.m.

Beman Dawes wrote:

...

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:000e01c54fdb$be752eb0$6401a8c0@pdimov2...

...
I also don't like the fact that the error codes aren't the standard errno E* constants, but we already had this debate once. :-) (As with threading, it is my opinion that Posix should be acknowledged.)

The intent _is_ to supply the actual system error code (errno for POSIX). There is another function available to convert to a portable code if desired.

My point was that errno should be the portable error code. This will save you one mapping. Your arguments for not using E* were purely implementation-driven, and at this stage of the library development it makes sense to revisit the issue from a standardization point of view.

Iain K. Hanson

2:56 p.m.

On Wed, 2005-05-04 at 13:19 +0300, Peter Dimov wrote:

...

Beman Dawes wrote:

...
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:000e01c54fdb$be752eb0$6401a8c0@pdimov2...

...
I also don't like the fact that the error codes aren't the standard errno E* constants, but we already had this debate once. :-) (As with threading, it is my opinion that Posix should be acknowledged.)

The intent _is_ to supply the actual system error code (errno for POSIX). There is another function available to convert to a portable code if desired.

My point was that errno should be the portable error code. This will save you one mapping. Your arguments for not using E* were purely implementation-driven and at this stage of the library development it makes sense to revisit the issue from standardization point of view.

I agree that it is time to revisit these issues, and they are going to apply to more than just the Filesystem library. I don't believe the E* names are portable to non-POSIX systems, which includes (at least part of) Windows. I think that by default, C++ libraries should return C++ error names and not POSIX or any other platform or native error names. There should probably be a convention for mapping between exception names and error names. This will ease the programmer's task as they move between platforms. Just my 2p worth. /ikh

Peter Dimov

3:18 p.m.

Iain K. Hanson wrote:

...

I agree that it is time to re-visit these issues and they are going to apply to more than just the Filesystem library. I don't believe The E* names are portable to non-posix systems which includes ( at least part of ) Windows.

Strictly speaking, E* isn't directly portable to anything. Every system provides a mapping from its native error codes to E*. This is exactly what the filesystem library does at the moment, except that it maps to its own set. (The GetLastError to E* mapping that the MSVC runtime does is in crt/src/dosmap.c, but it looks like it can be improved.) I don't see a reason for us to reinvent the wheel yet again when the work of coming up with a portable set of error codes is already done (by a standard committee, no less).
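To make the "mapping from native error codes to E*" concrete, here is a small sketch of such a table for a handful of Windows codes. The function name `native_to_errno` is invented, the numeric values are the documented Windows constants written out by hand so the example compiles anywhere, and falling back to EINVAL for unmapped codes is an assumption of the sketch.

```cpp
#include <cerrno>

// Translate a few Windows GetLastError() values to errno constants.
// Only a handful of codes are covered; everything else falls back to EINVAL.
inline int native_to_errno(unsigned long native) {
    switch (native) {
        case 0:            // ERROR_SUCCESS
            return 0;
        case 2:            // ERROR_FILE_NOT_FOUND
        case 3:            // ERROR_PATH_NOT_FOUND
            return ENOENT;
        case 5:            // ERROR_ACCESS_DENIED
            return EACCES;
        case 183:          // ERROR_ALREADY_EXISTS
            return EEXIST;
        default:
            return EINVAL; // assumed fallback for codes this sketch does not map
    }
}
```

The mapping is many-to-one (two native codes collapse to ENOENT), which is why a portable code cannot always be translated back to the native one.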

Iain K. Hanson

4:19 p.m.

On Wed, 2005-05-04 at 18:18 +0300, Peter Dimov wrote:

...

Iain K. Hanson wrote:

...
I agree that it is time to re-visit these issues and they are going to apply to more than just the Filesystem library. I don't believe The E* names are portable to non-posix systems which includes ( at least part of ) Windows.

Strictly speaking, E* isn't directly portable to anything. Every system provides a mapping from its native error codes to E*. This is exactly what the filesystem library does at the moment, except that it maps to its own set.

(The GetLastError to E* mapping that the MSVC runtime does is in crt/src/dosmap.c, but it looks like it can be improved.)

Does that work in the Win SDK and not just with console apps with the POSIX subsystem? Also, Windows is not the only non-POSIX O/S. What about VMS, MVS, VM/CMS, and DOS/VSE (if it is still around)?

...

I don't see a reason for us to reinvent the wheel yet again when the work of coming up with a portable set of error codes is already done (by a standard committee, no less).

:-) not what I was after, at least not for the sake of it. However, I do find some / many of the E* names a little cryptic. /ikh

Iain K. Hanson

5 May

12:28 a.m.

On Wed, May 04, 2005 at 06:18:46PM +0300, Peter Dimov wrote:

...

Iain K. Hanson wrote:

...
I agree that it is time to re-visit these issues and they are going to apply to more than just the Filesystem library. I don't believe The E* names are portable to non-posix systems which includes ( at least part of ) Windows.

Strictly speaking, E* isn't directly portable to anything. Every system provides a mapping from its native error codes to E*. This is exactly what the filesystem library does at the moment, except that it maps to its own set.

(The GetLastError to E* mapping that the MSVC runtime does is in crt/src/dosmap.c, but it looks like it can be improved.)

I don't see a reason for us to reinvent the wheel yet again when the work of coming up with a portable set of error codes is already done (by a standard committee, no less).

I missed this the first time round, but POSIX does not define a set of error codes, just a set of names. Those are not portable because not everyone has bought into POSIX. I'd rather have a consistent set of *readable* C++ names for all C++ platforms, defined by the C++ standards committee, that could be mapped to POSIX or native error names. /ikh

Beman Dawes

6 May

12:12 p.m.

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:00b901c55092$d30fc300$6401a8c0@pdimov2...

...

Beman Dawes wrote:

...
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:000e01c54fdb$be752eb0$6401a8c0@pdimov2...

...
I also don't like the fact that the error codes aren't the standard errno E* constants, but we already had this debate once. :-) (As with threading, it is my opinion that Posix should be acknowledged.)

The intent _is_ to supply the actual system error code (errno for POSIX). There is another function available to convert to a portable code if desired.

My point was that errno should be the portable error code. This will save you one mapping.

On a POSIX system that would work well. But what happens on a non-POSIX system? The E* macros won't be defined in <cerrno>. How will the user get access to them? Are you suggesting they be defined in one of the filesystem headers? I don't see how that would work. --Beman

Peter Dimov

12:46 p.m.

Beman Dawes wrote:

...

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:00b901c55092$d30fc300$6401a8c0@pdimov2...

...
Beman Dawes wrote:

...
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:000e01c54fdb$be752eb0$6401a8c0@pdimov2...

...
I also don't like the fact that the error codes aren't the standard errno E* constants, but we already had this debate once. :-) (As with threading, it is my opinion that Posix should be acknowledged.)

The intent _is_ to supply the actual system error code (errno for POSIX). There is another function available to convert to a portable code if desired.

My point was that errno should be the portable error code. This will save you one mapping.

On a POSIX system that would work well. But what happens on a non-POSIX system? The E* macros won't be defined in <cerrno>. How will the user get access to them? Are you suggesting they be defined in one of the filesystem headers? I don't see how that would work.

I am suggesting that they shall be defined in <cerrno>. If this doesn't seem acceptable to you, I am not opposed to the filesystem library defining its own aliases of all applicable E* constants. If this is the case, I am suggesting that there shall be a 1:1 mapping of aliases to E* names and that the aliases shall have the same value as the corresponding E* names.

Jonathan Wakely

1:08 p.m.

On Fri, May 06, 2005 at 03:46:44PM +0300, Peter Dimov wrote:

...

Beman Dawes wrote:

...
On a POSIX system that would work well. But what happens on a non-POSIX system? The E* macros won't be defined in <cerrno>. How will the user get access to them? Are you suggesting they be defined in one of the filesystem headers? I don't see how that would work.

I am suggesting that they shall be defined in <cerrno>.

If this doesn't seem acceptable to you, I am not opposed to the filesystem library defining its own aliases of all applicable E* constants. If this is the case, I am suggesting that there shall be an 1:1 mapping of aliases to E* names and that the aliases shall have the same value as the corresponding E* names.

That's what I'd assumed you meant, so that if you're on a POSIX platform and are familiar with the E* symbolic names you can compare the Boost.Filesystem error code to ENOENT etc. If you don't have the E* names defined, or don't want to use them, you can use the corresponding Boost.Filesystem constant. Either way, the value's the same, so it's purely a matter of style. This would be consistent with the equality of std::char_traits<char>::eof() and EOF, and between std::numeric_limits<int>::max() and INT_MAX. jon
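A short sketch of the alias idea follows. The namespace `fs_errc` and the alias names are invented for illustration; the only point is that each alias has the same value as its E* counterpart, checked at compile time. The last two assertions restate the standard-library equalities mentioned above.

```cpp
#include <cerrno>
#include <climits>
#include <cstdio>
#include <limits>
#include <string>

// Hypothetical readable aliases, one per E* constant, with identical values.
namespace fs_errc {
    const int not_found         = ENOENT;
    const int permission_denied = EACCES;
    const int already_exists    = EEXIST;
}

// The 1:1 mapping is enforced at compile time.
static_assert(fs_errc::not_found == ENOENT, "alias must equal ENOENT");
static_assert(fs_errc::permission_denied == EACCES, "alias must equal EACCES");
static_assert(fs_errc::already_exists == EEXIST, "alias must equal EEXIST");

// The existing precedents: a C++ name and a C macro with the same value.
static_assert(std::char_traits<char>::eof() == EOF, "eof() equals EOF");
static_assert(std::numeric_limits<int>::max() == INT_MAX, "max() equals INT_MAX");

// Either spelling works at the call site; which one to use is a matter of style.
inline bool is_not_found(int code) { return code == fs_errc::not_found; }
```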

Rob Stewart

3 May

3 p.m.

From: Beman Dawes <bdawes@acm.org>

...

I've completed an analysis of expectations for the exists() and is_xxx() family of functions, and the previously suggested status() and symlink_status() functions.

See http://www.esva.net/~beman/filesystem_operations_predicates.htm

I like that you specify the behavior of the is_xxx functions in terms of status(). That reduces the complexity of specifying the is_xxx functions.

There's no mention of what exception is thrown for each of the is_xxx functions.

I don't like the status()/symlink_status() split. How about overloading like this:

struct follow_symlink_t { };
extern const follow_symlink_t follow_symlink;

status_flags status(path const &);
status_flags status(path const &, follow_symlink_t);

Client code will then always call status(), but will sometimes add follow_symlink. I think it reads better. (An OS that doesn't support symlinks can simply make the latter call the former.)

The Effects sections are a little awkward to read because of all of the "Otherwise, if"s. Perhaps something like this would work:

Effects: Queries the operating system to determine the attributes of path p. If p is a symbolic link, the link is resolved (that is, deep semantics). If the attributes indicate that p is a directory (see expectations), it returns directory_flag. If the attributes indicate that p is a file, it returns file_flag. If the query results in an error indicating that p could not be found footnote, it returns not_found_flag. If the query results in any other error, it returns error_flag.

What about the other attributes that one can get from many (most?) OSes? Shouldn't status() report on read only, for example? The good news is that the status() I/F is extensible.

-- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;
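The tag-overload proposal can be tried out against a toy in-memory "filesystem", as in the sketch below. The table contents, the `entry` struct, `toy_fs()`, the flag values, and the hop limit are all invented for the example, and std::string stands in for the library's path class; only the two status() signatures follow the proposal.

```cpp
#include <map>
#include <string>

enum status_flags { not_found_flag = 1, file_flag = 2, directory_flag = 4, symlink_flag = 8 };

// Tag type and tag object, as in the proposal.
struct follow_symlink_t {};
const follow_symlink_t follow_symlink = follow_symlink_t();

// Toy filesystem: each path has a kind and, for symlinks, a target.
struct entry { status_flags kind; std::string target; };

inline const std::map<std::string, entry>& toy_fs() {
    static const std::map<std::string, entry> fs = {
        { "/data",     { directory_flag, "" } },
        { "/data/log", { file_flag,      "" } },
        { "/link",     { symlink_flag,   "/data" } },
        { "/dangling", { symlink_flag,   "/gone" } }
    };
    return fs;
}

// Shallow query: reports on the path itself.
inline status_flags status(const std::string& p) {
    auto it = toy_fs().find(p);
    return it == toy_fs().end() ? not_found_flag : it->second.kind;
}

// Deep query: resolves symlinks, and behaves like the shallow one otherwise.
inline status_flags status(const std::string& p, follow_symlink_t) {
    std::string cur = p;
    for (int hops = 0; hops < 8; ++hops) {        // arbitrary limit against cycles
        auto it = toy_fs().find(cur);
        if (it == toy_fs().end()) return not_found_flag;
        if (it->second.kind != symlink_flag) return it->second.kind;
        cur = it->second.target;
    }
    return not_found_flag;  // a real implementation would report an error here
}
```

With this in place, `status("/link")` yields symlink_flag and `status("/link", follow_symlink)` yields directory_flag.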

Peter Dimov

5:46 p.m.

Rob Stewart wrote:

...

I don't like the status()/symlink_status() split. How about overloading like this:

struct follow_symlink_t { }; extern const follow_symlink_t follow_symlink;

status_flags status(path const &); status_flags status(path const &, follow_symlink_t);

FWIW, I'm in favor of the current design.

Beman Dawes

4 May

1:04 a.m.

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:005701c55008$0eacc010$6401a8c0@pdimov2...

...

Rob Stewart wrote:

...
I don't like the status()/symlink_status() split. How about overloading like this:

struct follow_symlink_t { }; extern const follow_symlink_t follow_symlink;

status_flags status(path const &); status_flags status(path const &, follow_symlink_t);

FWIW, I'm in favor of the current design.

Rationale? --Beman

Peter Dimov

10:16 a.m.

Beman Dawes wrote:

...

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:005701c55008$0eacc010$6401a8c0@pdimov2...

...
Rob Stewart wrote:

...
I don't like the status()/symlink_status() split. How about overloading like this:

struct follow_symlink_t { }; extern const follow_symlink_t follow_symlink;

status_flags status(path const &); status_flags status(path const &, follow_symlink_t);

FWIW, I'm in favor of the current design.

Rationale?

I don't view overloading for overloading's sake as improvement; there's nothing wrong with giving different names to different behaviors. The practice of overloading on behavior seems inspired by new(nothrow), but in that case we simply don't have the option of providing a differently-named function. One practical argument for not introducing overloading is that it's harder to use boost::bind on an overloaded function.
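The binding point can be illustrated with std::bind, used here as a stand-in for boost::bind. The function bodies are dummies and the helper names `bind_distinct` and `bind_overloaded` are invented; the sketch only shows that an overloaded name needs a cast before its address can be taken, while a distinctly named function does not.

```cpp
#include <functional>
#include <string>

enum status_flags { not_found_flag = 1, file_flag = 2, directory_flag = 4, symlink_flag = 8 };
struct follow_symlink_t {};

// Dummy bodies: each returns a different flag so the chosen function is visible.
inline status_flags status(const std::string&)                   { return file_flag; }
inline status_flags status(const std::string&, follow_symlink_t) { return directory_flag; }
inline status_flags symlink_status(const std::string&)           { return symlink_flag; }

// Distinct name: the address is unambiguous, so binding is direct.
inline std::function<status_flags()> bind_distinct(const std::string& p) {
    return std::bind(&symlink_status, p);
}

// Overloaded name: std::bind(&status, p) would not compile, because the
// compiler cannot tell which overload is meant. A cast has to select one.
inline std::function<status_flags()> bind_overloaded(const std::string& p) {
    typedef status_flags (*one_arg)(const std::string&);
    return std::bind(static_cast<one_arg>(&status), p);
}
```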

Rob Stewart

2:38 p.m.

From: "Peter Dimov" <pdimov@mmltd.net>

...

Beman Dawes wrote:

...
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:005701c55008$0eacc010$6401a8c0@pdimov2...

...
Rob Stewart wrote:

...
I don't like the status()/symlink_status() split. How about overloading like this:

struct follow_symlink_t { }; extern const follow_symlink_t follow_symlink;

status_flags status(path const &); status_flags status(path const &, follow_symlink_t);

FWIW, I'm in favor of the current design.

Rationale?

I don't view overloading for overloading's sake as improvement; there's nothing wrong with giving different names to different behaviors.

These functions do very nearly the same thing. That seems an ideal case for overloading.

...

The practice of overloading on behavior seems inspired by new(nothrow), but in that case we simply don't have the option of providing a differently-named function.

You say that as if new(nothrow) was a poor source of inspiration. I thought it was just the thing.

...

One practical argument for not introducing overloading is that it's harder to use boost::bind on an overloaded function.

Is that a significant issue for this pair of functions? -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Peter Dimov

2:57 p.m.

Rob Stewart wrote:

...

From: "Peter Dimov" <pdimov@mmltd.net>

...
Beman Dawes wrote:

...
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:005701c55008$0eacc010$6401a8c0@pdimov2...

...
Rob Stewart wrote:

...
I don't like the status()/symlink_status() split. How about overloading like this:

struct follow_symlink_t { }; extern const follow_symlink_t follow_symlink;

status_flags status(path const &); status_flags status(path const &, follow_symlink_t);

FWIW, I'm in favor of the current design.

Rationale?

I don't view overloading for overloading's sake as improvement; there's nothing wrong with giving different names to different behaviors.

These functions do very nearly the same thing. That seems an ideal case for overloading.

This particular case is on the fence. One could think of the overloaded interface as a typesafe approximation of

status_return_flags status( path const & p, status_option_flags flags = status_option_default );

with follow_symlink being the only option flag at the moment.

In general, though, excessive overloading should be avoided. My usual litmus test for appropriate overloading is: can you tell what this does:

f( x, y );

without looking up the exact types of x and y.
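For comparison, the options-argument form mentioned above might be sketched like this. The two enum definitions and the hard-coded paths are assumptions made so the example is self-contained, and std::string again stands in for the path class.

```cpp
#include <string>

enum status_return_flags { not_found_flag = 1, file_flag = 2, directory_flag = 4, symlink_flag = 8 };

// follow_symlink is the only option so far; more bits could be added later.
enum status_option_flags { status_option_default = 0, follow_symlink = 1 };

// Toy behavior: "/link" is a symlink to the directory "/data".
inline status_return_flags status(const std::string& p,
                                  status_option_flags flags = status_option_default) {
    if (p == "/data") return directory_flag;
    if (p == "/link") return (flags & follow_symlink) ? directory_flag : symlink_flag;
    return not_found_flag;
}
```

The call sites read the same as with the tag overload (`status(p)` versus `status(p, follow_symlink)`); the difference is that here there is a single function whose behavior is selected at run time, while the overload selects it at compile time.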

Rob Stewart

4:42 p.m.

...

From: "Peter Dimov" <pdimov@mmltd.net> Rob Stewart wrote:

...
From: "Peter Dimov" <pdimov@mmltd.net>

...
I don't view overloading for overloading's sake as improvement; there's nothing wrong with giving different names to different behaviors.

These functions do very nearly the same thing. That seems an ideal case for overloading.

This particular case is on the fence. One could think of the overloaded interface as a typesafe approximation of

status_return_flags status( path const & p, status_option_flags flags = status_option_default );

with follow_symlink being the only option flag at the moment.

OK. I'm missing your point, though.

...

In general, though, excessive overloading should be avoided. My usual litmus

I agree wholeheartedly that "excessive overloading should be avoided." I don't see this as excessive overloading.

...

test for appropriate overloading is: can you tell what this does:

f( x, y );

without looking up the exact types of x and y.

You can never tell in C++. Is "f" a macro? What is the scope of the call? "f" is hardly a telling name, so how can one determine what f(x,y) does for any pair of arguments? All of those questions aside, how does this apply to the case before us? -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Peter Dimov

6:41 p.m.

Rob Stewart wrote:

...

...
From: "Peter Dimov" <pdimov@mmltd.net> Rob Stewart wrote:

...
From: "Peter Dimov" <pdimov@mmltd.net>

...
I don't view overloading for overloading's sake as improvement; there's nothing wrong with giving different names to different behaviors.

These functions do very nearly the same thing. That seems an ideal case for overloading.

This particular case is on the fence. One could think of the overloaded interface as a typesafe approximation of

status_return_flags status( path const & p, status_option_flags flags = status_option_default );

with follow_symlink being the only option flag at the moment.

OK. I'm missing your point, though.

My point was that altering behavior with response to a "flags" or "options" argument is an established idiom, and this case can be viewed as an application of that idiom, if one squints hard enough.

...

...
In general, though, excessive overloading should be avoided. My usual litmus

I agree wholeheartedly that "excessive overloading should be avoided." I don't see this as excessive overloading.

...
test for appropriate overloading is: can you tell what this does:

f( x, y );

without looking up the exact types of x and y.

You can never tell in C++. Is "f" a macro? What is the scope of the call? "f" is hardly a telling name, so how can one determine what f(x,y) does for any pair of arguments?

In the above, "f" is a placeholder for the actual function name.

...

All of those questions aside, how does this apply to the case before us?

It doesn't. It was an explanation of the general principles that guide me to look for compelling arguments in favor of overloading and reject it if there are none, rather than look for compelling arguments against overloading and reject it if there are any.

Beman Dawes

1:03 a.m.

"Rob Stewart" <stewart@sig.com> wrote in message news:200505031500.j43F05Z2022575@weezy.balstatdev.susq.com...

...

From: Beman Dawes <bdawes@acm.org>

...
I've completed an analysis of expectations for the exists() and is_xxx() family of functions, and the previously suggested status() and symlink_status() functions.

See http://www.esva.net/~beman/filesystem_operations_predicates.htm

I like that you specify the behavior of the is_xxx functions in terms of status(). That reduces the complexity of specifying the is_xxx functions.

There's no mention of what exception is thrown for each of the is_xxx functions.

basic_filesystem_error<>.

...

I don't like the status()/symlink_status() split. How about overloading like this:

struct follow_symlink_t { }; extern const follow_symlink_t follow_symlink;

status_flags status(path const &); status_flags status(path const &, follow_symlink_t);

Client code will then always call status(), but will sometimes add follow_symlink. I think it reads better. (An OS that doesn't support symlinks can simply make the latter call the former.)

That seems more confusing than just having two separate functions.

...

The Effects sections are a little awkward to read because of all of the "Otherwise, if"s. Perhaps something like this would work:

Effects: Queries the operating system to determine the attributes of path p. If p is a symbolic link, the link is resolved (that is, deep semantics). If the attributes indicate that p is a directory (see expectations), it returns directory_flag. If the attributes indicate that p is a file, it returns file_flag. If the query results in an error indicating that p could not be found footnote, it returns not_found_flag. If the query results in any other error, it returns error_flag.

That seems clearer. Thanks!
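The reworded Effects paragraph maps fairly directly onto code. The following is only a sketch for a POSIX system, not the library's implementation: it takes a std::string instead of a path object, the flag values are made up, ENOENT and ENOTDIR are assumed to be the "could not be found" errors, and anything that is not a directory is treated as a file.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <sys/stat.h>  // POSIX stat()

// Flag names follow the thread; the numeric values are illustrative only.
enum status_flag {
    error_flag     = 1,
    not_found_flag = 2,
    directory_flag = 4,
    file_flag      = 8,
    symlink_flag   = 16
};

// Deep semantics: stat() resolves symbolic links before reporting attributes.
inline status_flag status(const std::string& p) {
    struct stat st;
    if (::stat(p.c_str(), &st) != 0) {
        // Assumption: these two errno values mean "p could not be found".
        if (errno == ENOENT || errno == ENOTDIR) return not_found_flag;
        return error_flag;  // any other failure of the query
    }
    if (S_ISDIR(st.st_mode)) return directory_flag;
    return file_flag;  // simplification: everything else counts as a file
}

// Usage sketch: the root directory exists on POSIX systems.
inline void status_demo() {
    assert(status("/") == directory_flag);
}
```

Each branch returns exactly one flag, matching the "one and only one flag set" property discussed elsewhere in the thread.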

...

What about the other attributes that one can get from many (most?) OSes? Shouldn't status() report on read only, for example? The good news is that the status() I/F is extensible.

For "read only" specifically, we had a long discussion in the early days of the library, and came to the conclusion that it just isn't portable. On POSIX, the concept of "read only" just isn't there in any usable form. In general, the thought was to add properties queries, which might or might not be available on any given operating system, and could be queried for availability. I elected to not include them as part of Boost.Filesystem initially because I wanted to focus on core functionality. Nothing has been done about properties since then AFAIK. Thanks for the feedback, --Beman

Rob Stewart

2:36 p.m.

From: "Beman Dawes" <bdawes@acm.org>

...

"Rob Stewart" <stewart@sig.com> wrote in message news:200505031500.j43F05Z2022575@weezy.balstatdev.susq.com...

...
From: Beman Dawes <bdawes@acm.org>

...
There's no mention of what exception is thrown for each of the is_xxx functions.

basic_filesystem_error<>.

OK, but I was suggesting that it would be helpful in the table.

...

...
I don't like the status()/symlink_status() split. How about overloading like this:

struct follow_symlink_t { }; extern const follow_symlink_t follow_symlink;

status_flags status(path const &); status_flags status(path const &, follow_symlink_t);

Client code will then always call status(), but will sometimes add follow_symlink. I think it reads better. (An OS that doesn't support symlinks can simply make the latter call the former.)

That seems more confusing than just having two separate functions.

I don't see how this

if (symlink_flag == status(p) && directory_flag == status(p, follow_symlink))

is confusing, and this

if (symlink_flag == symlink_status(p) && directory_flag == status(p))

is less so.

With my suggested syntax, you're always asking a question about the supplied path if you don't add follow_symlink. The follow_symlink overload should, though I failed to mention it previously, ignore follow_symlink if the path is not that of a symlink. That does mean that getting the behavior of your status(p) means calling my status(p, follow_symlink) all of the time. Assuming that might be the normal desire -- a good assumption I think -- then I have the overloading wrong, and that may be what you thought was confusing. How about this instead:

struct shallow_status_t { }; extern shallow_status_t shallow;

status_flag status(path const &); // follows
status_flag status(path const &, shallow); // doesn't follow

(It occurred to me that including the term "symlink" is limiting since not all OSes have "symlinks" that have (at least partly) analogous concepts. That means naming the function "symlink_status" is similarly limiting.)

-- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;
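For readers who want to see the tag-overload proposal spelled out, here is a hedged sketch on top of POSIX stat() and lstat(). It is an illustration, not code from the library: paths are plain std::string, the flag values are invented, the second parameter is declared with the tag type shallow_status_t (the declaration in the message names the object), and the detail::classify helper is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <sys/stat.h>  // POSIX stat() and lstat()

// Flag names follow the thread; the numeric values are illustrative only.
enum status_flag {
    error_flag     = 1,
    not_found_flag = 2,
    directory_flag = 4,
    file_flag      = 8,
    symlink_flag   = 16
};

// Tag type and tag object used only to select the overload.
struct shallow_status_t {};
const shallow_status_t shallow = {};

namespace detail {
// Hypothetical helper: turn the result of stat()/lstat() into one flag.
// errno must still hold the value set by the failed call.
inline status_flag classify(int rc, const struct stat& st) {
    if (rc != 0)
        return (errno == ENOENT || errno == ENOTDIR) ? not_found_flag
                                                     : error_flag;
    if (S_ISLNK(st.st_mode)) return symlink_flag;  // only seen via lstat()
    if (S_ISDIR(st.st_mode)) return directory_flag;
    return file_flag;  // simplification: everything else counts as a file
}
}  // namespace detail

// Default: follows links (deep semantics).
inline status_flag status(const std::string& p) {
    struct stat st;
    const int rc = ::stat(p.c_str(), &st);
    return detail::classify(rc, st);
}

// With the tag: reports on the path itself (shallow semantics).
inline status_flag status(const std::string& p, shallow_status_t) {
    struct stat st;
    const int rc = ::lstat(p.c_str(), &st);
    return detail::classify(rc, st);
}

// Usage sketch: "/" is a real directory, so both queries agree.
inline void shallow_demo() {
    assert(status("/") == status("/", shallow));
}
```

On a platform without any forwarding mechanism, the tagged overload could simply forward to the plain one, which is the property the proposal relies on.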

Iain K. Hanson

4:35 p.m.

On Wed, 2005-05-04 at 10:36 -0400, Rob Stewart wrote:

...

From: "Beman Dawes" <bdawes@acm.org>

...

How about this instead:

struct shallow_status_t { }; extern shallow_status_t shallow;

status_flag status(path const &); // follows
status_flag status(path const &, shallow); // doesn't follow

I marginally prefer the separately named functions because behavioural differences to me say separate function names, whereas overloads imply semantically the same but with different types.

...

(It occurred to me that including the term "symlink" is limiting since not all OSes have "symlinks" that have (at least partly) analogous concepts. That means naming the function "symlink_status" is similarly limiting.)

I think symlink is a fine name and whether an overload or separate function names the function returning info on a symlink is going to be meaningless on a platform that does not have the concept. /ikh

Beman Dawes

5 May 5 May

1:31 a.m.

"Iain K. Hanson" <iain.hanson@videonetworks.com> wrote in message news:1115224522.14534.633.camel@dev-ihanson.ct.uk.videonetworks.com...

...

On Wed, 2005-05-04 at 10:36 -0400, Rob Stewart wrote:

...
From: "Beman Dawes" <bdawes@acm.org>

...
How about this instead:

struct shallow_status_t { }; extern shallow_status_t shallow;

status_flag status(path const &); // follows
status_flag status(path const &, shallow); // doesn't follow

I marginally prefer the separately named functions because behavioural differences to me say separate function names, whereas overloads imply semantically the same but with different types.

That rationale is convincing enough for me.

...

...
(It occurred to me that including the term "symlink" is limiting since not all OSes have "symlinks" that have (at least partly) analogous concepts. That means naming the function "symlink_status" is similarly limiting.)

I think symlink is a fine name and whether an overload or separate function names the function returning info on a symlink is going to be meaningless on a platform that does not have the concept.

I'd prefer to think use of symlink_status() on a platform without symbolic linas ensuring that the code will be portable to other operating systems, and to future versions of the current operating system. As an aside, I read recently a description of the next version of Windows, and it sounded somewhat like they might be adding symbolic links. Hard to say, however, as it was a marketing rather than technical description. --Beman

Rob Stewart

8:45 p.m.

From: "Beman Dawes" <bdawes@acm.org>

...

"Iain K. Hanson" <iain.hanson@videonetworks.com> wrote in message news:1115224522.14534.633.camel@dev-ihanson.ct.uk.videonetworks.com...

...
On Wed, 2005-05-04 at 10:36 -0400, Rob Stewart wrote:

...
From: "Beman Dawes" <bdawes@acm.org>

...
How about this instead:

struct shallow_status_t { }; extern shallow_status_t shallow;

status_flag status(path const &); // follows
status_flag status(path const &, shallow); // doesn't follow

I marginally prefer the separately named functions because behavioural differences to me say separate function names, whereas overloads imply semantically the same but with different types.

That rationale is convincing enough for me.

...
...
(It occurred to me that including the term "symlink" is limiting since not all OSes have "symlinks" that have (at least partly) analogous concepts. That means naming the function "symlink_status" is similarly limiting.)

I think symlink is a fine name and whether an overload or separate function names the function returning info on a symlink is going to be meaningless on a platform that does not have the concept.

I disagree. The normal function is said to return information on the path or, if it is a symlink, on the object to which it refers. On a system without symlinks, there's no "or." If you avoid the word "symlink" in the function name and follow the "shallow" scheme I suggested above, then whenever there is any sort of forwarding mechanism in the underlying OS (Windows has some bizarre, half-baked notions of that sort), the shallow version can return information on that forwarding object, if possible. Thus, I'm advocating at least renaming symlink_status() to "shallow_status" to avoid the nomenclature problem. If the underlying OS has no forwarding mechanism, then a shallow status request is a normal status request.

...

I'd prefer to think use of symlink_status() on a platform without symbolic linas ensuring that the code will be portable to other operating systems, and to future versions of the current operating system.

You lost something in that paragraph. I'm not quite sure what you were trying to say.

...

As an aside, I read recently a description of the next version of Windows, and it sounded somewhat like they might be adding symbolic links. Hard to say, however, as it was a marketing rather than technical description.

No doubt it will be yet another addition to the strange beasts they already have, none of which will do what *nix has done for how long? -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Thomas Witt

3 May 3 May

5:40 p.m.

Beman, Beman Dawes wrote:

...

I've completed an analysis of expectations for the exists() and is_xxx() family of functions, and the previously suggested status() and symlink_status() functions.

See http://www.esva.net/~beman/filesystem_operations_predicates.htm

Thanks to Peter Dimov, Rob Stewart, and Jeff Garland for their suggestions. Errors are mine alone.

Comments?

Hmm, what's the benefit of having symlink_status() and status() ? Wouldn't one function suffice that sets the symlink flag when a symlink is encountered on the way. I.e.

symlink_flag | directory_flag

identifies a symlink pointing to a directory

symlink_flag | not_found_flag

identifies a symlink pointing to a non existing file/dir

and so on ...

Thomas


-- Thomas Witt witt@acm.org

Beman Dawes

4 May 4 May

12:25 a.m.

"Thomas Witt" <witt@acm.org> wrote in message news:d58cm5$mso$1@sea.gmane.org...

...

Hmm, what's the benefit of having symlink_status() and status() ? Wouldn't one function suffice that sets the symlink flag when a symlink is encountered on the way. I.e.

symlink_flag | directory_flag

identifies a symlink pointing to a directory

symlink_flag | not_found_flag

identifies a symlink pointing to a non existing file/dir

and so on ...

I considered that briefly, but rejected it because status() on POSIX would then require two calls; one to stat() and one to lstat(). There is also a nice simplicity in the current design; the functions always return a value with one and only one flag set. Neither of those are really killer arguments, so if others think it would be better to have a single status() function, I'd like to hear their arguments. Thanks, --Beman
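The "two calls" cost can be seen in a sketch of the single-function alternative. This is an illustration under assumptions, not a proposal from the thread in code form: combined_status and missing_or_error are hypothetical names, paths are plain std::string, the flag values are invented, and the result is an unsigned mask because more than one flag may be set.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <sys/stat.h>  // POSIX stat() and lstat()

// Flag names follow the thread; the numeric values are illustrative only.
enum status_flag {
    error_flag     = 1,
    not_found_flag = 2,
    directory_flag = 4,
    file_flag      = 8,
    symlink_flag   = 16
};

// Hypothetical helper: map an errno value to one of the two failure flags.
inline unsigned missing_or_error(int err) {
    return (err == ENOENT || err == ENOTDIR) ? not_found_flag : error_flag;
}

// Single query that ORs symlink_flag with the flag describing the target.
inline unsigned combined_status(const std::string& p) {
    struct stat st;
    if (::lstat(p.c_str(), &st) != 0)        // first call: the path itself
        return missing_or_error(errno);
    unsigned result = 0;
    if (S_ISLNK(st.st_mode)) {
        result |= symlink_flag;
        if (::stat(p.c_str(), &st) != 0)     // second call: the link target
            return result | missing_or_error(errno);
    }
    // Simplification: anything that is not a directory counts as a file.
    return result | (S_ISDIR(st.st_mode) ? directory_flag : file_flag);
}

// Usage sketch: "/" is a directory and not a symlink.
inline void combined_demo() {
    assert(combined_status("/") == directory_flag);
}
```

In this sketch the second system call is made only when the path is itself a link, and the result describes only the final target of a chain of links, not how many links were traversed.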

Thomas Witt

12:46 a.m.

Beman, Beman Dawes wrote:

...

"Thomas Witt" <witt@acm.org> wrote in message news:d58cm5$mso$1@sea.gmane.org...

...
Hmm, what's the benefit of having symlink_status() and status() ? Wouldn't one function suffice that sets the symlink flag when a symlink is encountered on the way. I.e.

I considered that briefly, but rejected it because status() on POSIX would then require two calls; one to stat() and one to lstat().

Hmmm .. mildly convincing.

...

There is also a nice simplicity in the current design; the functions always return a value with one and only one flag set.

In this case the fact that it is a bitmask type seems to be kind of misleading. Isn't the whole point of a bitmask type to be able to have multiple flags set at once?

...

Neither of those are really killer arguments, so if others think it would be better to have a single status() function, I'd like to hear their arguments.

I am up in the air about it. On the one hand it seems wrong to me to have a suboptimal interface only because posix has one, on the other hand performance might be an issue. Thomas -- Thomas Witt witt@acm.org

Peter Dimov

10:20 a.m.

Thomas Witt wrote:

...

Beman,

Beman Dawes wrote:

...
Neither of those are really killer arguments, so if others think it would be better to have a single status() function, I'd like to hear their arguments.

I am up in the air about it. On the one hand it seems wrong to me to have a suboptimal interface only because posix has one, on the other hand performance might be an issue.

How does your suggestion handle the symlink to symlink case?

Thomas Witt

5 May 5 May

12:21 a.m.

Peter Dimov wrote:

...

How does your suggestion handle the symlink to symlink case?

Hmm, not at all I guess :-( Thomas -- Thomas Witt witt@acm.org

Rob Stewart

4 May 4 May

2:20 p.m.

From: Thomas Witt <witt@acm.org>

...

Beman Dawes wrote:

...
"Thomas Witt" <witt@acm.org> wrote in message news:d58cm5$mso$1@sea.gmane.org...

...
Hmm, what's the benefit of having symlink_status() and status() ? Wouldn't one function suffice that sets the symlink flag when a symlink is encountered on the way. I.e.

I considered that briefly, but rejected it because status() on POSIX would then require two calls; one to stat() and one to lstat().

Hmmm .. mildly convincing.

I don't find it convincing. Should such an implementation detail drive what may well become a standardized interface? Besides, when an implementor provides this information, s/he probably has (or can create) useful low level, nonportable APIs available to make this efficient.

...

...
There is also a nice simplicity in the current design; the functions always return a value with one and only one flag set.

In this case the fact that it is a bitmask type seems to be kind of misleading. Isn't the whole point of a bitmask type to be able to have multiple flags set at once?

I agree. What's the point of the bitmask? As described, masking is needed, but since only one value is ever actually returned, what's the point? Equality comparisons are easier than bitmasks.

...

...
Neither of those are really killer arguments, so if others think it would be better to have a single status() function, I'd like to hear their arguments.

I am up in the air about it. On the one hand it seems wrong to me to have a suboptimal interface only because posix has one, on the other hand performance might be an issue.

I don't think performance needs to be a factor. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Beman Dawes

5 May 5 May

1:11 a.m.

"Thomas Witt" <witt@acm.org> wrote in message news:d595jr$2d0$1@sea.gmane.org...

...

...
There is also a nice simplicity in the current design; the functions always return a value with one and only one flag set.

In this case the fact that it is a bitmask type seems to be kind of misleading. Isn't the whole point of a bitmask type to be able to have multiple flags set at once?

Multiple flags are or'ed together for tests:

if ( (status(p) & (directory_flag|file_flag)) != 0 ) ...

Isn't the usual way of the standard to describe that as a "bitmask type"? --Beman
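The distinction being drawn here is that the returned value carries one flag while the mask it is tested against may carry several. A small sketch, with invented flag values and two hypothetical helper functions, shows both halves:

```cpp
#include <cassert>

// Flag names follow the thread; the numeric values are illustrative only.
enum status_flag {
    error_flag     = 1,
    not_found_flag = 2,
    directory_flag = 4,
    file_flag      = 8,
    symlink_flag   = 16
};

// The test from the message: does the status match any flag in the mask?
inline bool is_file_or_directory(status_flag s) {
    return (s & (directory_flag | file_flag)) != 0;
}

// Checks the "one and only one flag set" property of a returned value.
inline bool has_exactly_one_flag(status_flag s) {
    const unsigned v = static_cast<unsigned>(s);
    return v != 0 && (v & (v - 1)) == 0;  // true only for a single set bit
}

// Usage sketch.
inline void bitmask_demo() {
    assert(is_file_or_directory(file_flag));
    assert(has_exactly_one_flag(file_flag));
}
```

A plain equality comparison such as s == directory_flag remains valid for single-flag checks; the mask form only pays off when several outcomes are acceptable.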

Thomas Witt

2:29 p.m.

Beman, Beman Dawes wrote:

...

"Thomas Witt" <witt@acm.org> wrote in message news:d595jr$2d0$1@sea.gmane.org...

...
In this case the fact that it is a bitmask type seems to be kind of misleading. Isn't the whole point of a bitmask type to be able to have multiple flags set at once?

Multiple flags are or'ed together for tests:

if ( (status(p) & (directory_flag|file_flag)) != 0 ) ...

Isn't the usual way of the standard to describe that as a "bitmask type"?

Sorry if my post sounded offending, it wasn't meant that way. That being said it was clueless anyway. I'll search for a stone to hide under. Thomas -- Thomas Witt witt@acm.org

Beman Dawes

6 May 6 May

12:26 p.m.

"Thomas Witt" <witt@acm.org> wrote in message news:d5da7v$d45$1@sea.gmane.org...

...

Beman,

Beman Dawes wrote:

...
"Thomas Witt" <witt@acm.org> wrote in message news:d595jr$2d0$1@sea.gmane.org...

...
In this case the fact that it is a bitmask type seems to be kind of misleading. Isn't the whole point of a bitmask type to be able to have multiple flags set at once?

Multiple flags are or'ed together for tests:

if ( (status(p) & (directory_flag|file_flag)) != 0 ) ...

Isn't the usual way of the standard to describe that as a "bitmask type"?

Sorry if my post sounded offending, it wasn't meant that way.

It didn't sound offending at all, and that thought never occurred to me. I phrased my answer that way so I wouldn't look too stupid if you were seeing something that I was missing.

...

That being said it was clueless anyway. I'll search for a stone to hide under.

No way! Your comments are much appreciated. --Beman

Rob Stewart

5 May 5 May

8:47 p.m.

From: "Beman Dawes" <bdawes@acm.org>

...

"Thomas Witt" <witt@acm.org> wrote in message news:d595jr$2d0$1@sea.gmane.org...

...
...
There is also a nice simplicity in the current design; the functions always return a value with one and only one flag set.

In this case the fact that it is a bitmask type seems to be kind of misleading. Isn't the whole point of a bitmask type to be able to have multiple flags set at once?

Multiple flags are or'ed together for tests:

if ( (status(p) & (directory_flag|file_flag)) != 0 ) ...

Isn't the usual way of the standard to describe that as a "bitmask type"?

I think Thomas was looking at it from the other perspective, as was I: it only ever returns one value, so why is it a bitmask. Your example is compelling, however. BTW, if you return a UDT, you can provide both (in)equality and bitwise OR operators. The former would allow more straightforward comparisons when you are checking for just one flag. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;
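A user-defined type along those lines might look like the sketch below. It is one possible shape only: the class name status_flags, the added operator& and explicit bool conversion (needed to express the mask test), the flag values, and the use of C++11 constexpr are all assumptions made for the illustration.

```cpp
#include <cassert>

// Hypothetical UDT offering both (in)equality and bitwise operators.
class status_flags {
public:
    constexpr explicit status_flags(unsigned bits) : bits_(bits) {}

    // Combine flags into a mask.
    friend constexpr status_flags operator|(status_flags a, status_flags b) {
        return status_flags(a.bits_ | b.bits_);
    }
    // Intersect a status with a mask (an addition beyond the message).
    friend constexpr status_flags operator&(status_flags a, status_flags b) {
        return status_flags(a.bits_ & b.bits_);
    }
    // Straightforward comparison when checking for just one flag.
    friend constexpr bool operator==(status_flags a, status_flags b) {
        return a.bits_ == b.bits_;
    }
    friend constexpr bool operator!=(status_flags a, status_flags b) {
        return !(a == b);
    }
    // True when any bit is set; explicit to avoid accidental conversions.
    constexpr explicit operator bool() const { return bits_ != 0; }

private:
    unsigned bits_;
};

// Flag names follow the thread; the numeric values are illustrative only.
constexpr status_flags error_flag(1);
constexpr status_flags not_found_flag(2);
constexpr status_flags directory_flag(4);
constexpr status_flags file_flag(8);
constexpr status_flags symlink_flag(16);

// Usage sketch: equality for one flag, a mask for several.
inline void udt_demo(status_flags s) {
    if (s == directory_flag) { /* exactly a directory */ }
    if (s & (directory_flag | file_flag)) { /* a file or a directory */ }
    assert(s == s);
}
```

Because the constructor is explicit, a raw integer cannot be passed where a status is expected, which is one thing a plain enum or int return type would not prevent.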
