[Filesystem V3] Filesystem Version 3 beta 1 available for download and comment

The first beta release of Boost.Filesystem Version 3 is now available.

The documentation can be viewed at http://mysite.verizon.net/beman/v3/index.htm

Important differences from the prior version: http://mysite.verizon.net/beman/v3/v3.html

The beta can be downloaded from the Boost vault. See http://www.boostpro.com/vault/index.php?&direction=0&order=&directory=Filesystem-V3

Installing the beta is described at http://mysite.verizon.net/beman/v3/install_distro.html

I'm concerned about breaking existing code. While some workarounds have been provided already, I'd be open to further suggestions as to how to mitigate code breakage.

The tentative plan for integrating V3 into the boost release cycle involves several steps:

* For the first boost release cycle, ship V2 as the default, but V3 is present and can be activated by users.
* For an additional boost release cycle, ship V3 as the default, but V2 is present (but deprecated) and can be activated by users.
* For the following boost release cycle, V2 is removed and no longer supported.

The plan is to go through as many beta cycles as needed to reach stability. Although quite a bit of testing has already been done, this is definitely beta software that isn't ready yet for production use.

Any comments appreciated.

--Beman

Beman Dawes wrote:
Important differences from the prior version: http://mysite.verizon.net/beman/v3/v3.html
I just looked through this page and have the following comments:

"stem" means nothing to me. I followed the Reference link to the path class documentation and find no description of it there.

complete(), even though the term is odd versus the normal "absolute" terminology, works well because "complete" can be a verb. OTOH, absolute() is bad because "absolute" cannot be a verb, so it makes a bad function name. IOW, it should be named "make_absolute."

Similarly, unique_path() is poorly named because "unique path" is not a verbal. The function should be named "make_unique_path."

Typo: "To ease the transition, Versions 2 and 3 will both included...." s/will both/will both be/

Why the ellipsis with the yellow background?

-------

When looking at the path query functions to learn about "stem," I found the *painfully* highlighted "Deprecated convenience functions" section. I have comments on what's revealed therein:

You state that extension() returns the period to allow distinguishing between an empty extension and no extension. That seems wrong. Typical use cases for working with the extension will require stripping the period before proceeding, so you push extra work onto the client. Furthermore, I can't think of a case in which extension processing code would work differently when there is no extension and when the extension is empty. The extension is an empty string in both cases. Since you already provide has_extension() for distinguishing that there is one, extension() should return an empty string when nothing follows the period.

change_extension() requires that the client prepend a period to the new extension string. Why is that necessary? I can imagine several use cases in which the extension would be found without a leading period. The client code would then have to prepend a period to use change_extension(). Also, if you change extension() as I note above, then it will return the right value for change_extension()'s second argument.
Rob Stewart Software Engineer, Core Software Susquehanna International Group, LLP robert.stewart@sig.com using std::disclaimer; http://www.sig.com IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.

On Thu, Feb 18, 2010 at 10:35 AM, Stewart, Robert <Robert.Stewart@sig.com> wrote:
Beman Dawes wrote:
Important differences from the prior version: http://mysite.verizon.net/beman/v3/v3.html
I just looked through this page and have the following comments:
"stem" means nothing to me. I followed the Reference link to the path class documentation and find no description of it there.
Added a direct link to the reference doc description. Added an example to the reference doc description.
complete(), even though the term is odd versus the normal "absolute" terminology, works well because "complete" can be a verb. OTOH, absolute() is bad because "absolute" cannot be a verb, so it makes a bad function name. IOW, it should be named "make_absolute."
Interesting. I went back and forth between "absolute" and "make_absolute", but the verb argument didn't occur to me. I think that's enough to swing the decision to "make_absolute".
Similarly, unique_path() is poorly named because "unique path" is not a verbal. The function should be named "make_unique_path."
I also considered "make_unique_path", but worried that people might expect the actual path to be created in the external file system, rather than just returning a path object. Does that concern you, or do you still consider "make_unique_path" better?
Typo: "To ease the transition, Versions 2 and 3 will both included...."
s/will both/will both be/
Fixed.
Why the ellipsis with the yellow background?
I use a yellow background for text that needs improvement. The "..." is just a placeholder.
When looking at the path query functions to learn about "stem," I found the *painfully* highlighted "Deprecated convenience functions" section. I have comments on what's revealed therein:
You state that extension() returns the period to allow distinguishing between an empty extension and no extension. That seems wrong. Typical use cases for working with the extension will require stripping the period before proceeding, so you push extra work onto the client. Furthermore, I can't think of a case in which extension processing code would work differently when there is no extension and when the extension is empty. The extension is an empty string in both cases. Since you already provide has_extension() for distinguishing that there is one, extension() should return an empty string when nothing follows the period.
IIRC, that was Volodya's original design and I can't recall anyone ever complaining about it. True, we didn't have the has_extension() function, but still, I hate to break existing code. Does anyone else have a strong opinion?
change_extension() requires that the client prepend a period to the new extension string. Why is that necessary? I can imagine several use cases in which the extension would be found without a leading period. The client code would then have to prepend a period to use change_extension(). Also, if you change extension() as I note above, then it will return the right value for change_extension()'s second argument.
Again, I'm concerned about breaking existing code. I'm willing to do that for V3, but only if I become convinced it results in a real and noticeable improvement. Thanks for the comments! --Beman

Beman Dawes wrote:
On Thu, Feb 18, 2010 at 10:35 AM, Stewart, Robert <Robert.Stewart@sig.com> wrote:
You state that extension() returns the period to allow distinguishing between an empty extension and no extension. That seems wrong. Typical use cases for working with the extension will require stripping the period before proceeding, so you push extra work onto the client. Furthermore, I can't think of a case in which extension processing code would work differently when there is no extension and when the extension is empty. The extension is an empty string in both cases. Since you already provide has_extension() for distinguishing that there is one, extension() should return an empty string when nothing follows the period.
IIRC, that was Volodya's original design and I can't recall anyone ever complaining about it. True, we didn't have the has_extension() function, but still, I hate to break existing code. Does anyone else have a strong opinion?
It is the right design to retain the period, IMO, and most "get extension" functions do so, even on Windows, where there is no difference between "foo" and "foo." when actually used to refer to a file. See for example http://msdn.microsoft.com/en-us/library/e737s6tf%28VS.100%29.aspx On POSIX, it's even more important to retain the period, because "foo" and "foo." refer to different files. There are two main reasons to retain the period. First, client code may wish to distinguish between "" and "." as extensions, in order to, for example, append a default extension only in the "" case, where one is not supplied. Second, and this also applies to retaining the trailing slash in the directory name, to provide the ability to reconstruct the original path by concatenating its elements (directory, name, extension).
change_extension() requires that the client prepend a period to the new extension string. Why is that necessary? I can imagine several use cases in which the extension would be found without a leading period. The client code would then have to prepend a period to use change_extension(). Also, if you change extension() as I note above, then it will return the right value for change_extension()'s second argument.
Again, I'm concerned about breaking existing code. I'm willing to do that for V3, but only if I become convinced it results in a real and noticeable improvement.
This is also the correct design, for consistency with the above. Changing the extension to "" and "." are different operations. Again, existing practice requires the dot even on Windows, see for example http://delphi.about.com/library/rtl/blrtlChangeFileExt.htm
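The two reasons given above for keeping the period can be illustrated on plain strings. The following is only a minimal sketch, not the Boost.Filesystem implementation: split_filename and with_default_extension are hypothetical helpers that split at the last period and keep that period with the extension, which is the behaviour being defended here.

```cpp
#include <string>
#include <utility>

// Hypothetical helper, not the Boost.Filesystem implementation: split a
// filename at its last period, keeping the period with the extension.
// Returns (stem, extension).
std::pair<std::string, std::string> split_filename(const std::string& filename)
{
    // "." and ".." are directory names, not files with extensions.
    if (filename == "." || filename == "..")
        return std::make_pair(filename, std::string());

    const std::string::size_type dot = filename.rfind('.');
    if (dot == std::string::npos)
        return std::make_pair(filename, std::string());  // no period at all

    // Everything before the last period, then the period and what follows.
    return std::make_pair(filename.substr(0, dot), filename.substr(dot));
}

// Append a default extension only when no extension was supplied at all.
// A bare trailing period ("foo.") counts as an explicit, empty extension.
std::string with_default_extension(const std::string& filename,
                                   const std::string& default_ext)
{
    return split_filename(filename).second.empty() ? filename + default_ext
                                                   : filename;
}
```

With this convention "foo" yields an empty extension while "foo." yields ".", so the two cases stay distinguishable without a separate query, and concatenating the two parts always gives back the original name.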

Peter Dimov wrote:
Beman Dawes wrote:
On Thu, Feb 18, 2010 at 10:35 AM, Stewart, Robert <Robert.Stewart@sig.com> wrote:
You state that extension() returns the period to allow distinguishing between an empty extension and no extension. That seems wrong. Typical use cases for working with the extension will require stripping the period before proceeding, so you push extra work onto the client. Furthermore, I can't think of a case in which extension processing code would work differently when there is no extension and when the extension is empty. The extension is an empty string in both cases. Since you already provide has_extension() for distinguishing that there is one, extension() should return an empty string when nothing follows the period.
IIRC, that was Volodya's original design and I can't recall anyone ever complaining about it. True, we didn't have the has_extension() function, but still, I hate to break existing code. Does anyone else have a strong opinion?
It is the right design to retain the period, IMO, and most "get extension" functions do so, even on Windows, where there is no difference between "foo" and "foo." when actually used to refer to a file. See for example
http://msdn.microsoft.com/en-us/library/e737s6tf%28VS.100%29.aspx
That's an interesting precedent, but that strikes me as wrong, too!
On POSIX, it's even more important to retain the period, because "foo" and "foo." refer to different files.
I can see that creating "foo." from "foo" requires that one be able to set the extension as "." and that would require special case code. Perhaps the right solution is to prefix the argument with "." when omitted? That way, existing code, which provides the "." will continue to work, while code that has the extension, but no period, can work henceforth. That works for setting an extension, such as with change_extension(), but doesn't address what extension() should return; see below.
There are two main reasons to retain the period. First, client code may wish to distinguish between "" and "." as extensions, in order to, for example, append a default extension only in the "" case, where one is not supplied.
That's easily done using has_extension() as I mentioned.
Second, and this also applies to retaining the trailing slash in the directory name, to provide the ability to reconstruct the original path by concatenating its elements (directory, name, extension).
With my "optional period" suggestion above, that wouldn't be a problem.
change_extension() requires that the client prepend a period to the new extension string. Why is that necessary? I can imagine several use cases in which the extension would be found without a leading period. The client code would then have to prepend a period to use change_extension(). Also, if you change extension() as I note above, then it will return the right value for change_extension()'s second argument.
Again, I'm concerned about breaking existing code. I'm willing to do that for V3, but only if I become convinced it results in a real and noticeable improvement.
This is also the correct design, for consistency with the above. Changing the extension to "" and "." are different operations. Again, existing practice requires the dot even on Windows, see for example
Another bad precedent! :-} My "optional period" suggestion works here, too.
_____
Rob Stewart Software Engineer, Core Software Susquehanna International Group, LLP robert.stewart@sig.com using std::disclaimer; http://www.sig.com
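The "optional period" suggestion can be sketched on a plain string as follows. This is an illustration under stated assumptions, not a proposed patch: replace_extension_lenient is a hypothetical name, and treating an empty argument as "remove the extension" is an assumption added for the sketch.

```cpp
#include <string>

// Hypothetical sketch of the "optional period" rule on a plain string; the
// name replace_extension_lenient is an assumption, not a library function.
std::string replace_extension_lenient(const std::string& filename,
                                      const std::string& new_ext)
{
    // Drop the current extension (the last period and what follows), if any.
    const std::string::size_type dot = filename.rfind('.');
    const std::string stem =
        (dot == std::string::npos) ? filename : filename.substr(0, dot);

    if (new_ext.empty())
        return stem;              // assumption: "" removes the extension
    if (new_ext[0] == '.')
        return stem + new_ext;    // existing callers that supply the period
    return stem + "." + new_ext;  // callers without a period get one prepended
}
```

Under this rule both ".bak" and "bak" turn "foo.txt" into "foo.bak", and passing "." still produces "foo." from "foo", so code written for the current interface keeps working.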

Some comments below from a random C++ developer who has written multiple cross-platform filesystem libraries for US military use. Take them for whatever you want. Hopefully they're useful. :-)

Beman Dawes wrote:
On Thu, Feb 18, 2010 at 10:35 AM, Stewart, Robert <Robert.Stewart@sig.com> wrote:
"stem" means nothing to me. I followed the Reference link to the path class documentation and find no description of it there.
Added a direct link to the reference doc description. Added an example to the reference doc description.
Regrettably the terminology for this is remarkably nonstandard, but I've never heard of "stem" before and would not have had any idea what it returned. Typically I've seen this called "base_filename" or even just "filename". Where did "stem" come from? Is there a precedent I'm not aware of?
-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Stewart, Robert
Sent: Thursday, February 18, 2010 2:27 PM
To: boost@lists.boost.org
Subject: Re: [boost] [Filesystem V3] Filesystem Version 3 beta 1 available for download and comment
Peter Dimov wrote:
Beman Dawes wrote:
On Thu, Feb 18, 2010 at 10:35 AM, Stewart, Robert <Robert.Stewart@sig.com> wrote:
You state that extension() returns the period to allow distinguishing between an empty extension and no extension. That seems wrong. Typical use cases for working with the extension will require stripping the period before proceeding, so you push extra work onto the client. Furthermore, I can't think of a case in which extension processing code would work differently when there is no extension and when the extension is empty. The extension is an empty string in both cases. Since you already provide has_extension() for distinguishing that there is one, extension() should return an empty string when nothing follows the period.
IIRC, that was Volodya's original design and I can't recall anyone ever complaining about it. True, we didn't have the has_extension() function, but still, I hate to break existing code. Does anyone else have a strong opinion?
It is the right design to retain the period, IMO, and most "get extension" functions do so, even on Windows, where there is no difference between "foo" and "foo." when actually used to refer to a file. See for example
http://msdn.microsoft.com/en-us/library/e737s6tf%28VS.100%29.aspx
That's an interesting precedent, but that strikes me as wrong, too!
On POSIX, it's even more important to retain the period, because "foo" and "foo." refer to different files.
I can see that creating "foo." from "foo" requires that one be able to set the extension as "." and that would require special case code. Perhaps the right solution is to prefix the argument with "." when omitted? That way, existing code, which provides the "." will continue to work, while code that has the extension, but no period, can work henceforth.
From what I've seen, either approach works and doesn't tend to imply significantly more work on the user's part - most use cases I've had are to maintain maps of extensions to some sort of class for processing files of that type, or to recognize certain types of files in a directory, which work either way. So my implementations tended to return with the leading dot for extension() (for disambiguating the crazy POSIX case) and to accept with or without leading dot for change_extension. If no leading dot is provided, one is automatically prepended. Giving the empty string to change_extension removed the extension, but I also had a special method for that. This worked well in practice for me. I do agree that "has_extension" is useful to have for clarity and for the rare use cases where only the presence of the extension matters and its contents do not.
Note that if there were ever a system that by convention used something other than dot for an extension separator, requiring a leading dot could be a problem. I'm not aware of any such systems though. One wrinkle that I never was able to decide how to handle was multiple extensions, like ".tar.gz". Some use cases would want ".tar.gz", some would just want ".gz", and a few would even want just ".tar". Does this library provide any direct support for managing chains of extensions like that? Hope that helped. Gregory Peele, Jr. Applied Research Associates, Inc.

On Thu, Feb 18, 2010 at 2:54 PM, Gregory Peele ARA/CFD <gpeele@ara.com> wrote:
Some comments below from a random C++ developer who has written multiple cross-platform filesystem libraries for US military use. Take them for whatever you want. Hopefully they're useful. :-)
Beman Dawes wrote:
On Thu, Feb 18, 2010 at 10:35 AM, Stewart, Robert <Robert.Stewart@sig.com> wrote:
"stem" means nothing to me. I followed the Reference link to the path class documentation and find no description of it there.
Added a direct link to the reference doc description. Added an example to the reference doc description.
Regrettably the terminology for this is remarkably nonstandard, but I've never heard of "stem" before and would not have had any idea what it returned. Typically I've seen this called "base_filename" or even just "filename". Where did "stem" come from?
See http://article.gmane.org/gmane.comp.lib.boost.devel/177103
One wrinkle that I never was able to decide how to handle was multiple extensions, like ".tar.gz". Some use cases would want ".tar.gz", some would just want ".gz", and a few would even want just ".tar". Does this library provide any direct support for managing chains of extensions like that?
The design decision for extension() to return just the last extension, and for stem() to strip only that one, rather than the entire chain allows a user to visit each element in the chain. For example,

    path p = "foo.bar.baz.tar.bz2";
    for (; !p.extension().empty(); p = p.stem())
      cout << p.extension() << '\n';

displays:

    .bz2
    .tar
    .baz
    .bar

and variations on that would allow fairly easy composition of any operation on the chain that you might like to perform. OTOH, if extension() returned the entire chain, composition of other operations wouldn't be possible. An extension chain iterator could be provided, but that seems overkill to me.

Thanks,

--Beman
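One variation on the loop above is to collect the chain instead of printing it. The sketch below does this on plain strings; last_extension and strip_last_extension are hypothetical stand-ins for the extension() and stem() behaviour described in this message, and extension_chain is an illustrative name, not something the library provides.

```cpp
#include <string>
#include <vector>

// Hypothetical string-based stand-ins for the extension()/stem() behaviour
// described above; they are not the Boost.Filesystem functions.
std::string last_extension(const std::string& name)
{
    const std::string::size_type dot = name.rfind('.');
    return dot == std::string::npos ? std::string() : name.substr(dot);
}

std::string strip_last_extension(const std::string& name)
{
    const std::string::size_type dot = name.rfind('.');
    return dot == std::string::npos ? name : name.substr(0, dot);
}

// Collect every extension, last one first, by peeling off one per pass.
// Each pass shortens the name, so the loop always terminates.
std::vector<std::string> extension_chain(std::string name)
{
    std::vector<std::string> chain;
    while (!last_extension(name).empty()) {
        chain.push_back(last_extension(name));
        name = strip_last_extension(name);
    }
    return chain;
}
```

From the collected vector a caller can pick only the last element, join the last two to get ".tar.bz2", or apply any other policy, which is the kind of composition that a chain-returning function would not allow.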

Beman Dawes wrote:
On Thu, Feb 18, 2010 at 2:54 PM, Gregory Peele ARA/CFD <gpeele@ara.com> wrote:
Beman Dawes wrote:
On Thu, Feb 18, 2010 at 10:35 AM, Stewart, Robert <Robert.Stewart@sig.com> wrote:
"stem" means nothing to me. I followed the Reference link to the path class documentation and find no description of it there.
Regrettably the terminology for this is remarkably nonstandard, but I've never heard of "stem" before and would not have had any idea what it returned. Typically I've seen this called "base_filename" or even just "filename". Where did "stem" come from?
See http://article.gmane.org/gmane.comp.lib.boost.devel/177103
I now understand the rationale for "stem," but I'm not a linguist, so I don't think of "stem" in that sense. (I think of plants, which is hardly helpful when mapping to a filesystem.)

Here's the terminology I like:

* pathname: a sufficiently qualified description of a file's location
* path: the possibly empty directory hierarchy necessary to locate a file
* basename: the last component of a pathname that fully names a file (from basename(1), of course)
* filename: can be an alias for "basename" but is often used as an alias for what you call "stem"
* extension: the part of the basename following the last period
* suffix: the part of the basename following a first period

The difference between "extension" and "suffix" warrants clarification. They are based upon experience with Windows and *nix filesystems. In the former, only the part following the last period actually determines the file type and is reasonably called the extension. The rest is ignored as being part of the filename. In the latter, everything added to the initial part of the basename modifies or extends that initial part, so there can be multiple suffixes (the ".tar.gz" example comes to mind). Whether the set or the parts are considered a "suffix" can be debated.

Given the opportunity to clash with various personal preferences and preconceptions, some of those terms remain troublesome. What if you used "head," "tail," and "extension"? That is, basename() could return the last component of the pathname, which could be a type that has three accessors: head(), tail(), and extension(). tail() would return everything after the first period, while extension() would return what follows the last period. Such a class would likely need a conversion operator to the string type for ease of use and backward compatibility.

You could instead have basename() return the string type, but add accessors: basename_head(), basename_tail(), and basename_extension(). That's uglier in my mind.

A third option is to provide extractors: non-member functions that parse a string to extract the head, tail, or extension. (That last option is the most flexible as it even works on pathnames and other strings.)

To provide a ready means to navigate multi-suffix sequences, you could extend tail() with suffixes(), which could return a collection or populate a collection via an output iterator. That is, the collection would then contain one string for every period-delimited string in the tail.

BTW, I consider "filename," "basename," and "pathname" as single words, not compounds of "file," "base," "path," and "name."
_____
Rob Stewart Software Engineer, Core Software Susquehanna International Group, LLP robert.stewart@sig.com using std::disclaimer; http://www.sig.com
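The "extractors" option can be sketched as free functions over std::string. This is only one reading of the proposal: the namespace sketch and the exact edge-case behaviour (for example, returning an empty tail when there is no period) are assumptions made for the illustration, and none of these functions exist in Boost.Filesystem under these names.

```cpp
#include <string>
#include <vector>

namespace sketch {

// Everything before the first period ("archive" for "archive.tar.gz").
// If there is no period, the whole basename is the head.
std::string head(const std::string& basename)
{
    return basename.substr(0, basename.find('.'));
}

// Everything after the first period ("tar.gz"), or empty if there is none.
std::string tail(const std::string& basename)
{
    const std::string::size_type dot = basename.find('.');
    return dot == std::string::npos ? std::string() : basename.substr(dot + 1);
}

// What follows the last period ("gz"), or empty if there is none.
std::string extension(const std::string& basename)
{
    const std::string::size_type dot = basename.rfind('.');
    return dot == std::string::npos ? std::string() : basename.substr(dot + 1);
}

// One entry per period-delimited piece of the tail: {"tar", "gz"}.
std::vector<std::string> suffixes(const std::string& basename)
{
    std::vector<std::string> parts;
    const std::string rest = tail(basename);
    if (rest.empty())
        return parts;

    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type dot = rest.find('.', start);
        if (dot == std::string::npos) {
            parts.push_back(rest.substr(start));  // final piece
            break;
        }
        parts.push_back(rest.substr(start, dot - start));
        start = dot + 1;
    }
    return parts;
}

} // namespace sketch
```

Because these return the pieces without periods, "foo" and "foo." both give an empty tail and extension, so a caller that cares about the difference would still need a separate has_extension()-style query.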

On 2/21/2010 8:00 AM, Beman Dawes wrote:
One wrinkle that I never was able to decide how to handle was multiple extensions, like ".tar.gz". Some use cases would want ".tar.gz", some would just want ".gz", and a few would even want just ".tar". Does this library provide any direct support for managing chains of extensions like that?
The design decision for extension() to return just the last extension, and for stem() to strip only that one, rather than the entire chain allows a user to visit each element in the chain. For example,
    path p = "foo.bar.baz.tar.bz2";
    for (; !p.extension().empty(); p = p.stem())
      cout << p.extension() << '\n';
displays:
    .bz2
    .tar
    .baz
    .bar
I can't find this anywhere in the documentation, and it's pretty important to note -- I wouldn't have considered that for-loop structure, probably resorting instead to finding the first period and splitting up the string. I don't think an iterator is necessary here, although it would have made the loop construction more obvious. I don't see a pressing need to pass extensions to functions that expect iterators. --Jeffrey Bosboom

On Tue, Feb 23, 2010 at 12:28 AM, Jeffrey Bosboom <jbosboom@uci.edu> wrote:
On 2/21/2010 8:00 AM, Beman Dawes wrote:
One wrinkle that I never was able to decide how to handle was multiple extensions, like ".tar.gz". Some use cases would want ".tar.gz", some would just want ".gz", and a few would even want just ".tar". Does this library provide any direct support for managing chains of extensions like that?
The design decision for extension() to return just the last extension, and for stem() to strip only that one, rather than the entire chain allows a user to visit each element in the chain. For example,
    path p = "foo.bar.baz.tar.bz2";
    for (; !p.extension().empty(); p = p.stem())
      cout << p.extension() << '\n';
displays:
    .bz2
    .tar
    .baz
    .bar
I can't find this anywhere in the documentation, and it's pretty important to note -- I wouldn't have considered that for-loop structure, probably resorting instead to finding the first period and splitting up the string.
I've added the above example to the reference docs.
I don't think an iterator is necessary here, although it would have made the loop construction more obvious. I don't see a pressing need to pass extensions to functions that expect iterators.
Agreed. Thanks! --Beman

On Thu, Feb 18, 2010 at 1:39 PM, Peter Dimov <pdimov@pdimov.com> wrote:
Beman Dawes wrote:
On Thu, Feb 18, 2010 at 10:35 AM, Stewart, Robert <Robert.Stewart@sig.com> wrote:
You state that extension() returns the period to allow distinguishing between an empty extension and no extension. That seems wrong. Typical use cases for working with the extension will require stripping the period before proceeding, so you push extra work onto the client. Furthermore, I can't think of a case in which extension processing code would work differently when there is no extension and when the extension is empty. The extension is an empty string in both cases. Since you already provide has_extension() for distinguishing that there is one, extension() should return an empty string when nothing follows the period.
IIRC, that was Volodya's original design and I can't recall anyone ever complaining about it. True, we didn't have the has_extension() function, but still, I hate to break existing code. Does anyone else have a strong opinion?
It is the right design to retain the period, IMO, and most "get extension" functions do so, even on Windows, where there is no difference between "foo" and "foo." when actually used to refer to a file. See for example
http://msdn.microsoft.com/en-us/library/e737s6tf%28VS.100%29.aspx
On POSIX, it's even more important to retain the period, because "foo" and "foo." refer to different files.
There are two main reasons to retain the period. First, client code may wish to distinguish between "" and "." as extensions, in order to, for example, append a default extension only in the "" case, where one is not supplied. Second, and this also applies to retaining the trailing slash in the directory name, to provide the ability to reconstruct the original path by concatenating its elements (directory, name, extension).
change_extension() requires that the client prepend a period to the new extension string. Why is that necessary? I can imagine several use cases in which the extension would be found without a leading period. The client code would then have to prepend a period to use change_extension(). Also, if you change extension() as I note above, then it will return the right value for change_extension()'s second argument.
Again, I'm concerned about breaking existing code. I'm willing to do that for V3, but only if I become convinced it results in a real and noticeable improvement.
This is also the correct design, for consistency with the above. Changing the extension to "" and "." are different operations. Again, existing practice requires the dot even on Windows, see for example
Convincing arguments. I've added a link to this message in the extension() reference docs. Thanks, --Beman

Beman Dawes wrote:
On Thu, Feb 18, 2010 at 10:35 AM, Stewart, Robert <Robert.Stewart@sig.com> wrote:
"stem" means nothing to me. I followed the Reference link to the path class documentation and find no description of it there.
Added a direct link to the reference doc description. Added an example to the reference doc description.
I found it. It would be easier to understand stem() as returning p.filename() without p.extension(). Perhaps you could add that to the description ahead of the more technical description you've used.
Similarly, unique_path() is poorly named because "unique path" is not a verbal. The function should be named "make_unique_path."
I also considered "make_unique_path", but worried that people might expect the actual path to be created in the external file system, rather than just returning a path object. Does that concern you, or do you still consider "make_unique_path" better?
I understand your concern. However, path does not create any files or directories in a filesystem, so why would anyone suppose "make_unique_path" would do so? Also note that one normally speaks of "creating" a file, not "making" one.

Rob Stewart Software Engineer, Core Software Susquehanna International Group, LLP robert.stewart@sig.com using std::disclaimer; http://www.sig.com
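The naming concern is about whether the function touches the filesystem. As I understand the Boost documentation, unique_path() fills in a model string by replacing each '%' with a random hexadecimal digit and returns the result as a path, without creating anything on disk. The sketch below imitates that idea on a plain std::string; make_unique_name is a hypothetical name and this is not the library's code.

```cpp
#include <random>
#include <string>

// Hypothetical sketch: build a probably-unique name from a model string by
// replacing each '%' with a random hexadecimal digit. Nothing is created on
// disk; the function only computes and returns a string.
std::string make_unique_name(const std::string& model = "%%%%-%%%%-%%%%-%%%%")
{
    static const char hex[] = "0123456789abcdef";
    std::random_device rd;
    std::uniform_int_distribution<int> pick(0, 15);

    std::string name = model;
    for (std::string::size_type i = 0; i < name.size(); ++i)
        if (name[i] == '%')
            name[i] = hex[pick(rd)];  // other characters are kept as-is
    return name;
}
```

A caller that wants an actual file still has to create it separately and handle the unlikely case that the name is already taken, which is why either "make" or "generate" describes only the construction of the name.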

Beman Dawes wrote:
On Thu, Feb 18, 2010 at 10:35 AM, Stewart, Robert <Robert.Stewart@sig.com> wrote:
You state that extension() returns the period to allow distinguishing between an empty extension and no extension. That seems wrong. Typical use cases for working with the extension will require stripping the period before proceeding, so you push extra work onto the client. Furthermore, I can't think of a case in which extension processing code would work differently when there is no extension and when the extension is empty. The extension is an empty string in both cases. Since you already provide has_extension() for distinguishing that there is one, extension() should return an empty string when nothing follows the period.
IIRC, that was Volodya's original design and I can't recall anyone ever complaining about it. True, we didn't have the has_extension() function, but still, I hate to break existing code. Does anyone else have a strong opinion?
FWIW, the GNU Make function 'suffix' also sets a precedent of including the '.' http://uw714doc.sco.com/cgi-bin/info2html?%28make.info%29File%2520Name%2520Functions&lang=en On the rare occasions I've had cause to use it this has seemed natural. John Bytheway
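The two positions in this exchange can be put side by side in a small sketch. It again assumes C++17 std::filesystem as a stand-in for V3, where extension() keeps the leading period. extension_without_dot is a hypothetical helper showing the extra step Rob describes for clients that want the bare extension.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem V3

// Hypothetical helper: the extension with its leading period stripped.
// "No extension" and "empty extension" both collapse to an empty string.
std::string extension_without_dot(const fs::path& p) {
    std::string ext = p.extension().string();
    if (!ext.empty() && ext.front() == '.') {
        ext.erase(0, 1);
    }
    return ext;
}
```

With the period retained, "report." yields "." and "report" yields an empty string, so the two cases can be told apart from extension() alone. The helper returns an empty string for both, and has_extension() is then the only way to distinguish them.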

Similarly, unique_path() is poorly named because "unique path" is not a verbal. The function should be named "make_unique_path."
I also considered "make_unique_path", but worried that people might expect the actual path to be created in the external file system, rather than just returning a path object.
"Generate" would be less confusing: generate_unique_path() Added to do list. --Beman

On Thu, Feb 18, 2010 at 8:43 AM, Beman Dawes <bdawes@acm.org> wrote:
The first beta release of Boost.Filesystem Version 3 is now available.
The documentation can be viewed at http://mysite.verizon.net/beman/v3/index.htm
The descriptions for portable_name() and portable_directory_name() appear to be at odds.
portable_name() : ... && (name is "." or "..", and the first character not a period or hyphen)
portable_directory_name(): ... && (name is "." or ".." or contains no periods)
Should portable_name() be "... && (name is "." or "..", or contains no periods) && (first character not a hyphen)"? Maybe I'm missing something?
Also, in the tut3.cpp tutorial code, there's no mention of the dangers of using "*it" inside the directory_iterator loop. IIRC, in previous Filesystem revisions, this would throw if you did not have adequate permissions to see the file to which "it" refers. Has this changed? If not, I think that you should either wrap the body of the loop in a try-catch block, or use one of the non-throwing function overloads, and indicate to the user what the exception safety concerns are. I think the tutorial code should exhibit enough robustness that the user can just cut-and-paste to get the basic code needed to use the library.
Zach

On Thu, Feb 18, 2010 at 11:30 AM, Zach Laine <whatwasthataddress@gmail.com> wrote:
On Thu, Feb 18, 2010 at 8:43 AM, Beman Dawes <bdawes@acm.org> wrote:
The first beta release of Boost.Filesystem Version 3 is now available.
The documentation can be viewed at http://mysite.verizon.net/beman/v3/index.htm
The descriptions for portable_name() and portable_directory_name() appear to be at odds.
portable_name() : ... && (name is "." or "..", and the first character not a period or hyphen)
portable_directory_name(): ... && (name is "." or ".." or contains no periods)
Should portable_name() be "... && (name is "." or "..", or contains no periods) && (first character not a hyphen)"? Maybe I'm missing something?
I haven't looked at these in years. They need to be brought up to date. Added to do list.
Also, in the tut3.cpp tutorial code, there's no mention of the dangers of using "*it" inside the directory_iterator loop. IIRC, in previous Filesystem revisions, this would throw if you did not have adequate permissions to see the file to which "it" refers. Has this changed?
Nice catch! No, nothing is changed. The code is not robust.
If not, I think that you should either wrap the body of the loop in a try-catch block, or use one of the non-throwing function overloads, and indicate to the user what the exception safety concerns are.
Yes, this would be a great place to introduce the non-throwing function overloads. Added to do list.
I think the tutorial code should exhibit enough robustness that the user can just cut-and-paste to get the basic code needed to use the library.
Definitely. If anyone spots other robustness issues, be sure to let me know. Thanks, --Beman
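A directory loop built on the non-throwing overloads might look roughly like the sketch below. It is not the tutorial's tut3.cpp. It assumes C++17 std::filesystem as a stand-in for V3, and describe_directory is a hypothetical name. The end iterator is constructed once, as suggested elsewhere in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem V3

// Hypothetical helper: describes each entry of `dir`, one string per entry,
// without letting filesystem errors escape as exceptions. `ec` reports a
// failure to open or advance the iterator; per-entry failures are appended
// to that entry's description instead.
std::vector<std::string> describe_directory(const fs::path& dir, std::error_code& ec) {
    std::vector<std::string> lines;
    fs::directory_iterator it(dir, ec), end;  // end is built once, not per iteration
    while (!ec && it != end) {
        const fs::path& entry = it->path();
        std::string line = entry.filename().string();
        std::error_code entry_ec;
        if (fs::is_regular_file(entry, entry_ec)) {
            const std::uintmax_t size = fs::file_size(entry, entry_ec);
            if (!entry_ec) {
                line += " (" + std::to_string(size) + " bytes)";
            }
        }
        if (entry_ec) {
            line += " [" + entry_ec.message() + "]";
        }
        lines.push_back(line);
        it.increment(ec);  // non-throwing counterpart of ++it
    }
    return lines;
}
```

A caller would check ec after the call and print the returned lines. A directory that cannot be opened yields an empty result with ec set, rather than an exception.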

On 18 February 2010 09:43, Beman Dawes <bdawes@acm.org> wrote:
The first beta release of Boost.Filesystem Version 3 is now available.
Great! I've been looking forward to it.
A few comments:
- read_symlink
Thank you!
- path::preferred
The implications of this one are strange to me. I would have expected the path itself to always be stored in the portable grammar, with the user-oriented display only done as a display issue, by using path::native.
- path::absolute
I agree that a verb here is important, since it's a mutator. I don't have anything cleverer than make_absolute, though. It's a shame that make_relative can't be done portably for anything beyond different root_names.
- canonical(p)
I'd like a non-member function specified to take an absolute path, expand any symlinks, and collapse any ".." path elements. Whether it should allow a context to work with relative paths I'm unsure.
- regarding complete
It seems like it'd just be canonical(path(a).absolute(b)), so it might not be needed. I suppose in some ways it's the version of canonical that takes a context.
- uncomplete(p, base)
My pet request. It may be useful to simplify other functions as well, since there's no current way to go from an absolute path to a relative one, meaning that most functions need to handle relative ones even when that might not be natural. With this functionality, preconditions requiring absolute paths would be less onerous.
Precondition: p.is_absolute() && base.is_absolute()
Effects: Extracts a path, rp, from p relative to base such that canonical(p) == complete(rp, base). Any ".." path elements in rp form a prefix.
Returns: The extracted path.
Postconditions: For the returned path, rp, rp.is_relative() == (p.root_name() == b.root_name()).
[Notes: This function simplifies paths by omitting context. It is particularly useful for serializing paths such that it can be usefully moved between hosts where the context may be different, such as inside source control trees. It can also be helpful for display to users, such as in shells where paths are often shown relative to $HOME.
In the presence of symlinks, the result of this function may differ between implementations, as some may expand symlinks that others may not. The simplest implementation uses canonical to expand both p and base, then removes the common prefix and prepends the requisite ".." elements. Smarter implementations will avoid expanding symlinks unnecessarily. No implementation is expected to discover new symlinks to return paths with fewer elements.]
~ Scott

On Thu, Feb 18, 2010 at 11:43 PM, Scott McMurray <me22.ca+boost@gmail.com> wrote:
On 18 February 2010 09:43, Beman Dawes <bdawes@acm.org> wrote:
The first beta release of Boost.Filesystem Version 3 is now available.
Great! I've been looking forward to it.
A few comments:
- read_symlink
Thank you!
Like a lot of features, it came about because someone asked for it.
- path::preferred
The implications of this one are strange to me. I would have expected the path itself to always be stored in the portable grammar,
It is more efficient to store the pathname in the native format. A pathname is created only once, but it may be used many times. There may be some system specific aspects of the native format that can't be represented in the generic format.
with the user-oriented display only done as a display issue, by using path::native.
- path::absolute
I agree that a verb here is important, since it's a mutator. I don't have anything cleverer than make_absolute, though.
make_absolute() seems clearer than absolute(), so I'll go with that unless someone comes up with something even better.
It's a shame that make_relative can't be done portably for anything beyond different root_names.
Yep. I tried briefly to come up with a make_relative spec, but didn't have any brainstorms. I'd like to be sure that there are reasonably compelling real world use cases before adding yet another function.
- canonical(p)
I'd like a non-member function specified to take an absolute path, expand any symlinks, and collapse any ".." path elements.
Whether it should allow a context to work with relative paths I'm unsure.
Essentially a "tell me what this path resolves to" function? I've occasionally thought about such a function, but it always ended up being lower priority than other work.
- regarding complete
It seems like it'd just be canonical(path(a).absolute(b)), so it might not be needed.
I was also thinking that the need may have evaporated as other functionality was added.
I suppose in some ways it's the version of canonical that takes a context.
Yes.
- uncomplete(p, base)
My pet request. It may be useful to simplify other functions as well, since there's no current way to go from an absolute path to a relative one, meaning that most functions need to handle relative ones even when that might not be natural. With this functionality, preconditions requiring absolute paths would be less onerous.
Precondition: p.is_absolute() && base.is_absolute()
Effects: Extracts a path, rp, from p relative to base such that canonical(p) == complete(rp, base). Any ".." path elements in rp form a prefix.
Returns: The extracted path.
Postconditions: For the returned path, rp, rp.is_relative() == (p.root_name() == b.root_name()).
[Notes: This function simplifies paths by omitting context. It is particularly useful for serializing paths such that it can be usefully moved between hosts where the context may be different, such as inside source control trees. It can also be helpful for display to users, such as in shells where paths are often shown relative to $HOME.
In the presence of symlinks, the result of this function may differ between implementations, as some may expand symlinks that others may not. The simplest implementation uses canonical to expand both p and base, then removes the common prefix and prepends the requisite ".." elements. Smarter implementations will avoid expanding symlinks unnecessarily. No implementation is expected to discover new symlinks to return paths with fewer elements.]
Interesting. I'm considering this a "Wish list feature request", so don't plan to do anything about it for the initial V3 release. Right now, all my effort is focused on issues that need to be fixed before the initial V3 release. Once that is in a good enough state to go into trunk (and release branch once tests are stable), I'll think about feature requests. Thanks, --Beman
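The "remove the common prefix and prepend the requisite .. elements" step from Scott's notes can be sketched as follows. This is a purely lexical illustration under stated assumptions: it uses C++17 std::filesystem as a stand-in for V3, it skips the canonical() expansion entirely (so symlinks are not handled), and uncomplete here is a hypothetical free function written for this example, not a library call.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem V3

// Hypothetical sketch of uncomplete(p, base). Purely lexical: no symlink
// expansion, and both arguments are assumed to be absolute already.
fs::path uncomplete(const fs::path& p, const fs::path& base) {
    // Different root names: no relative path exists, so return p unchanged.
    if (p.root_name() != base.root_name()) {
        return p;
    }
    // Skip the elements that p and base have in common.
    auto [p_it, base_it] = std::mismatch(p.begin(), p.end(), base.begin(), base.end());
    fs::path rp;
    // One ".." for every remaining element of base (ignoring the empty
    // element produced by a trailing separator).
    for (; base_it != base.end(); ++base_it) {
        if (!base_it->empty()) {
            rp /= "..";
        }
    }
    // Then the remainder of p.
    for (; p_it != p.end(); ++p_it) {
        rp /= *p_it;
    }
    return rp.empty() ? fs::path(".") : rp;
}
```

For p = "/a/b/c" and base = "/a/d" this yields "../b/c", and every ".." element ends up as a prefix of the result, as the proposed Effects clause requires. std::filesystem also provides path::lexically_relative, which covers similar ground.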

On 2/21/2010 8:39 AM, Beman Dawes wrote:
- path::absolute
I agree that a verb here is important, since it's a mutator. I don't have anything cleverer than make_absolute, though.
make_absolute() seems clearer than absolute(), so I'll go with that unless someone comes up with something even better.
absolutify()? I would expect make_absolute() to return a different path object containing the absolute path rather than mutate the current path (although that's probably because I've been doing lots of Java lately). I'm not sure absolutify() is better, as a grep for 'absolute' won't hit it, and it seems kind of whimsical.
- canonical(p)
I'd like a non-member function specified to take an absolute path, expand any symlinks, and collapse any ".." path elements.
Whether it should allow a context to work with relative paths I'm unsure.
Essentially a "tell me what this path resolves to" function? I've occasionally thought about such a function, but it always ended up being lower priority than other work.
This is useful for security checks and for non-security reasons where a unique identifier for a file is required (although off the top of my head I'm not sure it would result in one path-zero or one file).
- uncomplete(p, base) [snip]
+1. I understand the feature freeze; you don't have to do it now, but eventually. --Jeffrey Bosboom
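The security-check use mentioned above could look something like the sketch below: resolve both paths, then test whether one lies under the other. It assumes C++17 std::filesystem, whose canonical() requires the path to exist, expands symlinks, and collapses ".." elements. Whether a V3 canonical() would end up with the same contract is not settled in this thread. resolves_inside is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;  // assumption: stand-in for a future V3 canonical()

// Hypothetical helper: true when `candidate`, once resolved, lies inside
// (or is equal to) `root`. Both paths must exist; otherwise `ec` is set
// and the function returns false.
bool resolves_inside(const fs::path& root, const fs::path& candidate, std::error_code& ec) {
    const fs::path r = fs::canonical(root, ec);
    if (ec) {
        return false;
    }
    const fs::path c = fs::canonical(candidate, ec);
    if (ec) {
        return false;
    }
    // Inside only if every (non-empty) element of r is a prefix of c.
    auto [r_it, c_it] = std::mismatch(r.begin(), r.end(), c.begin(), c.end());
    return std::all_of(r_it, r.end(), [](const fs::path& e) { return e.empty(); });
}
```

A path such as root/inside/../inside resolves inside root, while root/inside/.. does not resolve inside root/inside, even though it starts with the same characters.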

Beman Dawes wrote:
The documentation can be viewed at http://mysite.verizon.net/beman/v3/index.htm
Comments on the tutorial:
Very nicely written. You answered questions that arose as you went along as if you were in my head!
§ Reporting the size of a file
Typo: "file_size function returns an uintmax_t"
s/an/a/
§ Using status queries to determine file existence and type
The code shown for tut2.cpp uses what appears to be inconsistent whitespace inside parentheses. Perhaps your style is to add whitespaces inside parentheses in conditionals, but it looks odd to me.
§ Directory iteration
You write, "A directory_entry contains a path and symlink/non-symlink resolving file_status caches, and can be passed to path arguments in function calls." I don't understand that. What are "symlink/non-symlink resolving file_status caches?"
Split the two parts into separate sentences for clarity.
tut3.cpp should not create a default constructed directory_iterator each iteration (even if the optimizer could hoist it out of the loop):
for (directory_iterator it(p), end; it != end; ++it)
Perhaps it would be better to use an ostream_iterator.
§ Class path: Constructors
In tut5.cpp, you commented out two blocks that wouldn't preclude the following two, so why not leave them uncommented?
§ Class path: Generic format vs. Native format
It looks like the bold attribute was left on in the beginning of the first paragraph.
Typo: "If a drive specified or a backslash appears"
s/specified/specifier/ (I guess)
§ Class path: Iterators, observers, composition, decomposition, and query
Clarification: "We will only show the output lines we are interested in at the moment."
s/at the moment/at each step/
The path_info.cpp element loop also evaluates the end iterator each iteration.
Typo: "Let's look at some at the output from the a slightly different example:"
s/the a/a/
Typo: In the final Linux output box of this section, the last argument to ./path_info is "baz.txt" but the program output suggests that it should be "baa.txt."
§ Error reporting
Clarification: "...and that's why most Boost.Filesystem operational functions come in two flavors."
s/functions come/functions now come/
Clarification: "...by setting the ec argument..."
s/the ec/that/ (there's no signature visible to confirm that "ec" is the name of the mentioned system::error_code & argument)
_______________________ Other Issues
There's concern over the term, "stem," expressed by others and me. Did you consider just "name?" I suppose that's too close to "filename," but "stem" is really odd and we should find something better. I think it is easier to explain and remember the difference between "filename" and "name" than to remember the odd meaning of "stem" for a path. That is, with "name," a pathname is a (possibly empty) path plus a filename, where a filename is comprised of a name and an optional extension.
Finally, you use the term "observer" in the path class in an odd way. Those functions are accessors. "Observer" makes me think of MVC, signals/slots, etc.
_____
Rob Stewart robert.stewart@sig.com
Software Engineer, Core Software using std::disclaimer;
Susquehanna International Group, LLP http://www.sig.com

Stewart, Robert wrote:
Beman Dawes wrote:
The documentation can be viewed at http://mysite.verizon.net/beman/v3/index.htm
...
Typo: "Let's look at some at the output from the a slightly different example:"
s/the a/a/
Also s/some at the/some of the/ Jeff

On Fri, Feb 19, 2010 at 11:40 AM, Jeff Flinn <TriumphSprint2000@hotmail.com> wrote:
Stewart, Robert wrote:
Beman Dawes wrote:
The documentation can be viewed at http://mysite.verizon.net/beman/v3/index.htm
...
Typo: "Let's look at some at the output from the a slightly different example:"
s/the a/a/
Also
s/some at the/some of the/
Fixed. Thanks, --Beman

On Fri, Feb 19, 2010 at 11:23 AM, Stewart, Robert <Robert.Stewart@sig.com> wrote:
Beman Dawes wrote:
The documentation can be viewed at http://mysite.verizon.net/beman/v3/index.htm
Comments on the tutorial:
Very nicely written. You answered questions that arose as you went along as if you were in my head!
That is really good to hear! I put a lot of hours into the tutorial trying to achieve that effect.
§ Reporting the size of a file
Typo: "file_size function returns an uintmax_t"
s/an/a/
Fixed.
§ Using status queries to determine file existence and type
The code shown for tut2.cpp uses what appears to be inconsistent whitespace inside parentheses. Perhaps your style is to add whitespaces inside parentheses in conditionals, but it looks odd to me.
Ouch... I changed my style from whitespace to no whitespace in the middle of the project, and failed to apply the change uniformly to the tutorial programs. Fixed.
§ Directory iteration
You write, "A directory_entry contains a path and symlink/non-symlink resolving file_status caches, and can be passed to path arguments in function calls." I don't understand that. What are "symlink/non-symlink resolving file_status caches?"
That's really an implementation detail, so I changed the sentence to just say "A directory_entry object contains a path and file_status information."
Split the two parts into separate sentences for clarity.
Done.
tut3.cpp should not create a default constructed directory_iterator each iteration (even if the optimizer could hoist it out of the loop):
for (directory_iterator it(p), end; it != end; ++it)
Perhaps it would be better to use an ostream_iterator.
Done.
§ Class path: Constructors
In tut5.cpp, you commented out two blocks that wouldn't preclude the following two, so why not leave them uncommented?
Changed.
§ Class path: Generic format vs. Native format
It looks like the bold attribute was left on in the beginning of the first paragraph.
Fixed.
Typo: "If a drive specified or a backslash appears"
s/specified/specifier/ (I guess)
Fixed.
§ Class path: Iterators, observers, composition, decomposition, and query
Clarification: "We will only show the output lines we are interested in at the moment."
s/at the moment/at each step/
Changed.
The path_info.cpp element loop also evaluates the end iterator each iteration.
Fixed.
Typo: "Let's look at some at the output from the a slightly different example:"
s/the a/a/
Fixed.
Typo: In the final Linux output box of this section, the last argument to ./path_info is "baz.txt" but the program output suggests that it should be "baa.txt."
Fixed.
§ Error reporting
Clarification: "...and that's why most Boost.Filesystem operational functions come in two flavors."
s/functions come/functions now come/
Clarification: "...by setting the ec argument..."
s/the ec/that/ (there's no signature visible to confirm that "ec" is the name of the mentioned system::error_code & argument)
Section reworked for greater clarity. Links added to reference documentation.
_______________________ Other Issues
There's concern over the term, "stem," expressed by others and me. Did you consider just "name?" I suppose that's too close to "filename," but "stem" is really odd and we should find something better. I think it is easier to explain and remember the difference between "filename" and "name" than to remember the odd meaning of "stem" for a path. That is, with "name," a pathname is a (possibly empty) path plus a filename, where a filename is comprised of a name and an optional extension.
In 2008 we already had several very lengthy discussions of names in this area, and changed several including to improve clarity.
Finally, you use the term "observer" in the path class in an odd way. Those functions are accessors. "Observer" makes me think of MVC, signals/slots, etc.
The "observer" terminology comes from the standard library. Since the reference documentation is basically what will be proposed for TR2, I'd really like to keep it aligned with the standard library's terminology. Thanks very much for your careful reading of the tutorial, and your many suggestions and corrections. They should all be reflected in beta 2. --Beman

Minor naming issue: create_directories is not a good name - it's misleading. It's easy to confuse with create_directory (first 15 chars); and it doesn't take a list of paths and create them all as one might think, in fact it might not create anything. establish_directory says better what the function does. Minor point, as said (and an old one, think I just missed filesystem V1 release posting this). Really appreciate the new simplified version, I'll probably use it more often.
re
On 2010-02-18 15:43, Beman Dawes wrote:
The first beta release of Boost.Filesystem Version 3 is now available.
The documentation can be viewed at http://mysite.verizon.net/beman/v3/index.htm
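The remark that create_directories "might not create anything" can be made concrete with a short sketch. It assumes C++17 std::filesystem as a stand-in for V3, where create_directories() returns true only when a directory was newly created. EnsureResult and the establish_directory wrapper are hypothetical names, the latter borrowed from the suggestion above purely for illustration.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem V3

// Hypothetical result type: whether anything was created, and whether the
// directory exists afterwards.
struct EnsureResult {
    bool created;
    bool exists;
};

// Hypothetical wrapper: makes sure `dir` and any missing parents exist,
// and reports separately whether this call had to create something.
EnsureResult establish_directory(const fs::path& dir, std::error_code& ec) {
    const bool created = fs::create_directories(dir, ec);
    const bool exists = !ec && fs::is_directory(dir, ec);
    return {created, exists};
}
```

Called twice on the same nested path, the first call reports created == true and the second reports created == false, while exists is true both times.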
participants (10)
- Beman Dawes
- Gregory Peele ARA/CFD
- Jeff Flinn
- Jeffrey Bosboom
- John Bytheway
- Peter Dimov
- rasmus ekman
- Scott McMurray
- Stewart, Robert
- Zach Laine