boost::filesystem::path frustration

I'm finding that boost::filesystem::path seems to be a strange mix of different beasts, unlike any entity we have in the STL. For example, when you construct it from a pair of iterators, they're expected to be iterators over characters, but when you iterate over the path itself, you are iterating over strings of some kind (**). Even though, once constructed, this thing acts sort of like a container, it supports none of the usual container mutators (e.g. push_back, pop_back, erase) or even queries (e.g. size()), making it incompatible with generic algorithms and adaptors.
In particular, this comes up because I'm trying to find the greatest common prefix of two paths. I thought this would be easy; I'd just use std::mismatch. But even once I've found the mismatch I don't see any obvious way to chop off the non-matching parts of one of the paths. I end up having to resort to some really ugly code (or I just haven't figured out how to use this thing correctly).
Why should paths be so different from everything else? I think, if the design is actually right, some rationale is sorely needed.
Also,
* (**) the docs don't say what the value_type of path::iterator is. A string value? A range that becomes invalid when the path is destroyed? Ah!?! How surprising; inspecting the code shows it iterates over paths! A container whose element type is itself is very unusual!
* the docs claim you can construct a path from a "A C-array. The value type is required to be char, wchar_t, char16_t, or char32_t", but doesn't say how that array will be interpreted. From the wording I might have assumed it accepts a CharT(&)[N] and the length of the input is taken as N, but inspecting the code shows it expects a CharT* and interprets the source as null-terminated.
Thanks,
-- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost
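The two surprises in the footnotes can be checked with a small sketch. It assumes C++17's std::filesystem as a stand-in for boost::filesystem so that it is self-contained; the helper names elements and from_array_with_embedded_null are made up for illustration, and the behaviour of the Boost release under discussion may differ in detail.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <type_traits>
#include <vector>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Iterating a path yields path objects, one per element.
static_assert(std::is_same_v<fs::path::iterator::value_type, fs::path>);

// Collect the elements of p as strings (generic format, '/' separators).
std::vector<std::string> elements(const fs::path& p)
{
    std::vector<std::string> out;
    for (const fs::path& e : p)
        out.push_back(e.generic_string());
    return out;
}

// Build a path from a character array that has a null in the middle.
std::string from_array_with_embedded_null()
{
    const char raw[] = "ab\0cd";          // array extent is 6
    static_assert(sizeof(raw) == 6);
    const std::string s = fs::path(raw).generic_string();
    assert(s.size() < sizeof(raw) - 1);   // input was cut at the first null
    return s;
}
```

With this stand-in, the array is read as a null-terminated sequence rather than for its full extent, which matches what inspecting the code revealed.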

On Thu, Jan 24, 2013 at 8:56 PM, Dave Abrahams <dave@boostpro.com> wrote:
I'm finding that boost::filesystem::path seems to be a strange mix of different beasts, unlike any entity we have in the STL. For example, when you construct it from a pair of iterators, they're expected to be iterators over characters, but when you iterate over the path itself, you are iterating over strings of some kind (**). Even though, once constructed, this thing acts sort of like a container, it supports none of the usual container mutators (e.g. push_back, pop_back, erase) or even queries (e.g. size()), making it incompatible with generic algorithms and adaptors.
It isn't really a container, but it is convenient to supply iterators over the elements of the contained path. Should more container-like mutators be supplied? I'm neutral - they would occasionally be useful, but add more signatures to an already fat interface.
In particular, this comes up because I'm trying to find the greatest common prefix of two paths. I thought this would be easy; I'd just use std::mismatch. But even once I've found the mismatch I don't see any obvious way to chop off the non-matching parts of one of the paths. I end up having to resort to some really ugly code (or I just haven't figured out how to use this thing correctly).
Not particularly elegant, but this does work:
path x("/foo/bar"); path y("/foo/baar");
auto result = std::mismatch(x.begin(), x.end(), y.begin());
path prefix; for (auto itr = x.begin(); itr != result.first; ++itr) prefix /= *itr;
std::cout << prefix << std::endl;
Why should paths be so different from everything else? I think, if the design is actually right, some rationale is sorely needed.
Also,
* (**) the docs don't say what the value_type of path::iterator is. A string value? A range that becomes invalid when the path is destroyed? Ah!?! How surprising; inspecting the code shows it iterates over paths! A container whose element type is itself is very unusual!
It is a kludge to deal with the type of the contained string being implementation defined and not necessarily the type the user wants. In other words, a misuse of path to supply string interoperability. The returned type should ideally be a basic_string, with begin() and end() templatized on the string details, but I didn't think of that until recently.
* the docs claim you can construct a path from a "A C-array. The value type is required to be char, wchar_t, char16_t, or char32_t", but doesn't say how that array will be interpreted. From the wording I might have assumed it accepts a CharT(&)[N] and the length of the input is taken as N, but inspecting the code shows it expects a CharT* and interprets the source as null-terminated.
I'll make some doc changes per your comments above. --Beman
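The mismatch-based approach in this reply can be wrapped in a function, as in the sketch below. It assumes C++17's std::filesystem in place of boost::filesystem, and the name common_prefix is hypothetical. One deliberate difference from the snippet in the reply: it uses the four-iterator overload of std::mismatch, so the second path may be shorter than the first.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Longest common leading run of path elements of a and b.
// Purely lexical: no normalization and no filesystem access.
fs::path common_prefix(const fs::path& a, const fs::path& b)
{
    // The four-iterator overload stops at the end of the shorter path.
    auto diff = std::mismatch(a.begin(), a.end(), b.begin(), b.end());

    // Rebuild the prefix by appending the matching elements of a.
    fs::path prefix;
    for (auto it = a.begin(); it != diff.first; ++it)
        prefix /= *it;

    // Postcondition: the result is also a leading part of b.
    assert(std::equal(prefix.begin(), prefix.end(), b.begin()));
    return prefix;
}
```

For the two paths in the reply, "/foo/bar" and "/foo/baar", this returns "/foo". The rebuild loop is still needed because path offers no way to erase from an iterator position to the end.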

On Friday 25 January 2013 11:30:26 Beman Dawes wrote:
On Thu, Jan 24, 2013 at 8:56 PM, Dave Abrahams <dave@boostpro.com> wrote:
I'm finding that boost::filesystem::path seems to be a strange mix of different beasts, unlike any entity we have in the STL. For example, when you construct it from a pair of iterators, they're expected to be iterators over characters, but when you iterate over the path itself, you are iterating over strings of some kind (**). Even though, once constructed, this thing acts sort of like a container, it supports none of the usual container mutators (e.g. push_back, pop_back, erase) or even queries (e.g. size()), making it incompatible with generic algorithms and adaptors.
It isn't really a container, but it is convenient to supply iterators over the elements of the contained path. Should more container-like mutators be supplied? I'm neutral - they would occasionally be useful, but add more signatures to an already fat interface.
IMHO, path should not pretend to be a container. Things like push_back, insert, erase don't make sense with respect to it. It could provide begin/end iterators over underlying characters but just to implement other algorithms. Iterating over path elements (i.e. what is currently achieved with begin/end) should probably be an external tool, like an iterator adaptor or a view on top of the path object. In the end it should become just a thin wrapper over a string, with a few path-related functions.

On Fri, Jan 25, 2013 at 6:49 PM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
[...]
IMHO, path should not pretend to be a container. Things like push_back, insert, erase don't make sense with respect to it. It could provide begin/end iterators over underlying characters but just to implement other algorithms. Iterating over path elements (i.e. what is currently achieved with begin/end) should probably be an external tool, like an iterator adaptor or a view on top of the path object. In the end it should become just a thin wrapper over a string, with a few path-related functions.
What are those path-related functions that you will leave there? extension()? has_root()? Maybe these are also better to be free standing functions working on arbitrary strings?
You are left with operator/, which also can be declared as
string operator/(const string&, const string&);
and be brought to scope with a using directive, for those who want to use it.
Essentially you are left with a plain old string of a platform dependent encoding. This is what I said in my first post: thinking that "a path as a string" is applicable where you don't care for what is inside the path (in which case just use a string as a cookie), but in other cases it is nonsense.
So please explain why "path should not pretend to be a container".
I agree in general that implementing a view for iterating over path elements is an acceptable strategy, though. But in this case it is better to scrap the path class completely.
Cheers, -- Yakov

On Friday 25 January 2013 23:55:19 Yakov Galka wrote:
On Fri, Jan 25, 2013 at 6:49 PM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
[...]
IMHO, path should not pretend to be a container. Things like push_back, insert, erase don't make sense with respect to it. It could provide begin/end iterators over underlying characters but just to implement other algorithms. Iterating over path elements (i.e. what is currently achieved with begin/end) should probably be an external tool, like an iterator adaptor or a view on top of the path object. In the end it should become just a thin wrapper over a string, with a few path-related functions.
What are those path-related functions that you will leave there? extension()? has_root()? Maybe these are also better to be free standing functions working on arbitrary strings?
Which functions to leave as members and which to extract as free functions is a design choice. I would probably leave operations that have meaning of accessing parts of the path (like filename, extension, parent path, etc.) as members and extract everything else.
You are left with operator/, which also can be declared as
string operator/(const string&, const string&);
and be brought to scope with a using directive, for those who want to use it.
operator/ doesn't make sense for strings, it should never be defined like that. Whether it is defined as a member function or a free function is not essential, but it should be specific for paths.
Essentially you are left with a plain old string of a platform dependent encoding. This is what I said in my first post: thinking that "a path as a string" is applicable where you don't care for what is inside the path (in which case just use a string as a cookie), but in other cases it is nonsense.
Well, most of the time all I want is a wrapper around a string that conceals platform-specific filesystem differences, nothing more. So yes, a path is a plain old string with a few helpers to compose and interpret it in a portable way. And I don't need anything more sophisticated from it. Another part of the library is a set of algorithms to operate on the actual filesystem (like, copying the files and testing for path equivalence); this part is also required and very useful. These algorithms abstract away another aspect of the filesystem (namely, the underlying OS API) and they don't need anything sophisticated from the path either. Interestingly, most of these algorithms can operate with plain strings instead of paths. So basically, the path does not need to be a high-level beast to be useful. That does not mean that such high-level tools are useless, but they can be implemented externally, without cluttering the path interface.
So please explain why "path should not pretend to be a container".
I think I've answered this partially above. I'll add that a container is supposed to contain elements and I'm not sure what is supposed to be an element of a path. A path is basically an identifier for a file, which is a more or less indivisible entity. Surely you can introspect its contents, but these parts either won't be a path (e.g. extension) or will be a path identifying another file (e.g. parent path).
I agree in general that implementing a view for iterating over path elements is an acceptable strategy, though. But in this case it is better to scrap the path class completely.
I disagree. Path type is very useful, if not for its operations then for type clarity at the very least.

on Fri Jan 25 2013, Andrey Semashev <andrey.semashev-AT-gmail.com> wrote:
On Friday 25 January 2013 11:30:26 Beman Dawes wrote:
On Thu, Jan 24, 2013 at 8:56 PM, Dave Abrahams <dave@boostpro.com> wrote:
I'm finding that boost::filesystem::path seems to be a strange mix of different beasts, unlike any entity we have in the STL. For example, when you construct it from a pair of iterators, they're expected to be iterators over characters, but when you iterate over the path itself, you are iterating over strings of some kind (**). Even though, once constructed, this thing acts sort of like a container, it supports none of the usual container mutators (e.g. push_back, pop_back, erase) or even queries (e.g. size()), making it incompatible with generic algorithms and adaptors.
It isn't really a container, but it is convenient to supply iterators over the elements of the contained path. Should more container-like mutators be supplied? I'm neutral - they would occasionally be useful, but add more signatures to an already fat interface.
IMHO, path should not pretend to be a container. Things like push_back, insert, erase don't make sense with respect to it.
As soon as you can iterate it and append path elements (and you can now), push_back, insert, and erase make wonderful sense. I understand exactly what those operations should mean, and I have use cases to boot!
It could provide begin/end iterators over underlying characters but just to implement other algorithms. Iterating over path elements (i.e. what is currently achieved with begin/end) should probably be an external tool, like an iterator adaptor or a view on top of the path object. In the end it should become just a thin wrapper over a string, with a few path-related functions.
Why? IME path manipulation almost universally occurs on directory boundaries, so exposing a character-based interface as the primary one for path seems counter-productive. IMO there should be a way to get to the underlying characters if you want them, but the primary interface should be a container of path elements.
-- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost
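What "a container of path elements" could look like is sketched below. Everything here is hypothetical and not part of Boost.Filesystem: the class element_path, its members, and truncate_to_common_prefix are invented for illustration, and root names, root directories and platform separators are deliberately ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element-oriented path: a sequence of names with the usual
// sequence mutators. Splits on '/' only and drops empty pieces.
class element_path {
public:
    using container = std::vector<std::string>;
    using const_iterator = container::const_iterator;

    element_path() = default;
    explicit element_path(const std::string& text) {
        std::size_t start = 0;
        while (start <= text.size()) {
            std::size_t end = text.find('/', start);
            if (end == std::string::npos) end = text.size();
            if (end > start) elems_.push_back(text.substr(start, end - start));
            start = end + 1;
        }
    }

    std::size_t size() const { return elems_.size(); }
    bool empty() const { return elems_.empty(); }
    const_iterator begin() const { return elems_.begin(); }
    const_iterator end() const { return elems_.end(); }

    void push_back(std::string name) { elems_.push_back(std::move(name)); }
    void pop_back() { assert(!elems_.empty()); elems_.pop_back(); }

    // Erase the elements in [first, last).
    const_iterator erase(const_iterator first, const_iterator last) {
        return elems_.erase(first, last);
    }

    // Join the elements back into a '/'-separated string.
    std::string str() const {
        std::string out;
        for (const auto& e : elems_) {
            if (!out.empty()) out += '/';
            out += e;
        }
        return out;
    }

private:
    container elems_;
};

// Cut x back to the leading elements it shares with y.
inline void truncate_to_common_prefix(element_path& x, const element_path& y)
{
    auto diff = std::mismatch(x.begin(), x.end(), y.begin(), y.end());
    x.erase(diff.first, x.end());
}
```

With erase available, the common-prefix problem becomes a mismatch followed by a single erase, with no rebuild loop.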

On Friday 25 January 2013 18:45:30 Dave Abrahams wrote:
on Fri Jan 25 2013, Andrey Semashev <andrey.semashev-AT-gmail.com> wrote:
IMHO, path should not pretend to be a container. Things like push_back, insert, erase don't make sense with respect to it.
As soon as you can iterate it and append path elements (and you can now), push_back, insert, and erase make wonderful sense. I understand exactly what those operations should mean, and I have use cases to boot!
It could provide begin/end iterators over underlying characters but just to implement other algorithms. Iterating over path elements (i.e. what is currently achieved with begin/end) should probably be an external tool, like an iterator adaptor or a view on top of the path object. In the end it should become just a thin wrapper over a string, with a few path-related functions.
Why? IME path manipulation almost universally occurs on directory boundaries, so exposing a character-based interface as the primary one for path seems counter-productive. IMO there should be a way to get to the underlying characters if you want them, but the primary interface should be a container of path elements.
In my previous reply to Yakov I explained why I don't see path as a container. You seem to want to work with paths like with vector<string_ref> but I don't think this is portable. You don't know what the string_ref contains and what restrictions apply to the content on the particular platform. E.g. on Windows it can contain a subdirectory name or a drive prefix (e.g. C:). Also, working with extensions or otherwise modifying file names like adding suffixes, etc. fall below the vector<string_ref> interface, to the character level. In the end to write portable code you have to work with paths like with opaque identifiers, with abilities to extract some of its components (like filename and parent path) and compose new paths from them. Note that I'm not saying that vector<string_ref> interface is not useful. But it is not needed for filesystem operations (like copying files) and it is not needed for basic and portable path processing (like getting a path from a config and appending a file name). Therefore this interface should be external from the path itself. IMHO, of course.

on Sat Jan 26 2013, Andrey Semashev <andrey.semashev-AT-gmail.com> wrote:
On Friday 25 January 2013 18:45:30 Dave Abrahams wrote:
on Fri Jan 25 2013, Andrey Semashev <andrey.semashev-AT-gmail.com> wrote:
IMHO, path should not pretend to be a container. Things like push_back, insert, erase don't make sense with respect to it.
As soon as you can iterate it and append path elements (and you can now), push_back, insert, and erase make wonderful sense. I understand exactly what those operations should mean, and I have use cases to boot!
It could provide begin/end iterators over underlying characters but just to implement other algorithms. Iterating over path elements (i.e. what is currently achieved with begin/end) should probably be an external tool, like an iterator adaptor or a view on top of the path object. In the end it should become just a thin wrapper over a string, with a few path-related functions.
Why? IME path manipulation almost universally occurs on directory boundaries, so exposing a character-based interface as the primary one for path seems counter-productive. IMO there should be a way to get to the underlying characters if you want them, but the primary interface should be a container of path elements.
In my previous reply to Yakov I explained why I don't see path as a container. You seem to want to work with paths like with vector<string_ref> but I don't think this is portable. You don't know what the string_ref contains and what restrictions apply to the content on the particular platform.
I don't care about those restrictions. I'm happy with the set of algorithms that will work when string_refs can be compared.
E.g. on Windows it can contain a subdirectory name or a drive prefix (e.g. C:).
I'm well aware. It wouldn't make a bit of difference to my use case or many others.
Also, working with extensions or otherwise modifying file names like adding suffixes, etc. fall below the vector<string_ref> interface, to the character level.
Yes. So what? We already have strings for that purpose. -- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost

On Fri, Jan 25, 2013 at 4:30 PM, Beman Dawes <bdawes@acm.org> wrote:
On Thu, Jan 24, 2013 at 8:56 PM, Dave Abrahams <dave@boostpro.com> wrote:
I'm finding that boost::filesystem::path seems to be a strange mix of different beasts, unlike any entity we have in the STL. For example, when you construct it from a pair of iterators, they're expected to be iterators over characters, but when you iterate over the path itself, you are iterating over strings of some kind (**). Even though, once constructed, this thing acts sort of like a container, it supports none of the usual container mutators (e.g. push_back, pop_back, erase) or even queries (e.g. size()), making it incompatible with generic algorithms and adaptors.
It isn't really a container, but it is convenient to supply iterators over the elements of the contained path. Should more container-like mutators be supplied? I'm neutral - they would occasionally be useful, but add more signatures to an already fat interface.
Perhaps a path could have an interface analogous to std::vector<std::string>, even if the implementation is optimised somewhat to keep the commonly accessed string representation as the underlying storage. Perhaps random access would be a bit daft, but it does seem reasonable to converge the interface. Additionally it might help define a new Concept that is a subset of a Container to assist with Dave's goal of maximising reuse within other algorithms.
In particular, this comes up because I'm trying to find the greatest common prefix of two paths. I thought this would be easy; I'd just use std::mismatch. But even once I've found the mismatch I don't see any obvious way to chop off the non-matching parts of one of the paths. I end up having to resort to some really ugly code (or I just haven't figured out how to use this thing correctly).
I wonder if this is *really* what you want! I suspect that you probably want to determine the common effective prefix of the paths after canonicalisation. For illustration: I suspect that the result you want from fn("/usr/sbin/../bin/test1.txt", "/usr/bin/test2.txt") is "/usr/bin" rather than "/usr". The inclusion or exclusion of links is less obvious. My experience is that for the most part I simply want the absolute canonical representation to be considered.
Not particularly elegant, but this does work:
path x("/foo/bar"); path y("/foo/baar");
auto result = std::mismatch(x.begin(), x.end(), y.begin());
path prefix; for (auto itr = x.begin(); itr != result.first; ++itr) prefix /= *itr;
std::cout << prefix << std::endl;
I think this code doesn't "work" because it meets the stated requirements exactly! I think the requirements are normally greater than those we first think of when looking at the problem.
Why should paths be so different from everything else? I think, if the design is actually right, some rationale is sorely needed.
Also,
* (**) the docs don't say what the value_type of path::iterator is. A string value? A range that becomes invalid when the path is destroyed? Ah!?! How surprising; inspecting the code shows it iterates over paths! A container whose element type is itself is very unusual!
It is a kludge to deal with the type of the contained string being implementation defined and not necessarily the type the user wants. In other words, a misuse of path to supply string interoperability. The returned type should ideally be a basic_string, with begin() and end() templatized on the string details, but I didn't think of that until recently.
* the docs claim you can construct a path from a "A C-array. The value type is required to be char, wchar_t, char16_t, or char32_t", but doesn't say how that array will be interpreted. From the wording I might have assumed it accepts a CharT(&)[N] and the length of the input is taken as N, but inspecting the code shows it expects a CharT* and interprets the source as null-terminated.
I'll make some doc changes per your comments above.
The addition of a make_relative_path function has been discussed, and code to provide the relative path from the canonical formats has been submitted previously; see: http://stackoverflow.com/questions/10167382/boostfilesystem-get-relative-pat... It looks to be a very valuable feature even if the implementation requires adjustment.
--Beman
With all that stated, I have found the recent versions of Boost.Filesystem to support my use-cases elegantly and without issue. Indeed it frequently offers superior solutions that are much better considered and thought through than those provided by many scripting languages. Obviously, while most of my communication has been about what I would like to see done differently, I am a grateful user of this library. Thank you for your hard work. Regards, Neil Groves

Neil Groves wrote:
For illustration: I suspect that the result you want from fn("/usr/sbin/../bin/test1.txt", "/usr/bin/test2.txt") is "/usr/bin" rather than "/usr".
Not necessarily. If /usr/sbin is a symbolic link to, say, /opt/sbin, the 'real' common prefix is not /usr/bin. Depending on the use case, /usr may well be more correct, it being the logical common ancestor (but not the physical common ancestor).
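The logical (as opposed to physical) reading of Neil's example can be sketched as follows. This assumes C++17's std::filesystem, whose lexically_normal() collapses "." and ".." textually without touching the filesystem; the function name lexical_common_prefix is hypothetical. A physical answer would instead need something like std::filesystem::canonical, which resolves symbolic links and therefore depends on what is actually on disk.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Common prefix after purely lexical normalization: "." and ".." are
// collapsed textually and symbolic links are NOT resolved.
fs::path lexical_common_prefix(const fs::path& a, const fs::path& b)
{
    // Assumption of this sketch: comparing an absolute path with a
    // relative one is treated as a caller error.
    assert(a.is_absolute() == b.is_absolute());

    const fs::path na = a.lexically_normal();
    const fs::path nb = b.lexically_normal();

    auto diff = std::mismatch(na.begin(), na.end(), nb.begin(), nb.end());

    fs::path prefix;
    for (auto it = na.begin(); it != diff.first; ++it)
        prefix /= *it;
    return prefix;
}
```

For "/usr/sbin/../bin/test1.txt" and "/usr/bin/test2.txt" this gives "/usr/bin", whether or not /usr/sbin is a symbolic link, which is exactly the case where the lexical and physical answers can diverge.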

On Fri, Jan 25, 2013 at 5:37 PM, Peter Dimov <lists@pdimov.com> wrote:
Neil Groves wrote:
For illustration:
I suspect that the result you want from fn("/usr/sbin/../bin/test1.txt", "/usr/bin/test2.txt") is "/usr/bin" rather than "/usr".
Not necessarily. If /usr/sbin is a symbolic link to, say, /opt/sbin, the 'real' common prefix is not /usr/bin. Depending on the use case, /usr may well be more correct, it being the logical common ancestor (but not the physical common ancestor).
I don't disagree with your point at all.
I was attempting to communicate that we almost always want to handle the relative canonicalisation in a known and obvious manner, while the other complexities due to file links and whatnot are more complicated, with no single obviously correct answer under all circumstances. Hence the point you are making is one I was attempting to make. The main point, of course, was that simply using unadjusted boost::filesystem::paths is often a source of error when attempting to implement this type of algorithm. Neil Groves

on Fri Jan 25 2013, "Peter Dimov" <lists-AT-pdimov.com> wrote:
Neil Groves wrote:
For illustration: I suspect that the result you want from fn("/usr/sbin/../bin/test1.txt", "/usr/bin/test2.txt") is "/usr/bin" rather than "/usr".
Not necessarily. If /usr/sbin is a symbolic link to, say, /opt/sbin, the 'real' common prefix is not /usr/bin. Depending on the use case, /usr may well be more correct, it being the logical common ancestor (but not the physical common ancestor).
IMO paths are abstract entities that aren't necessarily realized in the local filesystem. The results of pure path manipulations must therefore not depend on the state of the local filesystem. Operations accepting paths as input that depend on the local filesystem structure should be seen as operations on the filesystem rather than operations on paths. -- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost

On Jan 25, 2013, at 6:52 PM, Dave Abrahams <dave@boostpro.com> wrote:
IMO paths are abstract entities that aren't necessarily realized in the local filesystem. The results of pure path manipulations must therefore not depend on the state of the local filesystem. Operations accepting paths as input that depend on the local filesystem structure should be seen as operations on the filesystem rather than operations on paths.
+1 I also like the idea that a path is a container of elements. ___ Rob

on Sun Jan 27 2013, Rob Stewart <robertstewart-AT-comcast.net> wrote:
On Jan 25, 2013, at 6:52 PM, Dave Abrahams <dave@boostpro.com> wrote:
IMO paths are abstract entities that aren't necessarily realized in the local filesystem. The results of pure path manipulations must therefore not depend on the state of the local filesystem. Operations accepting paths as input that depend on the local filesystem structure should be seen as operations on the filesystem rather than operations on paths.
+1
I also like the idea that a path is a container of elements.
In fact there probably ought to be an object representing the local filesystem, so you could also (in principle) do operations on a remote filesystem. That would very clearly distinguish path operations from filesystem ones: since you don't need a filesystem to manipulate paths, the signatures would differ. -- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost
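A rough sketch of how the signatures would differ under this idea: pure path operations take only paths, while filesystem operations take a filesystem object as an explicit argument. All names here (sibling, in_memory_filesystem, local_filesystem, any_exists) are hypothetical, std::filesystem stands in for boost::filesystem, and passing the filesystem as a template parameter is just one possible design choice.

```cpp
#include <cassert>
#include <filesystem>
#include <set>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Pure path operation: no filesystem object in the signature.
inline fs::path sibling(const fs::path& p, const fs::path& name)
{
    return p.parent_path() / name;
}

// Hypothetical stand-in for a remote or virtual filesystem.
class in_memory_filesystem {
public:
    void create_file(const fs::path& p) {
        assert(!p.empty());
        files_.insert(p.lexically_normal());
    }
    bool exists(const fs::path& p) const {
        return files_.count(p.lexically_normal()) != 0;
    }
private:
    std::set<fs::path> files_;
};

// Hypothetical wrapper for the local filesystem with the same interface.
struct local_filesystem {
    bool exists(const fs::path& p) const { return fs::exists(p); }
};

// Filesystem operation: the filesystem is an explicit, statically typed
// argument, so it works with either of the two types above.
template <class Filesystem>
bool any_exists(const Filesystem& fsys, const fs::path& a, const fs::path& b)
{
    return fsys.exists(a) || fsys.exists(b);
}
```

Here sibling can be called with nothing but paths, while any_exists cannot be called without saying which filesystem is meant.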

2013/1/27 Dave Abrahams <dave@boostpro.com>:
on Sun Jan 27 2013, Rob Stewart <robertstewart-AT-comcast.net> wrote:
On Jan 25, 2013, at 6:52 PM, Dave Abrahams <dave@boostpro.com> wrote:
IMO paths are abstract entities that aren't necessarily realized in the local filesystem. The results of pure path manipulations must therefore not depend on the state of the local filesystem. Operations accepting paths as input that depend on the local filesystem structure should be seen as operations on the filesystem rather than operations on paths.
+1
I also like the idea that a path is a container of elements.
In fact there probably ought to be an object representing the local filesystem, so you could also (in principle) do operations on a remote filesystem.
... or a virtual filesystem (eg. an archive).
That would very clearly distinguish path operations from filesystem ones: since you don't need a filesystem to manipulate paths, the signatures would differ.
-- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

on Sun Jan 27 2013, Daniel Pfeifer <daniel-AT-pfeifer-mail.de> wrote:
2013/1/27 Dave Abrahams <dave@boostpro.com>:
on Sun Jan 27 2013, Rob Stewart <robertstewart-AT-comcast.net> wrote:
On Jan 25, 2013, at 6:52 PM, Dave Abrahams <dave@boostpro.com> wrote:
IMO paths are abstract entities that aren't necessarily realized in the local filesystem. The results of pure path manipulations must therefore not depend on the state of the local filesystem. Operations accepting paths as input that depend on the local filesystem structure should be seen as operations on the filesystem rather than operations on paths.
+1
I also like the idea that a path is a container of elements.
In fact there probably ought to be an object representing the local filesystem, so you could also (in principle) do operations on a remote filesystem.
... or a virtual filesystem (eg. an archive).
+1 -- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost

Hi,
On Mon, Jan 28, 2013 at 9:55 AM, Dave Abrahams <dave@boostpro.com> wrote:
on Sun Jan 27 2013, Daniel Pfeifer <daniel-AT-pfeifer-mail.de> wrote:
2013/1/27 Dave Abrahams <dave@boostpro.com>:
on Sun Jan 27 2013, Rob Stewart <robertstewart-AT-comcast.net> wrote:
On Jan 25, 2013, at 6:52 PM, Dave Abrahams <dave@boostpro.com> wrote:
IMO paths are abstract entities that aren't necessarily realized in the local filesystem. The results of pure path manipulations must therefore not depend on the state of the local filesystem. Operations accepting paths as input that depend on the local filesystem structure should be seen as operations on the filesystem rather than operations on paths.
+1
I also like the idea that a path is a container of elements.
In fact there probably ought to be an object representing the local filesystem, so you could also (in principle) do operations on a remote filesystem.
... or a virtual filesystem (eg. an archive).
+1
If handling virtual or remote file systems is needed, isn't it better to generalize 'path' to handle URIs[1]?
[1] http://www.ietf.org/rfc/rfc3986.txt
-- Ryo IGARASHI, Ph.D. rigarash@gmail.com

In fact there probably ought to be an object representing the local filesystem, so you could also (in principle) do operations on a remote filesystem.
FYI: In the Software Communications Architecture (SCA), used in various software defined radio implementations, that is how files are handled - file systems are CORBA objects, as are files.

In a previous job, I had to work very heavily with file systems and Windows registry trees. For that use-case, I took paths and put them in a structure that represented a tree view of the system. That is, I iterated over the file system and created a tree to represent all the paths.
Currently, I use the file system in a very light sort of way, and I very much enjoy the use of boost::filesystem as it exists. As such, I'm not sure I'd want to change it overmuch from its current behavior. If anything, I almost wish we had a boost::registry for Windows that provided much the same kind of functionality as boost::filesystem, but for the Windows registry. I say 'almost', though, because I don't really like to use the Windows registry at all, and use INI files for configuration information instead.
If I had to return to my previous job, though, I would probably want to use a directory iterator to help build a tree, and to extract boost::filesystem::path objects out of that tree as desired. But then, I'd also want somewhat complicated algorithms that compared different trees to obtain differences (paths removed or created).
Would it be helpful to have a kind of boost::tree that could be built by one or more iterators, which had properties like a container, but generali(z/s)ed such that it could be built by iterators over a file system or a Windows registry, or some other mechanism that can build trees?
- Trey
On Sun, Jan 27, 2013 at 9:22 PM, David Hagood <david.hagood@gmail.com> wrote:
In fact there probably ought to be an object representing the local filesystem, so you could also (in principle) do operations on a remote filesystem.
FYI: In the Software Communications Architecture (SCA), used in various software defined radio implementations, that is how files are handled - file systems are CORBA objects, as are files.

on Mon Jan 28 2013, Ryo IGARASHI <rigarash-AT-gmail.com> wrote:
Hi,
On Mon, Jan 28, 2013 at 9:55 AM, Dave Abrahams <dave@boostpro.com> wrote:
on Sun Jan 27 2013, Daniel Pfeifer <daniel-AT-pfeifer-mail.de> wrote:
2013/1/27 Dave Abrahams <dave@boostpro.com>:
on Sun Jan 27 2013, Rob Stewart <robertstewart-AT-comcast.net> wrote:
On Jan 25, 2013, at 6:52 PM, Dave Abrahams <dave@boostpro.com> wrote:
IMO paths are abstract entities that aren't necessarily realized in the local filesystem. The results of pure path manipulations must therefore not depend on the state of the local filesystem. Operations accepting paths as input that depend on the local filesystem structure should be seen as operations on the filesystem rather than operations on paths.
+1
I also like the idea that a path is a container of elements.
In fact there probably ought to be an object representing the local filesystem, so you could also (in principle) do operations on a remote filesystem.
... or a virtual filesystem (eg. an archive).
+1
If handling virtual or remote file systems is needed, isn't it better to generalize 'path' to handle URIs[1]?
-1. In that case you have to build knowledge about every filesystem-y thing into "path." That's an awful lot of coupling and inflexibility. If you want to handle URIs, there should be a uri_filesystem type that has an extensible set of handlers (other filesystem objects) that can be plugged in. But you shouldn't have to use that, which requires dynamic dispatching, just to traverse a known filesystem.
-- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost

Dave Abrahams <dave@boostpro.com> writes:
on Sun Jan 27 2013, Rob Stewart <robertstewart-AT-comcast.net> wrote:
On Jan 25, 2013, at 6:52 PM, Dave Abrahams <dave@boostpro.com> wrote:
IMO paths are abstract entities that aren't necessarily realized in the local filesystem. The results of pure path manipulations must therefore not depend on the state of the local filesystem. Operations accepting paths as input that depend on the local filesystem structure should be seen as operations on the filesystem rather than operations on paths.
+1
I also like the idea that a path is a container of elements.
In fact there probably ought to be an object representing the local filesystem, so you could also (in principle) do operations on a remote filesystem. That would very clearly distinguish path operations from filesystem ones: since you don't need a filesystem to manipulate paths, the signatures would differ.
This is exactly what I've been working on recently: mirroring the Boost.Filesystem functions with versions that take an extra argument which, at the moment, I pass my SFTP filesystem object.
Alex
-- Swish - Easy SFTP for Windows Explorer (http://www.swish-sftp.org)
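The split between path operations and filesystem operations discussed here can be illustrated with a small sketch. Everything in it is an assumption made for illustration: in_memory_filesystem, contains, sketch::exists and sibling are hypothetical names, std::filesystem::path stands in for boost::filesystem::path, and nothing here reflects the actual Swish or Boost.Filesystem interfaces.

```cpp
#include <filesystem>
#include <set>

namespace fs = std::filesystem;

namespace sketch
{
// Hypothetical filesystem object: an in-memory set of known paths standing
// in for a remote or virtual filesystem.
class in_memory_filesystem
{
public:
    void create(const fs::path& p) { entries_.insert(p); }
    bool contains(const fs::path& p) const { return entries_.count(p) != 0; }

private:
    std::set<fs::path> entries_;
};

// Filesystem operation: the filesystem is an explicit first argument.
// The call is resolved at compile time, so no dynamic dispatch is needed
// when the filesystem type is known.
template <class Filesystem>
bool exists(const Filesystem& filesystem, const fs::path& p)
{
    return filesystem.contains(p);
}

// Pure path operation: no filesystem argument and no dependence on the
// state of any filesystem.
fs::path sibling(const fs::path& p, const fs::path& name)
{
    return p.parent_path() / name;
}
} // namespace sketch
```

The signatures make the distinction visible: sketch::exists cannot be called without naming a filesystem, while sketch::sibling never needs one.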

on Fri Jan 25 2013, Neil Groves <neil-AT-grovescomputing.com> wrote:
On Fri, Jan 25, 2013 at 4:30 PM, Beman Dawes <bdawes@acm.org> wrote:
On Thu, Jan 24, 2013 at 8:56 PM, Dave Abrahams <dave@boostpro.com> wrote:
In particular, this comes up because I'm trying to find the greatest common prefix of two paths. I thought this would be easy; I'd just use std::mismatch. But even once I've found the mismatch I don't see any obvious way to chop off the non-matching parts of one of the paths. I end up having to resort to some really ugly code (or I just haven't figured out how to use this thing correctly).
I wonder if this is *really* what you want!
A little credit, please. Yes, it's *really* what I want.
I suspect that you probably want to determine the common effective prefix of the paths after canonicalisation.
No, the paths are known to be already canonicalized (with a 'z' ;->)
Not particularly elegant, but this does work:
path x("/foo/bar"); path y("/foo/baar");
auto result = std::mismatch(x.begin(), x.end(), y.begin());
path prefix; for (auto itr = x.begin(); itr != result.first; ++itr) prefix /= *itr;
std::cout << prefix << std::endl;
I think this code doesn't "work" because it meets the stated requirements exactly! I think the requirements are normally greater than those we first think of when looking at the problem.
A. you didn't know my requirements ;-), and
B. for such an operation requiring the input paths to be canonical beforehand might in fact be the most appropriate interface.
-- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost
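For reference, the loop quoted in this exchange can be packaged as a function. The following is a minimal sketch, assuming C++17 std::filesystem::path in place of boost::filesystem::path and assuming, as stated in the thread, that both inputs are already canonical. The name common_prefix is hypothetical. One deliberate change from the quoted snippet: it uses the four-iterator overload of std::mismatch, because the three-iterator form reads past the end of the second path when that path has fewer elements than the first.

```cpp
#include <algorithm>
#include <filesystem>

namespace fs = std::filesystem;

// Returns the longest leading run of elements shared by x and y.
// Assumes both inputs are already in canonical form.
fs::path common_prefix(const fs::path& x, const fs::path& y)
{
    // The four-iterator overload (C++14) stops at the end of the shorter
    // range, so y may have fewer elements than x.
    auto result = std::mismatch(x.begin(), x.end(), y.begin(), y.end());

    // Rebuild the prefix element by element; the path interface offers no
    // way to truncate x at result.first directly.
    fs::path prefix;
    for (auto itr = x.begin(); itr != result.first; ++itr)
        prefix /= *itr;
    return prefix;
}
```

This still copies every shared element into a new path, so it addresses the missing bounds check but not the efficiency concern raised elsewhere in the thread.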

I wonder if this is *really* what you want!
A little credit, please. Yes, it's *really* what I want.
Ah, please accept my apologies. You by default always have as much credit as is humanly possible from me! I guessed that you might have wanted a little more than you asked for. This was simply because I have made the mistake of choosing the simpler requirement set when actually later I realized I wanted the path processing. Sorry.
I suspect that you probably want to determine the common effective prefix of the paths after canonicalisation.
No, the paths are known to be already canonicalized (with a 'z' ;->)
I'll try and remember to use 'z'!
Not particularly elegant, but this does work:
path x("/foo/bar"); path y("/foo/baar");
auto result = std::mismatch(x.begin(), x.end(), y.begin());
path prefix; for (auto itr = x.begin(); itr != result.first; ++itr) prefix /= *itr;
std::cout << prefix << std::endl;
I think this code doesn't "work" because it meets the stated requirements exactly! I think the requirements are normally greater than those we first think of when looking at the problem.
A. you didn't know my requirements ;-), and
True, but I would have felt bad if I had spotted a potential issue, stayed silent and then found out that I could have saved you some time.
B. for such an operation requiring the input paths to be canonical beforehand might in fact be the most appropriate interface.
I can see an argument for that. I wasn't attempting to suggest that there was one obvious correct idiom. I was really aiming to link to the previous discussion and the supplied implementation for a solution. I think on most observations I perceive similar improvements to be possible. My only real disagreement is with labelling Boost.Filesystem as frustrating. For the most part it helps me avoid writing tedious code. I like the idea of improving the iteration scheme and making it more like a container.
Regards, Neil Groves

on Sat Jan 26 2013, Neil Groves <neil-AT-grovescomputing.com> wrote:
I suspect that you probably want to determine the common effective prefix of the paths after canonicalisation.
No, the paths are known to be already canonicalized (with a 'z' ;->)
I'll try and remember to use 'z'!
I was just teasing! That's an American vs. English English thing ;^).
B. for such an operation requiring the input paths to be canonical beforehand might in fact be the most appropriate interface.
I can see an argument for that. I wasn't attempting to suggest that there was one obvious correct idiom. I was really aiming to link to the previous discussion and the supplied implementation for a solution. I think on most observations I perceive similar improvements to be possible. My only real disagreement is with labelling Boost.Filesystem as frustrating.
I didn't. I said I have frustrations with one particular component thereof.
For the most part it helps me avoid writing tedious code. I like the idea of improving the iteration scheme and making it more like a container.
Ah, but a container of *what*? That's the question.
-- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost

On 01/25/2013 08:30 AM, Beman Dawes wrote:
Not particularly elegant, but this does work:
path x("/foo/bar"); path y("/foo/baar");
auto result = std::mismatch(x.begin(), x.end(), y.begin());
path prefix; for (auto itr = x.begin(); itr != result.first; ++itr) prefix /= *itr;
I believe this last bit is just std::accumulate with std::divides<path>. Whether that better expresses intent is another matter.
Eric

on Fri Jan 25 2013, Eric Niebler <eniebler-AT-boost.org> wrote:
On 01/25/2013 08:30 AM, Beman Dawes wrote:
Not particularly elegant, but this does work:
path x("/foo/bar"); path y("/foo/baar");
auto result = std::mismatch(x.begin(), x.end(), y.begin());
path prefix; for (auto itr = x.begin(); itr != result.first; ++itr) prefix /= *itr;
I believe this last bit is just std::accumulate with std::divides<path>. Whether that better expresses intent is another matter.
Seriously, no. I have no problem using / for path construction, but "divides" means division.
-- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost

On 1/25/2013 3:54 PM, Dave Abrahams wrote:
on Fri Jan 25 2013, Eric Niebler <eniebler-AT-boost.org> wrote:
On 01/25/2013 08:30 AM, Beman Dawes wrote:
Not particularly elegant, but this does work:
path x("/foo/bar"); path y("/foo/baar");
auto result = std::mismatch(x.begin(), x.end(), y.begin());
path prefix; for (auto itr = x.begin(); itr != result.first; ++itr) prefix /= *itr;
I believe this last bit is just std::accumulate with std::divides<path>. Whether that better expresses intent is another matter.
Seriously, no. I have no problem using / for path construction, but "divides" means division.
struct append_path : std::divides<path> {};
path prefix = std::accumulate( x.begin(), result.first, path(), append_path() );
:-)
-- Eric Niebler Boost.org http://www.boost.org

on Sun Jan 27 2013, Eric Niebler <eniebler-AT-boost.org> wrote:
On 1/25/2013 3:54 PM, Dave Abrahams wrote:
on Fri Jan 25 2013, Eric Niebler <eniebler-AT-boost.org> wrote:
On 01/25/2013 08:30 AM, Beman Dawes wrote:
Not particularly elegant, but this does work:
path x("/foo/bar"); path y("/foo/baar");
auto result = std::mismatch(x.begin(), x.end(), y.begin());
path prefix; for (auto itr = x.begin(); itr != result.first; ++itr) prefix /= *itr;
I believe this last bit is just std::accumulate with std::divides<path>. Whether that better expresses intent is another matter.
Seriously, no. I have no problem using / for path construction, but "divides" means division.
struct append_path : std::divides<path> {}; path prefix = std::accumulate( x.begin(), result.first, path(), append_path() );
That one's fine :-)
-- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost
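The accumulate-based version settled on in this exchange can be written out in full. This is a sketch under assumptions: it uses C++17 std::filesystem::path rather than boost::filesystem::path, and instead of deriving from std::divides as in the thread it defines append_path with its own call operator, so that the name and the body both say "append". The helper join_elements is a hypothetical name.

```cpp
#include <filesystem>
#include <numeric>

namespace fs = std::filesystem;

// Named function object so the call site reads "append", not "divide".
struct append_path
{
    fs::path operator()(fs::path lhs, const fs::path& rhs) const
    {
        lhs /= rhs;
        return lhs;
    }
};

// Joins the elements in [first, last) into a single path.
template <class Iterator>
fs::path join_elements(Iterator first, Iterator last)
{
    return std::accumulate(first, last, fs::path(), append_path());
}
```

With the result of std::mismatch from the earlier snippet, the prefix would then be join_elements(x.begin(), result.first).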

On Fri, Jan 25, 2013 at 6:30 PM, Beman Dawes <bdawes@acm.org> wrote:
[...] Not particularly elegant, but this does work:
path x("/foo/bar"); path y("/foo/baar");
auto result = std::mismatch(x.begin(), x.end(), y.begin());
path prefix; for (auto itr = x.begin(); itr != result.first; ++itr) prefix /= *itr;
std::cout << prefix << std::endl;
This code doesn't work on windows, as I described in my previous post.
Why should paths be so different from everything else? I think, if the design is actually right, some rationale is sorely needed.
Also,
* (**) the docs don't say what the value_type of path::iterator is. A string value? A range that becomes invalid when the path is destroyed? Ah!?! How surprising; inspecting the code shows it iterates over paths! A container whose element type is itself is very unusual!
It is a kludge to deal with the type of the contained string being implementation defined and not necessarily the type the user wants. In other words, a misuse of path to supply string interoperability. The returned type should ideally be a basic_string, with begin() and end() templatized on the string details, but I didn't think of that until recently.
Well, in this particular issue no-one said that iterator::value_type must be a string. Let it be a path_element, since (--end()).extension() still seems to be useful. (Or maybe let extension() be a free-standing function operating on arbitrary strings?)
As for "string being implementation defined", you must already know my opinion. This is a bad design choice. A non-templatized path means forcing a concrete character encoding and memory allocation scheme on the user.
Now, please consider the following argument: The rationale for using the "native" encoding is that the path is supposed to be frequently passed to/from the system, so anything else would supposedly be inefficient. *But* the high-level path operations that are (supposed to be) provided by boost::path are usually done in higher-level code. Now, if I use narrow UTF-8 strings in my code, then there will be more UTF-8 → UTF-16 → UTF-8 conversions (on windows) than actual calls to the system. Therefore, it is more logical to let the user choose when she likes to do the conversion to the system encoding (if at all) and provide a narrow boost::path as it has been in Boost.FSv2.
Cheers,
-- Yakov

on Fri Jan 25 2013, Beman Dawes <bdawes-AT-acm.org> wrote:
On Thu, Jan 24, 2013 at 8:56 PM, Dave Abrahams <dave@boostpro.com> wrote:
I'm finding that boost::filesystem::path seems to be a strange mix of different beasts, unlike any entity we have in the STL. For example, when you construct it from a pair of iterators, they're expected to be iterators over characters, but when you iterate over the path itself, you are iterating over strings of some kind (**). Even though, once constructed, this thing acts sort of like a container, it supports none of the usual container mutators (e.g. push_back, pop_back, erase) or even queries (e.g. size()), making it incompatible with generic algorithms and adaptors.
It isn't really a container,
Well, why not? It does most things that containers do, but with different names. And to expose iterators but then not let me use those iterators to modify the path is... well, disappointing.
but it is convenient to supply iterators over the elements of the contained path. Should more container-like mutators be supplied?
It certainly would make it more useful. I could then employ, e.g. back_inserter. But I also have problems with the fact that it's constructed with a range of characters but its iterators traverse a range of paths. It should at least have a constructor that takes an iterator range over the iterator's value_type.
I'm neutral - they would occasionally be useful, but add more signatures to an already fat interface.
Maybe some of the other interfaces should be dropped then :-)
In particular, this comes up because I'm trying to find the greatest common prefix of two paths. I thought this would be easy; I'd just use std::mismatch. But even once I've found the mismatch I don't see any obvious way to chop off the non-matching parts of one of the paths. I end up having to resort to some really ugly code (or I just haven't figured out how to use this thing correctly).
Not particularly elegant, but this does work:
path x("/foo/bar"); path y("/foo/baar");
auto result = std::mismatch(x.begin(), x.end(), y.begin());
path prefix; for (auto itr = x.begin(); itr != result.first; ++itr) prefix /= *itr;
std::cout << prefix << std::endl;
Nor is it particularly efficient. I am going to do this with every path that appears in boost's SVN dump, of which there are many. "Greatest common prefix" is not an unusual thing to want to do with paths. It should be both elegant and efficient.
Why should paths be so different from everything else? I think, if the design is actually right, some rationale is sorely needed.
Also,
* (**) the docs don't say what the value_type of path::iterator is. A string value? A range that becomes invalid when the path is destroyed? Ah!?! How surprising; inspecting the code shows it iterates over paths! A container whose element type is itself is very unusual!
It is a kludge to deal with the type of the contained string being implementation defined and not necessarily the type the user wants. In other words, a misuse of path to supply string interoperability. The returned type should ideally be a basic_string, with begin() and end() templatized on the string details, but I didn't think of that until recently.
It should ideally be a type that can be constructed without allocating storage and copying characters from the source path, like the recently-discussed string_ref.
* the docs claim you can construct a path from a "A C-array. The value type is required to be char, wchar_t, char16_t, or char32_t", but doesn't say how that array will be interpreted. From the wording I might have assumed it accepts a CharT(&)[N] and the length of the input is taken as N, but inspecting the code shows it expects a CharT* and interprets the source as null-terminated.
I'll make some doc changes per your comments above.
--Beman
-- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost
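Both documentation points in this message can be checked mechanically. The sketch below is an assumption-laden illustration: it uses C++17 std::filesystem::path, which descends from the Boost.Filesystem design, rather than the boost::filesystem::path under discussion, so it shows how the standardized class behaves and not necessarily any particular Boost release. The function name path_from_array_with_embedded_null is hypothetical.

```cpp
#include <filesystem>
#include <string>
#include <type_traits>

namespace fs = std::filesystem;

// Iterating a path yields paths: the iterator's value_type is path itself.
static_assert(std::is_same_v<fs::path::iterator::value_type, fs::path>,
              "path::iterator iterates over path objects");

// Builds a path from a character array containing an embedded null and
// returns what the path actually stored.
std::string path_from_array_with_embedded_null()
{
    const char source[] = "ab\0cd"; // array of 6 chars, counting both nulls
    fs::path p(source);             // the array is read as null-terminated
    return p.string();
}
```

The array length plays no part in the construction: the characters after the first null are ignored, which matches the null-terminated interpretation described above.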

On Fri, Jan 25, 2013 at 3:56 AM, Dave Abrahams <dave@boostpro.com> wrote:
I'm finding that boost::filesystem::path seems to be a strange mix of different beasts, unlike any entity we have in the STL.
I'm glad to see that I'm not the only one thinking this way... this is one of the reasons I don't use it (I got *very* frustrated each time I used this library). It is a mix of high level and low level concepts, i.e. of conceptual paths (sequences of some path elements) and strings (sequences of characters). One does not need a strong typedef for paths! If I wanted to work with paths as with strings, I could just use strings (and this is what I usually do). Nor do I want a "system encoding" string that behaves differently on different platforms (I use UTF-8 std::strings everywhere). Instead, a path class shall provide a higher level abstraction.
First, what *is* a path? One can think of it as "a string naming some resource in the file system", but this definition is low-level, treating the path as a cookie, and thus not useful for characterizing path specific semantics.
Definition (path): A path is a sequence of *instructions* (aka path elements) for locating a resource. Following the instructions to locate the resource is called "resolving the path".
Really, this is how the OS understands paths, therefore I do not even consider this open to debate. Paths are such a generic concept that they can be applied to much more than just FS paths (even for treasure hunting :)).
Definition (equivalence): Paths x and y are equivalent iff, for any system state, resolving x and y yields the same resource, assuming the resolution is successful.
operator== should test for equivalence. The equivalent(x,y) function is currently misleading. It should instead be replaced with resolve(x) that would return a key that can be used to test "equivalence" (so that one could use it in associative containers, which is currently not possible). Now one could do resolve(x) == resolve(y) instead, and much more.
For example, when you construct it from a pair of iterators, they're expected to be iterators over characters, but when you iterate over the path itself, you are iterating over strings of some kind (**). Even though, once constructed, this thing acts sort of like a container, it supports none of the usual container mutators (e.g. push_back, pop_back, erase) or even queries (e.g. size()), making it incompatible with generic algorithms and adaptors.
Let me add some other issues to the list.
operator /
------------------
operator / is defined syntactically (low level). This is wrong, and it does not work well.
Expected definition: x/y returns a path z s.t. for any initial system state, evaluation of z yields the same resource as the evaluation of elements of x followed by elements of y, assuming all evaluations succeed. In particular, for any initial system state, (current_path(x), current_path(y), current_path()) shall resolve to the same resource as (current_path(z), current_path()).
Differences: "c:" / "b" gives "c:\b" in boost but "c:b" according to the above definition. Why is it wrong? Thinking of paths rather than strings, when concatenating two path elements, why would there suddenly appear a third one? Furthermore, considering the current description of parent_path(), which is formulated using the wrong operator /=, path("c:a/b").parent_path() should return "c:\a" according to the documentation. I guess (and hope) this is not what the implementation actually does. In fact, I believe that parent_path shall not exist. Instead there shall be pop_back() that does that in-place, and an iterator constructor that constructs a path "as if" by successively applying operator /= on each path element.
Similarly "a" / "/b" gives "a/b" in boost but "/b" with the above definition.
operator+=
----------------
This is a new surprise for me, as it wasn't there last time I inspected the library. Is there any rationale for this path-string duality syndrome? And bravo for the inconsistency: we have operator += without a corresponding operator + (yes, I understand that the latter is dangerous, but so is the former).
size()
---------
The lack of it is not an issue. Path shall be a bidirectional range with value_type of "path_element". The computation of size would be not only inefficient, but also useless. What's the logic of counting the number of path elements? And in case you do want it in some esoteric case, you can use distance(begin, end).
assign/append/constructor operations taking ranges
----------------------------
These should accept ranges of path_elements and work as-if by applying each element with /=.
absolute(p, base)
--------------------
The case when base has no root_directory makes no sense. Boost.FS gives absolute("a:x") == "a:\z\x" if current_path() == "c:\z", but the more logical thing is "a:\w\x" if the current directory of drive "a:" (environment variable "=A:") is "a:\w".
Another example, using paths from the real world: let p = "Australia, current city, King St., No. 10", base = "Tel-Aviv, Menachem Begin St., No. 7". boost::absolute(p,base) will bring you to: "Australia, Tel-Aviv, King St. No 10", while what you want is to go to Australia, ask yourself "what city I'm currently in?" and if you landed in Melbourne you'll get to "Australia, Melbourne, King St., No. 10".
Expected definition: Returns system_complete(base/p).
In particular, this comes up because I'm trying to find the greatest common prefix of two paths. I thought this would be easy; I'd just use std::mismatch. But even once I've found the mismatch I don't see any obvious way to chop off the non-matching parts of one of the paths. [...]
Question: why do you need to do this? (I do not mean that it is not needed, just curious.)
Why should paths be so different from everything else? I think, if the design is actually right, some rationale is sorely needed.
They shouldn't. I believe the design of Boost.FS is somewhat wrong, mixing different concepts together.
Also,
* (**) the docs don't say what the value_type of path::iterator is. A string value? A range that becomes invalid when the path is destroyed? Ah!?! How surprising; inspecting the code shows it iterates over paths! A container whose element type is itself is very unusual!
Agree. This bothers me too. IMO the value_type of paths and iterators shall be the same, and likely be either a string or a path_element class, which will provide an appropriate interface. This includes observers:
* to query the type of the element. It can be one of sub_directory, parent_directory or current_directory to avoid comparison with ".." and "." (on OpenVMS it uses other strings, right?) as well as some implementation specific values to query the values that were already known to the parser (for example: drive (both \\?\c: or c:) or a UNC (both \\?\UNC\x or \\x) on windows).
* to query the stem and extension.
Cheers,
-- Yakov
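Two of the points raised in this message can be tried out directly. The sketch below assumes C++17 std::filesystem::path, not the boost::filesystem::path being criticized. In the standardized class, appending a right-hand side that has a root directory yields the result this message asks for, and the element count is obtained with std::distance since there is no size() member. The function names append_rooted and element_count are hypothetical.

```cpp
#include <cstddef>
#include <filesystem>
#include <iterator>

namespace fs = std::filesystem;

// In std::filesystem, appending a path that starts at the root does not
// keep the left-hand side's relative part: "a" / "/b" is "/b", not "a/b".
fs::path append_rooted()
{
    return fs::path("a") / "/b";
}

// Counts path elements without a size() member. The root directory, when
// present, counts as an element of its own.
std::size_t element_count(const fs::path& p)
{
    return static_cast<std::size_t>(std::distance(p.begin(), p.end()));
}
```

The drive-relative cases such as "c:" / "b" depend on the platform's root-name rules and are deliberately left out of this sketch.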
participants (15)
- Alexander Lamaison
- Andrew Hundt
- Andrey Semashev
- Beman Dawes
- Daniel James
- Daniel Pfeifer
- Dave Abrahams
- David Hagood
- Eric Niebler
- Joseph Van Riper
- Neil Groves
- Peter Dimov
- Rob Stewart
- Ryo IGARASHI
- Yakov Galka