
On Fri, Jan 25, 2013 at 3:56 AM, Dave Abrahams <dave@boostpro.com> wrote:
I'm finding that boost::filesystem::path seems to be a strange mix of different beasts, unlike any entity we have in the STL.
I'm glad seeing that I'm not the only one thinking this way... this is one of the reasons I don't use it (I got *very* frustrated each time I used this library). It is a mix of high level and low level concepts, i.e. of conceptual paths (sequences of some path elements) and strings (sequences of characters). One does not need a strong typedef for paths! If I wanted to work with paths as with strings, I could just use strings (and this is what I usually do). Nor I want a "system encoding" string that behaves differently on different platforms (I use UTF-8 std::strings everywhere). Instead, a path class shall provide a higher level abstraction. First, what *is* a path? One can think of it as "a string naming some resource in the file system", but this definition is low-level, treating the path as a cookie, and thus not useful for characterizing path specific semantics. Definition (path): A path is a sequence of *instructions* (aka path elements) for locating a resource. Following the instruction to locate the resource is called "resolving the path". Really, this is how the OS understands paths, therefore I do not even consider this open to debate. Paths is so a generic concept that it can be applied to much more than just FS paths (even for treasure hunting :)). Definition (equivalence): Paths x and y are equivalent iff, for any system state, resolving x and y yields the same resource, assuming the resolution is successful. operator== should test for equivalence. The equivalent(x,y) function is currently misleading. It should instead be replaced with resolve(x) that would return a key that can be used to test "equivalence" (so that one could use it in associative containers, which is currently not possible). Now one could do resolve(x) == resolve(y) instead, and much more. For example,
when you construct it from a pair of iterators, they're expected to be iterators over characters, but when you iterate over the path itself, you are iterating over strings of some kind (**). Even though, once constructed, this thing acts sort of like a container, it supports none of the usual container mutators (e.g. push_back, pop_back, erase) or even queries (e.g. size()), making it incompatible with generic algorithms and adaptors.
Let me add some other issue to the list. operator / ------------------ operator / is defined syntactical (low level). This is wrong, and it does not work well. Expected definition: x/y returns a path z s.t. for any initial system state, evaluation of z yield the same resource as the evaluation of elements of x followed by elements of y, assuming all evaluations succeed. In particular, for any initial system state, (current_path(x), current_path(y), current_path()) shall resolve to the same resource as (current_path(z), current_path()). Differences: "c:" / "b" gives "c:\b" in boost but "c:b" according to the above definition. Why is it wrong? Thinking of paths rather than strings, when concatenating two path elements, why would there suddenly appear a third one? Furthermore, considering the current description of parent_path() which is formulated using the wrong operator /=, path("c:a/b").parent_path() should return "c:\a" according to the documentation. I guess (and hope) this is not what the implementation actually does. In fact, I believe that parent_path shall not exist. Instead there shall be pop_back() that does that in-place, and an iterator constructor that constructs a path "as if" by successively applying operator /= on each path element. Similarly "a" / "/b" gives "a/b" in boost but "/b" with the above definition. operator+= ---------------- This is a new surprise for me, as it wasn't there last time I expected the library, Is there any rationale for this path-string duality syndrome? And bravo for the inconsistency: we have operator += without a corresponding operator + (yes, I understand the the later is dangerous, but so is the first one). size() --------- The lack of it is not an issue. Path shall be a bidirectional range with value_type of "path_element". The computation of size would not be only inefficient, but also useless. What's the logic of counting the number of path elements? And in case you do want it in some esoteric case, you can use distance(begin, end). assign/append/constructor operations taking ranges ---------------------------- These should accept ranges of path_elements and work as-if by applying each element with /=. absolute(p, base) -------------------- The case when base has no root_directory makes no sense. Boost.FS gives absolute("a:x") == "a:\z\x" if current_path() == "c:\z", but the more logical thing is "a:\w\x" if the current directory of drive "a:" (environment variable "=A:") is "a:\w". Another example, using paths from real world: let p = "Australia, current city, King St., No. 10", base = "Tel-Aviv, Menachem Begin St., No. 7". boost::absolute(p,base) will bring you to: "Australia, Tel-Aviv, King St. No 10", while what you want is to go to Australia, ask yourself "what city I'm currently in?" and if you landed in Melbourne you'll get to "Australia, Melbourne, King St., No. 10". Expected definition: Returns system_complete(base/p). In particular, this comes up because I'm trying to find the greatest
common prefix of two paths. I thought this would be easy; I'd just use std::mismatch. But even once I've found the mismatch I don't see any obvious way to chop off the non-matching parts of one of the paths. [...]
Question: why you need to do this? (I do not mean that it is not needed, just curious.) Why should paths be so different from everything else? I think, if the
design is actually right, some rationale is sorely needed.
They shouldn't. I believe the design of Boost.FS is somewhat wrong, mixing different concepts together.
Also,
* (**) the docs don't say what the value_type of path::iterator is. A string value? A range that becomes invalid when the path is destroyed? Ah!?! How surprising; inspecting the code shows it iterates over paths! A container whose element type is itself is very unusual!
Agree. This bothers me too. IMO the value_type of paths and iterators shall be the same, and likely be either a string or a path_element class, which will provide appropriate interface. This includes observers: * to query the type of the element. It can be one of sub_directory, parent_directory or current_directory to avoid comparison with ".." and "." (on OpenVMS it uses other strings, right?) as well as some implementation specific values to query the values that were already known to the parser (for example: drive (both \\?\c: or c:) or a UNC (both \\?\UNC\x or \\x) on windows). * to query the stem and extension. Cheers, -- Yakov