Re: [boost] New design proposal for boost::filesystem

Carlo Wood wrote:
On Sat, Aug 21, 2004 at 11:36:31AM +0100, John Maddock wrote:
I propose the following design. The aim of boost::filesystem should be to support the following coding idiom:
* The programmer should take care to only handle two types of paths in his application:
1) Complete paths 2) Relative paths
That is true now.
No, the current implementation/design doesn't restrict anything at all in this regard. You can just use fs::path to store paths - and that is it. A path is not aware of a distinct difference between complete paths and relative paths (except that they are complete or relative of course) - definitely not in the same way as a design would that uses two different classes for the two.
A complete path is one that is fully qualified, thus:
fs::path absolute = fs::path( "d:/devel/libraries/boost/1.31.0/libs" );
whereas a relative path is one that is "relative" to another. With a relative path, you must have an absolute path as the basis, e.g.:
fs::path relative = fs::path( "../boost/mpl" );
and can turn the relative path into an absolute one by doing:
fs::path qualified = absolute / relative;
so where is the problem?
The most problematic difference is that it is possible to even store a third possibility (one that is neither complete nor relative).
??? Do you mean one that isn't relative because it is not based from an absolute path, or that the path does not exist? If so, you can do that in any file system, e.g.:
fs::path foobar = fs::path( "../booster/foobar.foo" );
fs::exists( absolute / foobar );
A path will either (in absolute form) refer to a file, a directory or be invalid.
* The programmer will have to specifically tell the library when a constructed path is 'native' and when not. A native path is accepted according to native rules and never gives any problem (exceptions) further on. A non-native path is checked according to the existing rules, which basically means that the programmer can set a default check routine that will in effect determine how portable the application will be.
That is also true now, isn't it?
No, because the 'native' that you can specify with the current design is only a check on the characters used in the directory components, and only related to the representation of a path - not *marking* the path as different. The 'native' I am talking about is enforced for complete paths and also reflects the root part of a directory ("C:\", "c:/cygwin/", "/" etc). The essential difference is that a path that is once marked as 'native' must stay native. It can never be converted to a portable path anymore. Only relative paths that are portable from the start can stay portable after certain operations (like appending a directory, or cutting off a directory at the end).
The native you are referring to is an "absolute" path, correct? Boost.Filesystem stores the paths in a generic form. Thus, if your filesystem uses ':' as a directory separator, the library will map this correctly from its internal state. If you enforce that all "native" Windows paths must have a directory, you are restricting the use of the library, e.g. "\\mycomp\devel\libraries" is not possible! Likewise, it would not be possible to store URLs. NOTE: I am not sure if Boost.Filesystem currently supports URLs, but it should be possible to extend it to add that support. Also, URLs are not attached to a particular filesystem and are thus portable. It should be possible to handle both URLs and native paths, but limiting "native" paths will only make this harder.
I propose two design changes:
1) 'native' is now not only a representation, but an *internal state* of fs::path. (this has no effect on the representation as returned by fs::path::string()).
This would make things too restrictive. I think you are confusing "native" with "absolute" paths, but how do you validate an absolute path? Is there a Windows and a POSIX function to validate an absolute path (e.g. IsPathAbsolute)? It would then be possible to have a fs::is_absolute( ... ) function.
2) All 'complete' paths are automatically marked as 'native'.
Likewise, how do you validate a complete path? What about a URL? If you have the URL "http://www.boost.org/people" and are using a system whereby the native directory separator is ':', the URL will be mapped to "http:::www.boost.org:people", corrupting the URL.
If I understand you correctly, you are suggesting that error checking is turned off for native paths, I would support that, but other than that I don't see how it differs.
Heh - the current design does NOT have an internal state that marks a path as native or not. Hence, it cannot do sensible error checks that need that knowledge. Isn't that difference enough?
Native - as Boost.FileSystem uses it - is used to mark a path as using the native OS syntax for specifying paths. This is so you can do: fs::system_complete( fs::path( argv[1], fs::native ) ); and use the library on Windows, *nix, OpenVMS, MacOS, etc. without having to worry about the differences between how they specify their paths and thus what regular expressions need to be used on the string. The example from the documentation (using OpenVMS) is: DISK$SCRATCH:[GEORGE.PROJECT1.DAT]BIG_DATA_FILE.NTP;5
Examples, the following code is legal:
fs::path p1("C:\\foo\\a.exe", native); // As one might do on windows.
fs::path p2("/usr/src/a.out", native); // As one might do on linux.
std::cout << p1.native_file_string(); // ok, p1 is native.
fs::path p3("foo/bar"); // Relative path, always succeeds.
std::cout << p.string(); // ok
std::cout << p.native_directory_string(); // Useless, but ok.
Not useless at all. And works now.
The claim "Not useless at all" is not backed up.
Did you mean p to be the relative path p3? If so, then p3.native_directory_string() is perfectly valid and *not* useless. If the OS uses ':' as its directory separator, then this will return "foo:bar". It is useful for all sorts of purposes, like debugging.
fs::path p4(complete(p3)); // p4 is now "native", because now it is complete.
Why does native ==> complete? You can have a native relative and/or complete path.
std::cout << p4.string(); // Not allowed because p4 is native (complete).
Huh? IIUC path.string() returns the internal format of the path (e.g. "foo/bar") whereas path.native_directory_string() returns the path as represented by the native OS (e.g. "foo:bar").
Right now you can call all the operations that actually pass a path to some system call - and UNLESS that path is complete - it will not be non-portable.
Not true. "foo/bar" is relative *and* non-portable for OS's that use ':' as a directory separator.
THAT means that boost::filesystem has a structural design error. You are now saying that I can complete all paths just prior to calling any operation that passes paths to system calls. Would you mind even listing those for me? Why not change boost::filesystem so that it takes care of completion automatically when needed? That way you cannot make errors.
If you have an absolute path boost_dir and a relative path boost_build2, you can do: ::SetWorkingDirectory(( boost_dir / boost_build2 ).native_directory_string().c_str());
As an example - I developed my application first on GNU/Linux, with the only requirement being that it had to be portable to cygwin. Therefore, I used UNIX paths. Testing if /usr/src/edragon/edragon/build/src/gui/gui.la existed on linux worked fine. It did NOT work on cygwin. Because I was stupid? No, I think that was because boost::filesystem is not actively supporting portable programming.
Did you check that you have the file where you said on the cygwin system. Have you checked that "/usr/bin/ls.exe" exists? Also, have you checked that BOOST_POSIX is defined as per the note on using the cygwin platform in the docs?
Regards,
Reece

On Sun, Aug 22, 2004 at 09:14:34AM +0100, Reece Dunn wrote:
A complete path is one that is fully qualified, thus:
fs::path absolute = fs::path( "d:/devel/libraries/boost/1.31.0/libs" );
whereas a relative path is one that is "relative" to another. With a relative path, you must have an absolute path as the basis, e.g.:
fs::path relative = fs::path( "../boost/mpl" );
and can turn the relative path into an absolute one by doing:
fs::path qualified = absolute / relative;
so where is the problem?
The problem is not with relative paths, I think that boost::filesystem currently supports relative paths rather well - provided that you only use relative paths ;). The problem is that boost::filesystem does not help the developer to write a portable program by warning him, on every OS he might develop on, that 'absolute' is to be treated as a native path. The pit one can fall into is therefore that you can use a value for 'absolute' (because fs::path merely stores values - it lacks the notion that relative and complete paths are essentially different) that stores a path that is complete on one OS but not on another. For example,
// On GNU/Linux, a developer might do:
fs::path absolute = fs::path("/devel/libraries/boost/1.31.0/libs");
fs::path relative = fs::path( "../boost/mpl" );
fs::path qualified = absolute / relative;
if (fs::exists(qualified)) { // We get here on linux. Without any warning.
Obviously, this code won't work on windows, and it won't work on cygwin when compiled with BOOST_WINDOWS. Now, 'doesn't work' should mean that it throws, or that an assertion fails - but no, it simply says that `qualified' doesn't exist. I agree that I have a very hard time convincing anyone that this is 'flawed' ;). It is so easy to say that the program example is simply wrong ;). Nevertheless, I think that using two different types for the two types of paths will greatly improve robustness of the code. See below - where I'll give the same code using my proposed API.
The most problematic difference is that it is possible to even store a third possibility (one that is neither complete nor relative).
??? Do you mean one that isn't relative because it is not based from an absolute path, or that the path does not exist?
No, I mean that with a fs::path ph, ph.is_complete() returns false and ph.has_root_directory() returns true :). For example the "/devel/libraries/boost/1.31.0/libs" used above is complete on linux, but is not complete on windows or on cygwin(!) (compiled with BOOST_WINDOWS), and of course it also isn't relative.
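The three cases described here can be illustrated with a small sketch. It is not Boost.Filesystem code: `path_kind`, `has_drive` and `classify_windows` are hypothetical names, the rules are a simplified guess at Windows-style classification (a drive letter as root name, '/' as root directory), and UNC names such as "//server" are deliberately ignored.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Boost.Filesystem: classifies a path
// written in generic form ('/' separators) under simplified Windows rules.
enum class path_kind { relative, partially_rooted, complete };

// True when s begins with a drive specifier such as "c:".
inline bool has_drive(const std::string& s) {
    return s.size() >= 2 &&
           std::isalpha(static_cast<unsigned char>(s[0])) != 0 &&
           s[1] == ':';
}

inline path_kind classify_windows(const std::string& s) {
    const bool drive = has_drive(s);
    const std::size_t root_pos = drive ? 2 : 0;
    const bool root_dir = s.size() > root_pos && s[root_pos] == '/';

    if (drive && root_dir) return path_kind::complete;          // "c:/foo"
    if (drive || root_dir) return path_kind::partially_rooted;  // "/foo", "c:foo"
    return path_kind::relative;                                 // "foo", "../foo"
}
```

Under these assumed rules "/devel/libraries/boost/1.31.0/libs" lands in the middle category: it has a root directory, yet it is not complete because no drive is named.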
If so, you can do that in any file system, e.g.:
fs::path foobar = fs::path( "../booster/foobar.foo" );
fs::exists( absolute / foobar );
A path will either (in absolute form) refer to a file, a directory or be invalid.
Here you assume that 'absolute' has the correct value. My point is that this value has to be totally different on each supported OS. That fact is not clear enough with the current implementation. A possibility is to demand that 'absolute' must be of a type fs::native_path. On top of that, we can allow assigning an 'absolute' field to a (normal, relative) fs::path (or even call that fs::relative_path). In that case you take care of the needed completion inside fs::exists. Consider this code:
fs::path foobar;
if (something)
foobar = fs::path( "../booster/foobar.foo" );
else
foobar = fs::path( "/devel/libraries/boost/1.31.0/booster/foobar.foo" );
which happens 'somewhere' in the code, outside of the view of the code that the developer is currently writing. Then basically the developer doesn't know if the path is relative or absolute and the 'absolute / foobar' simply doesn't work. Yet, the current boost::filesystem implementation allows it without problems. I think that this is wrong: it should not be possible for one variable to have either a relative value or an absolute value - the two are essentially different.
Ok, so - with the current implementation of boost::filesystem - one might do:
fs::path qualified = fs::complete(foobar, absolute);
fs::exists(qualified);
because supposedly this would leave foobar alone when it is already complete and prepend 'absolute' when it is not... But no - this will fail. For example (and even one example should be enough to show that there is a design problem, I hope), it will fail on cygwin with "/devel/libraries/boost/1.31.0/booster/foobar.foo". In that case foobar isn't complete, but it DOES have a root directory and therefore absolute is rightfully ignored, but no root base is prepended. The path "/devel/libraries/boost/1.31.0/booster/foobar.foo" SHOULD have been expanded to "c:/cygwin/devel/libraries/boost/1.31.0/booster/foobar.foo" (when boost was compiled with BOOST_WINDOWS and cygwin was installed in C:\cygwin\).
The native you are referring to is an "absolute" path, correct?
With native I mean intrinsically "not portable". Currently that is only the case for a path with weird characters, or for an explicit windows path with a ':' in it for example. Imho, every "complete" path is native in the regard that it is not portable by definition: every significantly different OS uses a different way to specify its root part.
Boost.Filesystem stores the paths in a generic form.
Hardly - after doing some checks - it just stores the paths almost literally as a string in memory. At most it converts a backslash into a slash. Certainly the ':' of a root part in windows (like "C:\") is still stored as "C:/" internally. The information that this makes that object 'native' (or 'not portable') is lost.
Thus, if your filesystem uses ':' as a directory separator, the library will map this correctly from its internal state. If you enforce that all "native" Windows paths must have a directory,
You are turning things around. I said I wanted to introduce a new state, or even type, that stores the fact that a path is 'native'. And, I said that every complete path should be marked with this new notion. Now you say it the other way around: every native path must 'have a directory' (== complete? (because "/foo" has a directory, but is not complete on cygwin, currently)).
I propose two design changes:
1) 'native' is now not only a representation, but an *internal state* of fs::path. (this has no effect on the representation as returned by fs::path::string()).
This would make things too restrictive.
How can *adding* a flag to the class be too restrictive? So far, the above doesn't say that anything else has to change. It could be that I want to propose a 'bool is_native()' at most. In that case nothing would be restricted, or changed when it comes to the current API part.
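As a rough illustration of how small such an addition could be, here is a minimal sketch of a path value that carries a sticky 'native' flag. `flagged_path` is a hypothetical class written only for this example; it is not fs::path, and the rule that appending propagates the flag is an assumption based on the "once native, always native" description earlier in the thread.

```cpp
#include <string>
#include <utility>

// Hypothetical sketch, not the Boost.Filesystem API: a path string plus
// a flag recording whether the value is native (non-portable).
class flagged_path {
public:
    flagged_path(std::string text, bool native)
        : text_(std::move(text)), native_(native) {}

    const std::string& string() const { return text_; }
    bool is_native() const { return native_; }

    // Appending keeps the flag sticky: the result is native as soon as
    // either operand is native, and it can never become portable again.
    friend flagged_path operator/(const flagged_path& lhs,
                                  const flagged_path& rhs) {
        return flagged_path(lhs.text_ + "/" + rhs.text_,
                            lhs.native_ || rhs.native_);
    }

private:
    std::string text_;
    bool native_;
};
```

Nothing in the existing interface would have to change for this: string() still returns the stored representation, and is_native() is purely additional information.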
I think you are confusing "native" with "absolute" paths, but how do you validate an absolute path?
Heh - no, I am not "confusing" the two! I am very well aware of what I am saying when I propose to ADD a notion of "nativeness" that is ENFORCED for "absolute" paths (complete paths that is). This notion is needed... see all the examples above. I don't think I can give more examples without it becoming pointless.
Is there a Windows and a POSIX function to validate an absolute path (e.g. IsPathAbsolute)? It would then be possible to have a fs::is_absolute( ... ) function.
boost::fs already defines that: fs::path::is_complete
2) All 'complete' paths are automatically marked as 'native'.
Likewise, how do you validate a complete path? What about a URL? If you have the URL "http://www.boost.org/people" and are using a system whereby the native directory separator is ':', the URL will be mapped to "http:::www.boost.org:people", corrupting the URL.
A URL is not a FILESYSTEM path. The current implementation of boost::fs also doesn't support URLs in any way (it talks about 'has_root_name', 'has_root_directory' etc.). The fact that you can store URLs in a fs::path with the current implementation shows how little it is aware of what it is doing and therefore how little it protects you as a developer against abusing it. In order to support URLs, you need yet another type: fs::url_path. Then, of course, it could be allowed to use an fs::url_path as 'root' for a relative path. Now you will say: but then I can't write a generic piece of code that accepts either files or http:// URLs. So, ok - you can make a fs::native_path accept fs::url_path's too:
fs::native_path my_working_directory = ... // Some local directory.
fs::url_path url("http://www.boost.org/people/"); // URL, not a relative path in a directory 'http:', which would be legal on UNIX
fs::relative_path rel(my_working_directory); // Make 'rel' relative to my_working_directory.
rel = "carlo/design.html";
fs::native_path result;
if (here)
result = rel;
else
result = url / rel;
Imho, code like this is type-safe. It makes it clear what you are doing and protects one against stupid abuse. Note how the root-handling of different supported OS is *completely* hidden by 'my_working_directory'. It is no longer the concern of the developer.
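To make the type-safety argument concrete, here is a minimal sketch of the two-type idea in plain C++. The names `relative_path` and `native_path` follow the proposal, but the classes below are hypothetical stand-ins written for this example (URLs are left out); they do not exist in Boost.Filesystem, and `join_components` and `to_system_call_argument` are invented helpers.

```cpp
#include <string>
#include <utility>

// Invented helper: joins two components with exactly one '/' between them.
inline std::string join_components(const std::string& base,
                                   const std::string& tail) {
    if (base.empty()) return tail;
    return base.back() == '/' ? base + tail : base + "/" + tail;
}

// Hypothetical portable, relative path.
class relative_path {
public:
    explicit relative_path(std::string text) : text_(std::move(text)) {}
    const std::string& string() const { return text_; }
private:
    std::string text_;
};

// Hypothetical complete, OS-specific path.
class native_path {
public:
    explicit native_path(std::string text) : text_(std::move(text)) {}
    const std::string& string() const { return text_; }
private:
    std::string text_;
};

// relative / relative stays relative, and therefore portable.
inline relative_path operator/(const relative_path& a, const relative_path& b) {
    return relative_path(join_components(a.string(), b.string()));
}

// native / relative yields native. No overload takes a native_path on the
// right-hand side, so "relative / native" does not compile.
inline native_path operator/(const native_path& base, const relative_path& rel) {
    return native_path(join_components(base.string(), rel.string()));
}

// Anything that would reach the operating system accepts only native_path,
// so a relative path has to be completed explicitly before it gets here.
inline std::string to_system_call_argument(const native_path& p) {
    return p.string();
}
```

With these definitions, mixing up the two kinds of path becomes a compile-time error rather than a run-time "file does not exist".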
If I understand you correctly, you are suggesting that error checking is turned off for native paths, I would support that, but other than that I don't see how it differs.
Heh - the current design does NOT have an internal state that marks a path as native or not. Hence, it cannot do sensible error checks that need that knowledge. Isn't that difference enough?
Native - as Boost.FileSystem uses it - is used to mark a path as using the native OS syntax for specifying paths. This is so you can do:
fs::system_complete( fs::path( argv[1], fs::native ) );
I know that.
and use the library on Windows, *nix, OpenVMS, MacOS, etc. without having to worry about the differences between how they specify their paths and thus what regular expressions need to be used on the string. The example
while importing/constructing the path. And immediately after that it 'forgets' that this path has a native value. [..rip.. (this is getting too long already... has taken me a full hour to write this) ]
-- Carlo Wood <carlo@alinoe.com>
participants (2): Carlo Wood, Reece Dunn