Re: [boost] New design proposal for boost::filesystem

21 Aug 2004

      On Sat, Aug 21, 2004 at 11:36:31AM +0100, John Maddock wrote:
...
...
I propose the following design.  The aim of boost::filesystem should be to
support the following coding idiom:
* The programmer should take care to only handle two types of paths
  in his application:
  1) Complete paths
  2) Relative paths
That is true now.
No, the current implementation/design doesn't restrict anything at all
in this regard.  You can just use fs::path to store paths - and that is
it.  A path is not aware of a distinct difference between complete
paths and relative paths (except that they are complete or relative
of course) - definitely not in the same way as a design would that
uses two different classes for the two.

The most problematic difference is that it is possible to even store
a third possibility (one that is neither complete nor relative).
This is what caused me to see that there is a design error in the first
place and caused me to think about improvements.
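
To make the contrast concrete, here is a minimal sketch of what "two
different classes" could look like.  The names complete_path, relative_path
and has_root are hypothetical and not part of boost::filesystem, and the
root test is deliberately simplistic (a leading separator or a drive prefix);
it does not try to classify cases such as "C:foo" properly.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical helper: true if s starts with a separator or a drive prefix like "C:".
inline bool has_root(const std::string& s) {
    if (!s.empty() && (s[0] == '/' || s[0] == '\\')) return true;
    return s.size() >= 2 && std::isalpha(static_cast<unsigned char>(s[0])) && s[1] == ':';
}

// Hypothetical type: can only hold paths without a root.
class relative_path {
public:
    explicit relative_path(const std::string& s) : s_(s) {
        if (has_root(s)) throw std::invalid_argument("relative_path has a root: " + s);
    }
    const std::string& str() const { return s_; }
private:
    std::string s_;
};

// Hypothetical type: can only hold paths with a root.
class complete_path {
public:
    explicit complete_path(const std::string& s) : s_(s) {
        if (!has_root(s)) throw std::invalid_argument("complete_path has no root: " + s);
    }
    const std::string& str() const { return s_; }
private:
    std::string s_;
};

// complete / relative yields complete; no other combination is provided.
inline complete_path operator/(const complete_path& base, const relative_path& rel) {
    std::string s = base.str();
    if (s.back() != '/' && s.back() != '\\') s += '/';  // base is never empty
    return complete_path(s + rel.str());
}
```

With such types, complete_path("/usr") / relative_path("bin/ls") gives
"/usr/bin/ls", while relative_path("/bin/ls") is rejected at construction.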
...
...
* The programmer will have to specifically tell the library when a constructed
  path is 'native' and when not.  A native path is accepted according to native
  rules and never gives any problem (exceptions) further on.
  A non-native path is checked according to the existing rules, which basically
  means that the programmer can set a default check routine that will in effect
  determine how portable the application will be.
That is also true now, isn't it?
No, because the 'native' that you can specify with the current design
is only a check on the characters used in the directory components, and
only related to the representation of a path - not *marking* the path
as different.  The 'native' I am talking about is enforced for complete
paths and also reflects the root part of a directory ("C:\", "c:/cygwin/",
"/" etc).  The essential difference is that a path that is once marked
as 'native' must stay native.  It can never be converted to a portable
path anymore.  Only relative paths that are portable from the start
can stay portable after certain operations (like appending a directory, or
cutting off a directory at the end).
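
A minimal sketch of that "once native, always native" rule, assuming a
hypothetical tagged_path class (this is not the current fs::path): the tag is
stored in the object, propagates through appending and cutting off a
component, and there is simply no operation that turns native back into
portable.

```cpp
#include <cassert>
#include <string>
#include <utility>

enum class kind { portable, native };  // hypothetical internal state

class tagged_path {
public:
    tagged_path(std::string s, kind k = kind::portable) : s_(std::move(s)), k_(k) {}

    bool is_native() const { return k_ == kind::native; }
    const std::string& str() const { return s_; }

    // Appending: the result is native as soon as either operand is native.
    friend tagged_path operator/(const tagged_path& a, const tagged_path& b) {
        kind k = (a.is_native() || b.is_native()) ? kind::native : kind::portable;
        return tagged_path(a.s_ + "/" + b.s_, k);
    }

    // Cutting off the last component keeps the tag unchanged.
    tagged_path parent() const {
        std::string::size_type pos = s_.find_last_of('/');
        return tagged_path(pos == std::string::npos ? std::string() : s_.substr(0, pos), k_);
    }

private:
    std::string s_;
    kind k_;
};
```

For example, tagged_path("/home/user", kind::native) / tagged_path("edragon/rc")
is native, while tagged_path("foo") / tagged_path("bar") stays portable.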
...
...
I propose two design changes:
1) 'native' is now not only a representation, but an *internal state* of fs::path.
   (this has no effect on the representation as returned by fs::path::string()).
2) All 'complete' paths are automatically marked as 'native'.
If I understand you correctly, you are suggesting that error checking is
turned off for native paths, I would support that, but other than that I
don't see how it differs.
Heh - the current design does NOT have an internal state that marks
a path as native or not.  Hence, it cannot do sensible error checks
that need that knowledge.  Isn't that difference enough?
...
Actually, there is another case where error checking needs to be turned off:
when the path is obtained from a directory_iterator, but is none the less
relative (and *please* don't tell me that all such paths should be complete,
that would break a lot of code; actually it would make a lot of coding
idioms impossible).
A directory_iterator returns a single directory component, no?  Apart from
the fact that that is almost a plain string, it is at most a relative path: the
root part is not there.  I have absolutely no problem when the root part
is simply forgotten; it will be easy enough to add it back (and to detect
this error when one forgets it).

I can see a problem with iterating over a native_path though. I'd argue
for only allowing iteration over relative_path components.

For example:

[...] sorry, but at this point my eye fell on:
...
...
Examples, the following code is legal:
fs::path p1("C:\\foo\\a.exe", native); // As one might do on windows.
fs::path p2("/usr/src/a.out", native); // As one might do on linux.
std::cout << p1.native_file_string(); // ok, p1 is native.
fs::path p3("foo/bar"); // Relative path, always succeeds.
std::cout << p3.string(); // ok
std::cout << p3.native_directory_string(); // Useless, but ok.
Not useless at all.  And works now.
  ^^^^^^^^^^^^^^^^
    \__ this.
That is not backed up.
We can't have a discussion along the lines of "yes" "no" "yes" "no".
Please try to be a bit more constructive.  *sets the example verbosity
a few notches back*.

[...] anyway... so it is a problem when you allow iterating over
a native_path and have that produce relative_paths.  I wouldn't allow that.
...
...
fs::path p4(complete(p3)); // p4 is now "native", because now it is complete.
std::cout << p4.native_directory_string(); // ok
And the following will fail (assertion?):
std::cout << p4.string(); // Not allowed because p4 is native (complete).
One could add an assertion that the path is not complete, if you want that
behaviour.
Yes.  Throwing an exception does not seem correct.  An assertion makes the most sense.
...
Actually this change would break the bcp utility - there is a (slightly
hairy) use for this.
Would you mind defending that use?  I cannot think of a use that is in fact
not portable (in a dangerous way) and cannot be solved better.
...
...
Since there would be a default way defined of how a relative path is
completed, all operation functions will accept both, relative
and complete paths. For example:
fs::path p1("C:\\cygwin\\usr/bin/ls", native); // Legal path on Cygwin.
if (fs::exists(p1)) // Ok, access complete path.
// For clarification
fs::path p2("C:\\cygwin\\usr"); // Just an example
fs::default_working_directory(p2); // fictitious function.
fs::path p3("bin/ls"); // Portable representation (refers in fact to "C:\cygwin\usr\bin\ls.exe").
if (fs::exists(p3)) // Ok: exists() will make the path complete before testing.
And this throws:
fs::path p4("/bin/ls"); // Not allowed: this path has a root but is not marked 'native'.
Setting the default_working_directory will always
need to be done for each supported OS separately.
Of course you can set it to "/" on single root machines, and
set it to "E:/" after extracting the 'E:' from the current
path at application start up, in effect simulating a 'single root':
You should never need to set that explicitly unless you want to: each
application inherits a default working directory anyway from the host
environment.
Yes - it should be set by default to the working directory at application
start (first use of fs::path, or however it works now).  But there is no harm
in allowing it to be changed explicitly.  I am just saying that this
default working directory should NOT change automagically as a function
of the _real_ working directory.  I think this is the case now too, no?
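
A rough sketch of that behaviour, using C++17 std::filesystem as a stand-in
for boost::filesystem.  default_working_directory is the fictitious function
from my example above, and complete_against_default is an equally
hypothetical helper; neither exists in any library.  The working directory is
captured once, on first use, and later changes of the real working directory
do not affect it.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical storage: initialised from the real working directory on first use only.
inline fs::path& default_working_directory_ref() {
    static fs::path dir = fs::current_path();
    return dir;
}

// Getter: returns the snapshot, not the current real working directory.
inline const fs::path& default_working_directory() {
    return default_working_directory_ref();
}

// Setter: changing it explicitly is allowed, but only to a complete path.
inline void default_working_directory(const fs::path& p) {
    assert(p.is_absolute());
    default_working_directory_ref() = p;
}

// Complete a relative path against the snapshot; complete paths pass through.
inline fs::path complete_against_default(const fs::path& p) {
    return p.is_absolute() ? p : default_working_directory() / p;
}
```

After the first call, fs::current_path(some_other_dir) changes the real
working directory, but complete_against_default("bin/ls") still resolves
against the directory the application started in.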
...
BTW the behaviour you're asking for was required by bcp - all the paths are
relative to some root (the boost installation path) - that path may be
relative or absolute; and whenever you need a path relative to some root,
one can just use:
my_root / my_relative_path
so again, you can do what you want right now.
I think I can implement my proposal on top of the current boost::filesystem yes.
But that doesn't mean that boost::filesystem is robust, or supports coding
in a portable way.
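
For reference, the "my_root / my_relative_path" idiom quoted above looks
roughly like this, again with C++17 std::filesystem as a stand-in for
boost::filesystem; under_root is a hypothetical helper name.  Note that
nothing in the type of the second argument stops a caller from passing a
rooted path, so the sketch has to check that by hand.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: my_root may itself be relative or absolute.
inline fs::path under_root(const fs::path& my_root, const fs::path& my_relative_path) {
    // In std::filesystem, appending an absolute path replaces the left-hand
    // side, so a rooted second argument would silently discard my_root.
    assert(my_relative_path.is_relative());
    return my_root / my_relative_path;
}
```

So under_root("/opt/boost", "libs/filesystem") yields
"/opt/boost/libs/filesystem", and under_root("../boost", "libs") yields
"../boost/libs"; the result is only as complete as my_root was.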
...
...
The average application will work with relative
paths, relative to some (native) base directory
and next to that have some arbitrary, complete and thus native
directories (ie, read from environment variables).
But in case more than one 'working' directory seems needed
then we can add support for that too by allowing to
construct paths with a reference to the (complete/native)
working directory. Ie,
  fs::path homedir(g_getenv("HOME"), native);
  fs::path rcdir(homedir / "edragon/rc", native);
  fs::path tmpdir(current_path().root_path() / "tmp");
  // ...
  fs::path runtime_rcfile(rcdir); // Set 'runtime_rcfile' to be relative to 'rcdir'.
  // ...
  runtime_rcfile = "config/runtimerc";
which is then relative to `rcdir' instead of
  a single, global 'working directory' (as now returned
  by fs::current_path()).
I'm sorry, but that looks way more complicated to me than the current
design:  if you want a path to be relative to a specific base, then use
"my_base/my_path", it's easy to use, works, and it's clear what you mean as
well.
Right now you can call all the operations that actually pass a path to some
system call - and UNLESS that path is complete - it will be non-portable.
THAT means that boost::filesystem has a structural design error.  You are
now saying that I can complete all paths just prior to calling any operation
that passes paths to system calls.  Would you mind even listing those for me?
Why not change boost::filesystem so that it takes care of completion
automatically when needed?  That way you cannot make errors.
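
As a sketch of what "takes care of completion automatically" could mean:
every operation that ends up in a system call first completes its argument
against a fixed, complete base.  The class completing_ops and its members are
hypothetical; C++17 std::filesystem is used as a stand-in for the underlying
library, and only a few operations are shown.

```cpp
#include <cassert>
#include <filesystem>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical wrapper: no operation ever passes a relative path to the OS.
class completing_ops {
public:
    explicit completing_ops(fs::path base) : base_(std::move(base)) {
        assert(base_.is_absolute());  // the base itself must be complete
    }

    // Relative paths are completed against base_; complete paths pass through.
    fs::path complete(const fs::path& p) const {
        return p.is_absolute() ? p : base_ / p;
    }

    bool exists(const fs::path& p) const { return fs::exists(complete(p)); }

    bool create_directories(const fs::path& p) const {
        return fs::create_directories(complete(p));
    }

    bool remove(const fs::path& p) const { return fs::remove(complete(p)); }

private:
    fs::path base_;
};
```

The caller then writes ops.exists("bin/ls") and cannot forget the completion
step, because there is no way to reach the system call without it.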

As an example - I developed my application first on GNU/Linux, with the only
requirement being that it had to be portable to cygwin.  Therefore, I used UNIX paths.
Testing if /usr/src/edragon/edragon/build/src/gui/gui.la existed on linux
worked fine.  It did NOT work on cygwin.  Because I was stupid?  No, I think
that was because boost::filesystem is not actively supporting portable programming.

-- 
Carlo Wood <carlo@alinoe.com>