New design proposal for boost::filesystem

Carlo Wood

20 Aug 2004

7:01 p.m.

I propose the following design. The aim of boost::filesystem should be to support the following coding idiom:

* The programmer should take care to only handle two types of paths in his application:
  1) Complete paths
  2) Relative paths

* The programmer will have to specifically tell the library when a constructed path is 'native' and when not. A native path is accepted according to native rules and never gives any problem (exceptions) further on. A non-native path is checked according to the existing rules, which basically means that the programmer can set a default check routine that will in effect determine how portable the application will be.

I propose two design changes:

1) 'native' is now not only a representation, but an *internal state* of fs::path. (This has no effect on the representation as returned by fs::path::string().)
2) All 'complete' paths are automatically marked as 'native'.

Examples; the following code is legal:

  fs::path p1("C:\\foo\\a.exe", native);      // As one might do on Windows.
  fs::path p2("/usr/src/a.out", native);      // As one might do on Linux.
  std::cout << p1.native_file_string();       // ok, p1 is native.

  fs::path p3("foo/bar");                     // Relative path, always succeeds.
  std::cout << p3.string();                   // ok
  std::cout << p3.native_directory_string();  // Useless, but ok.

  fs::path p4(complete(p3));                  // p4 is now "native", because now it is complete.
  std::cout << p4.native_directory_string();  // ok

And the following will fail (assertion?):

  std::cout << p4.string();                   // Not allowed because p4 is native (complete).

Since there would be a default way defined of how a relative path is completed, all operation functions will accept both relative and complete paths. For example:

  fs::path p1("C:\\cygwin\\usr/bin/ls", native);  // Legal path on Cygwin.
  if (fs::exists(p1))                             // Ok, access complete path.

  // For clarification
  fs::path p2("C:\\cygwin\\usr");                 // Just an example
  fs::default_working_directory(p2);              // fictitious function.

  fs::path p3("bin/ls");   // Portable representation (refers in fact to "C:\cygwin\usr\bin\ls.exe").
  if (fs::exists(p3))      // Ok: exists() will make the path complete before testing.

And this throws:

  fs::path p4("/bin/ls");  // Not allowed: this path has a root but is not marked 'native'.

Setting the default_working_directory shall always need to be done for each supported OS separately. Of course you can set it to "/" on single root machines, and set it to "E:/" after extracting the 'E:' from the current path at application start-up, in effect simulating a 'single root':

  int main()
  {
    fs::default_working_directory(fs::current_path().root_path());
    // ... simulate single root machine below.
    fs::path p("bin/ls");
  }

The average application will work with relative paths, relative to some (native) base directory, and next to that have some arbitrary, complete and thus native directories (i.e., read from environment variables).

But in case more than one 'working' directory seems needed, then we can add support for that too by allowing paths to be constructed with a reference to the (complete/native) working directory. I.e.,

  fs::path homedir(g_getenv("HOME"), native);
  fs::path rcdir(homedir / "edragon/rc", native);
  fs::path tmpdir(current_path().root_path() / "tmp");
  // ...
  fs::path runtime_rcfile(rcdir);  // Set 'runtime_rcfile' to be relative to 'rcdir'.
  // ...
  runtime_rcfile = "config/runtimerc";

which is then relative to `rcdir' instead of a single, global 'working directory' (as now returned by fs::current_path()).

Most of the current boost::filesystem API can be preserved with this design.

-- Carlo Wood <carlo@alinoe.com>
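To make the proposed semantics concrete, here is a minimal, self-contained C++ sketch that does not use boost at all. `flagged_path`, `native_tag`, `native`, `rooted()` and the two-argument `complete()` are hypothetical stand-ins, and root detection is deliberately simplistic (a leading slash or backslash, or a drive letter). It models three rules from the proposal: a rooted path must be marked native, completing a relative path yields a native path, and `string()` on a native path trips an assertion.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical tag used to mark a path as native at construction.
struct native_tag {};
const native_tag native{};

// Minimal root detection: leading '/' or '\', or a drive letter "X:".
inline bool rooted(const std::string& s) {
    return (!s.empty() && (s[0] == '/' || s[0] == '\\')) ||
           (s.size() >= 2 && s[1] == ':');
}

// Sketch of a path whose "native" property is internal state.
class flagged_path {
public:
    // Portable path: a root is only allowed if the path is marked native.
    explicit flagged_path(const std::string& s) : str_(s), native_(false) {
        if (rooted(s))
            throw std::invalid_argument("rooted path not marked native: " + s);
    }
    // Native path: accepted as-is and remembered as native.
    flagged_path(const std::string& s, native_tag) : str_(s), native_(true) {}

    bool is_native() const { return native_; }

    // Portable representation; calling it on a native path is treated
    // as a programming error (an assertion, not an exception).
    const std::string& string() const {
        assert(!native_ && "string() called on a native (complete) path");
        return str_;
    }

    // The native representation is always available.
    const std::string& native_string() const { return str_; }

    // Completing a portable path against a native base yields a native path.
    friend flagged_path complete(const flagged_path& rel, const flagged_path& base) {
        assert(!rel.native_ && base.native_);
        return flagged_path(base.str_ + "/" + rel.str_, native);
    }

private:
    std::string str_;
    bool native_;
};
```

Under these assumptions, `flagged_path p3("foo/bar")` stays portable, while `complete(p3, flagged_path("/usr/src", native))` produces a native path whose text is available only through `native_string()`.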

Keith Burton

21 Aug 2004

7:30 a.m.

...

And the following will fail (assertion?):

  std::cout << p4.string();  // Not allowed because p4 is native (complete).

Why should this fail? It seems an unnecessary restriction.

Keith

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Carlo Wood
Sent: 20 August 2004 20:01
To: boost@lists.boost.org
Subject: [boost] New design proposal for boost::filesystem

I propose the following design. The aim of boost::filesystem should be to support the following coding idiom: [snip]

Carlo Wood

8:26 a.m.

On Sat, Aug 21, 2004 at 08:30:43AM +0100, Keith Burton wrote:

...

...
And the following will fail (assertion?):

  std::cout << p4.string();  // Not allowed because p4 is native (complete).

Why should this fail? It seems an unnecessary restriction.

Perhaps more extension of the API is needed. There is a mix-up of two different types of 'native'. 'native' means "works only on the OS that the application is currently running on". But still, there are two distinct ways that a path can fail to be portable:

1) It can contain characters that are not portable, or use a format that is not portable.
2) The root base (an explicit, complete directory) is, well, explicit.

The internal 'native' state that I introduced marks the latter, assuming that any complete path is automatically not portable because the 'root' part of any path is simply not portable. But, as I said before, this 'native' wouldn't change the format of the string returned by 'string()'. And now, as you noted, I suddenly propose to also trip on the *usage* of 'string()' when this 'native' flag (with the second meaning) is set!

I agree that this is a duality and some refinement might be better. However, just plainly allowing it seems wrong. If a programmer uses a complete path, then it is by definition not portable (point 2) and therefore marked internally as a 'native' path of the second kind. Then, when a programmer converts the path to a std::string, he should show somehow that he is aware of the fact that this is not portable. The idea was to reserve path::string() for portable strings that can be used regardless of the OS.

But well, you are right if you say that it might be needed, because there is also this thing 'representation'. A programmer might need the 'canonical' representation of a path, having nothing other than that path. The current boost::filesystem just allows one to 'store' all kinds of paths that are out there, and guards point 1) only. If a path contains "c:/Program Files/My Documents", then string() will return that, and native_directory_string() will return "C:\Program Files\My Documents". I can imagine that that is not wanted in some cases.
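The two kinds of non-portability listed above can be checked independently of each other. The sketch below assumes the POSIX portable filename character set (letters, digits, '.', '_', '-') as the rule for kind 1, and treats a leading slash, backslash or drive letter as the root for kind 2. The function names are hypothetical, and '/' is assumed to be the only separator.

```cpp
#include <cassert>
#include <string>

// Kind 1: a component uses only the POSIX portable filename character set.
bool component_is_portable(const std::string& name) {
    if (name.empty()) return false;
    for (char c : name) {
        bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                  (c >= '0' && c <= '9') || c == '.' || c == '_' || c == '-';
        if (!ok) return false;
    }
    return true;
}

// Kind 2: the path carries an explicit root ('/', '\' or a drive "X:").
bool has_root_part(const std::string& p) {
    if (!p.empty() && (p[0] == '/' || p[0] == '\\')) return true;
    return p.size() >= 2 && p[1] == ':';
}

// Portable only if there is no root and every '/'-separated
// component passes the character check.
bool path_is_portable(const std::string& p) {
    if (has_root_part(p)) return false;
    std::string::size_type start = 0;
    while (start <= p.size()) {
        std::string::size_type end = p.find('/', start);
        if (end == std::string::npos) end = p.size();
        if (!component_is_portable(p.substr(start, end - start))) return false;
        start = end + 1;
    }
    return true;
}
```

Keeping the checks separate reflects the distinction in the post: a path such as "c:/Program Files/My Documents" fails both checks, for two unrelated reasons.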
The cleanest, most logical way to handle the string() question would be to have a way to convert complete paths back to relative paths, and then use string(). But there are in principle many relative paths (one for each directory it contains), so I don't like the conversion (removing an explicit root base should be possible though). A straightforward short-circuit of this problem would be to introduce the same function but with a different name, like fs::path::complete_string, which would be allowed on complete paths and then return what string() returns now.

That brings me to a new idea: why not have two different _types_? fs::relative_path and fs::native_path. Then a program that uses code that you considered problematic turns into something like this:

  #ifdef __CYGWIN__
  fs::native_path rootbase(get_cygwin_root());
  #else
  // ...
  #endif
  fs::relative_path tmpdir(rootbase);
  tmpdir = "tmp";
  fs::relative_path session_socket(tmpdir);
  session_socket = fs::relative_path("screen") / username() / "s0";
  std::cout << session_socket.string() << std::endl;              // Prints "screen/carlo/s0".
  std::cout << session_socket.native_file_string() << std::endl;  // Prints "[screen/carlo]:s0" on VMS.
  fs::native_path completed_session_socket(session_socket);
  std::cout << completed_session_socket.string() << std::endl;    // Prints "c:/cygwin/tmp/screen/carlo/s0" on cygwin.
  std::cout << completed_session_socket.native_directory_string() << std::endl;  // Prints "C:\cygwin\tmp\screen\carlo\s0" on cygwin.

In this scenario I do not have a problem with allowing 'string()' to be called, because having a separate type for complete paths is enough clarity for me. Note that in this case:

  fs::relative_path foo;   // Uses fs::current_path() as root base.
  fs::complete(foo);       // returns fs::current_path() / foo.
  fs::complete(foo, tmp);  // returns tmp / foo.

The first parameter of fs::complete must be a relative_path and the second a native_path.
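Here is a rough C++ sketch of the two-types idea. The class names are taken from the post, but everything else (the constructors, `operator/`, joining with '/') is an illustrative assumption and not a proposed implementation. The point it demonstrates is that `complete()` only accepts a `relative_path` first and a `native_path` second, so the compiler rejects a mix-up. The one-argument form that defaults to `fs::current_path()` is left out.

```cpp
#include <cassert>
#include <string>
#include <utility>

// A path without a root; the only kind that can be freely combined.
class relative_path {
public:
    explicit relative_path(std::string s) : s_(std::move(s)) {}
    const std::string& string() const { return s_; }
    friend relative_path operator/(const relative_path& a, const relative_path& b) {
        return relative_path(a.s_ + "/" + b.s_);
    }
private:
    std::string s_;
};

// A complete (and therefore native) path; a distinct type on purpose.
class native_path {
public:
    explicit native_path(std::string s) : s_(std::move(s)) {}
    const std::string& string() const { return s_; }
private:
    std::string s_;
};

// complete(relative, base): the argument types fix the order, and the
// result is always a native_path.
native_path complete(const relative_path& rel, const native_path& base) {
    return native_path(base.string() + "/" + rel.string());
}
```

With these types, swapping the arguments, as in `complete(tmp, s)`, does not compile. That compile-time rejection is the kind of error the separate types are meant to catch.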

John Maddock

10:36 a.m.

...

I propose the following design. The aim of boost::filesystem should be to support the following coding idiom:

* The programmer should take care to only handle two types of paths in his application:

1) Complete paths 2) Relative paths

That is true now.

...

* The programmer will have to specifically tell the library when a constructed path is 'native' and when not. A native path is accepted according to native rules and never gives any problem (exceptions) further on. A non-native path is checked according to the existing rules, which basically means that the programmer can set a default check routine that will in effect determine how portable the application will be.

That is also true now, isn't it?

...

I propose two design changes:

1) 'native' is now not only a representation, but an *internal state* of fs::path. (this has no effect on the representation as returned by fs::path::string()). 2) All 'complete' paths are automatically marked as 'native'.

If I understand you correctly, you are suggesting that error checking is turned off for native paths. I would support that, but other than that I don't see how it differs. Actually, there is another case where error checking needs to be turned off: when the path is obtained from a directory_iterator, but is nonetheless relative (and *please* don't tell me that all such paths should be complete; that would break a lot of code, and actually it would make a lot of coding idioms impossible).

...

Examples, the following code is legal:

  fs::path p1("C:\\foo\\a.exe", native);      // As one might do on Windows.
  fs::path p2("/usr/src/a.out", native);      // As one might do on Linux.
  std::cout << p1.native_file_string();       // ok, p1 is native.

  fs::path p3("foo/bar");                     // Relative path, always succeeds.
  std::cout << p3.string();                   // ok
  std::cout << p3.native_directory_string();  // Useless, but ok.

Not useless at all. And works now.

...

  fs::path p4(complete(p3));                  // p4 is now "native", because now it is complete.
  std::cout << p4.native_directory_string();  // ok

And the following will fail (assertion?):

  std::cout << p4.string();                   // Not allowed because p4 is native (complete).

One could add an assertion that the path is not complete, if you want that behaviour. Actually this change would break the bcp utility - there is a (slightly hairy) use for this.

...

Since there would be a default way defined of how a relative path is completed, all operation functions will accept both, relative and complete paths. For example:

  fs::path p1("C:\\cygwin\\usr/bin/ls", native);  // Legal path on Cygwin.
  if (fs::exists(p1))                             // Ok, access complete path.

  // For clarification
  fs::path p2("C:\\cygwin\\usr");                 // Just an example
  fs::default_working_directory(p2);              // fictitious function.

  fs::path p3("bin/ls");   // Portable representation (refers in fact to "C:\cygwin\usr\bin\ls.exe").
  if (fs::exists(p3))      // Ok: exists() will make the path complete before testing.

And this throws:

fs::path p4("/bin/ls"); // Not allowed: this path has a root but is not marked 'native'.

Setting the default_working_directory shall always need to be done for each supported OS separately. Of course you can set it to "/" on single root machines, and set it to "E:/" after extracting the 'E:' from the current path at application start-up, in effect simulating a 'single root':

You should never need to set that explicitly unless you want to: each application inherits a default working directory anyway from the host environment. BTW, the behaviour you're asking for was required by bcp. All the paths are relative to some root (the boost installation path); that path may be relative or absolute, and whenever you need a path relative to some root, one can just use:

  my_root / my_relative_path

so again, you can do what you want right now.
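The idiom described here needs nothing beyond path concatenation. In this sketch a hypothetical `join()` on plain strings stands in for boost::filesystem's `operator/`, and a made-up `boost_root` directory serves as the base. The base may itself be relative or absolute.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for "my_root / my_relative_path":
// inserts exactly one '/' between the base and the relative part.
std::string join(const std::string& root, const std::string& rel) {
    if (root.empty()) return rel;
    if (root.back() == '/') return root + rel;
    return root + "/" + rel;
}
```

For example, `join("boost_root", "libs/regex")` gives "boost_root/libs/regex", and `join("/usr/", "src")` gives "/usr/src".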

...

The average application will work with relative paths, relative to some (native) base directory, and next to that have some arbitrary, complete and thus native directories (i.e., read from environment variables).

But in case more than one 'working' directory seems needed, then we can add support for that too by allowing paths to be constructed with a reference to the (complete/native) working directory. I.e.,

  fs::path homedir(g_getenv("HOME"), native);
  fs::path rcdir(homedir / "edragon/rc", native);
  fs::path tmpdir(current_path().root_path() / "tmp");
  // ...
  fs::path runtime_rcfile(rcdir);  // Set 'runtime_rcfile' to be relative to 'rcdir'.
  // ...
  runtime_rcfile = "config/runtimerc";

which is then relative to `rcdir' instead of a single, global 'working directory' (as now returned by fs::current_path()).

I'm sorry, but that looks way more complicated to me than the current design: if you want a path to be relative to a specific base, then use "my_base/my_path". It's easy to use, works, and it's clear what you mean as well.

John.

Carlo Wood

8:52 p.m.

On Sat, Aug 21, 2004 at 11:36:31AM +0100, John Maddock wrote:

...

...
I propose the following design. The aim of boost::filesystem should be to support the following coding idiom:

* The programmer should take care to only handle two types of paths in his application:

1) Complete paths 2) Relative paths

That is true now.

No, the current implementation/design doesn't restrict anything at all in this regard. You can just use fs::path to store paths, and that is it. A path is not aware of a distinct difference between complete paths and relative paths (except that they are complete or relative, of course), and definitely not in the way that a design using two different classes for the two would be. The most problematic difference is that it is possible to even store a third possibility (one that is neither complete nor relative). This is what caused me to see that there is a design error in the first place, and caused me to think about improvements.
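The "third possibility" can be illustrated with Windows path syntax. There, a path may carry a root name without a root directory ("C:foo"), or a root directory without a root name ("/bin/ls"). The classifier below is a hypothetical sketch that assumes Windows-style syntax and ignores UNC paths such as \\server\share.

```cpp
#include <cassert>
#include <string>

enum class path_kind { complete, relative, partially_rooted };

// Windows-style classification (UNC paths deliberately not handled):
//   "C:\foo"  -> root name and root directory -> complete
//   "foo/bar" -> neither                      -> relative
//   "/bin/ls" or "C:foo" -> only one of them  -> partially_rooted
path_kind classify(const std::string& p) {
    bool has_root_name = p.size() >= 2 && p[1] == ':';
    std::string::size_type rest = has_root_name ? 2 : 0;
    bool has_root_dir = p.size() > rest && (p[rest] == '/' || p[rest] == '\\');
    if (has_root_name && has_root_dir) return path_kind::complete;
    if (!has_root_name && !has_root_dir) return path_kind::relative;
    return path_kind::partially_rooted;
}
```

A single path type can hold all three kinds. A relative/complete split at the type level would force the third kind to be resolved (completed or rejected) when the path is constructed.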

...

...
* The programmer will have to specifically tell the library when a constructed path is 'native' and when not. A native path is accepted according to native rules and never gives any problem (exceptions) further on. A non-native path is checked according to the existing rules, which basically means that the programmer can set a default check routine that will in effect determine how portable the application will be.

That is also true now, isn't it?

No, because the 'native' that you can specify with the current design is only a check on the characters used in the directory components, and only related to the representation of a path; it does not *mark* the path as different. The 'native' I am talking about is enforced for complete paths and also reflects the root part of a directory ("C:\", "c:/cygwin/", "/" etc). The essential difference is that a path that is once marked as 'native' must stay native. It can never be converted to a portable path anymore. Only relative paths that are portable from the start can stay portable after certain operations (like appending a directory, or cutting off a directory at the end).
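The "once native, always native" rule can be modeled by carrying the flag through every operation and offering no operation that clears it. In the sketch below, `tracked_path`, `append()` and `parent_of()` are hypothetical names, and '/' is assumed to be the separator.

```cpp
#include <cassert>
#include <string>

// A path string plus the sticky "native" flag.
struct tracked_path {
    std::string str;
    bool native;
};

// Appending a component keeps the state of the left operand:
// native stays native, relative stays relative.
tracked_path append(const tracked_path& base, const std::string& component) {
    std::string s = base.str;
    if (!s.empty() && s.back() != '/') s += '/';
    s += component;
    return tracked_path{s, base.native};
}

// Cutting off the last component also keeps the state.
tracked_path parent_of(const tracked_path& p) {
    std::string::size_type pos = p.str.rfind('/');
    std::string s;
    if (pos == std::string::npos) s = "";
    else if (pos == 0) s = "/";          // keep the root of "/usr"
    else s = p.str.substr(0, pos);
    return tracked_path{s, p.native};
}
```

There is deliberately no function that returns a `tracked_path` with `native == false` from a native input, which is how the invariant is enforced in this sketch.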

...

...
I propose two design changes:

1) 'native' is now not only a representation, but an *internal state* of fs::path. (this has no effect on the representation as returned by fs::path::string()). 2) All 'complete' paths are automatically marked as 'native'.

If I understand you correctly, you are suggesting that error checking is turned off for native paths, I would support that, but other than that I don't see how it differs.

Heh - the current design does NOT have an internal state that marks a path as native or not. Hence, it cannot do sensible error checks that need that knowledge. Isn't that difference enough?

...

Actually, there is another case where error checking needs to be turned off: when the path is obtained from a directory_iterator, but is nonetheless relative (and *please* don't tell me that all such paths should be complete; that would break a lot of code, and actually it would make a lot of coding idioms impossible).

A directory_iterator returns a single directory component, no? Apart from the fact that this is almost a string rather than a path, it is at most a relative path: the root part is not there. I have absolutely no problem when the root part is simply forgotten; it will be easy enough to add it back (and to detect this error when one forgets it). I can see a problem with iterating over a native_path though. I'd argue for only allowing iteration over relative_path components. For example: [...] sorry, but at this point my eye fell on:
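Here is a sketch of restricting component iteration to relative paths. Only the hypothetical `rel_path` type offers `components()`, and its constructor refuses a leading '/', so every yielded element is a single relative component. Skipping the empty components produced by doubled separators is an illustrative choice, not something the post specifies.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

class rel_path {
public:
    // Refuse anything with a root: only relative paths are iterable here.
    explicit rel_path(const std::string& s) : s_(s) {
        if (!s.empty() && s[0] == '/')
            throw std::invalid_argument("rel_path must not have a root: " + s);
    }

    const std::string& string() const { return s_; }

    // Split on '/'; empty components (as in "a//b") are skipped.
    std::vector<std::string> components() const {
        std::vector<std::string> out;
        std::string::size_type start = 0;
        while (start < s_.size()) {
            std::string::size_type end = s_.find('/', start);
            if (end == std::string::npos) end = s_.size();
            if (end > start) out.push_back(s_.substr(start, end - start));
            start = end + 1;
        }
        return out;
    }

private:
    std::string s_;
};
```

Under this design, a native path would first have to be split into its root and a relative remainder before anything could iterate over it.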

...

...
Examples, the following code is legal:

  fs::path p1("C:\\foo\\a.exe", native);      // As one might do on Windows.
  fs::path p2("/usr/src/a.out", native);      // As one might do on Linux.
  std::cout << p1.native_file_string();       // ok, p1 is native.

  fs::path p3("foo/bar");                     // Relative path, always succeeds.
  std::cout << p3.string();                   // ok
  std::cout << p3.native_directory_string();  // Useless, but ok.

Not useless at all. And works now.
^^^^^^^^^^^^^^^^
 \__ this.

That is not backed up. We can't have a discussion along the lines of "yes" "no" "yes" "no". Please try to be a bit more constructive. *sets the example verbosity a few notches back*. [...] Anyway... it is thus a problem when you allow iterating over a native_path and have that produce relative_paths. I wouldn't allow that.

...

...
  fs::path p4(complete(p3));                  // p4 is now "native", because now it is complete.
  std::cout << p4.native_directory_string();  // ok

And the following will fail (assertion?):

  std::cout << p4.string();                   // Not allowed because p4 is native (complete).

One could add an assertion that the path is not complete, if you want that behaviour.

Yes. Throwing an exception does not seem correct. An assertion makes the most sense.

...

Actually this change would break the bcp utility - there is a (slightly hairy) use for this.

Would you mind defending that use? I cannot think of a use that is in fact not portable (in a dangerous way) and cannot be solved better.

...

...
Since there would be a default way defined of how a relative path is completed, all operation functions will accept both, relative and complete paths. For example:

  fs::path p1("C:\\cygwin\\usr/bin/ls", native);  // Legal path on Cygwin.
  if (fs::exists(p1))                             // Ok, access complete path.

  // For clarification
  fs::path p2("C:\\cygwin\\usr");                 // Just an example
  fs::default_working_directory(p2);              // fictitious function.

  fs::path p3("bin/ls");   // Portable representation (refers in fact to "C:\cygwin\usr\bin\ls.exe").
  if (fs::exists(p3))      // Ok: exists() will make the path complete before testing.

And this throws:

fs::path p4("/bin/ls"); // Not allowed: this path has a root but is not marked 'native'.

Setting the default_working_directory shall always need to be done for each supported OS separately. Of course you can set it to "/" on single root machines, and set it to "E:/" after extracting the 'E:' from the current path at application start-up, in effect simulating a 'single root':

You should never need to set that explicitly unless you want to: each application inherits a default working directory anyway from the host environment.

Yes - it should be set by default to the working directory at application start (first use of fs::path, or however it works now). But there is no harm in allowing it to be changed explicitly. I am just saying that this default working directory should NOT change automagically as a function of the _real_ working directory. I think this is the case now too, no?
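The behaviour asked for here can be sketched as follows: a default working directory that is captured once and changed only on request. The sketch assumes a POSIX system (`getcwd()` and `chdir()` from `<unistd.h>`). The `demo` namespace and its `default_working_directory()` and `complete()` functions are hypothetical and do not correspond to boost::filesystem functions.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <unistd.h>  // POSIX getcwd() and chdir(); assumes a POSIX system

namespace demo {

// The initializer runs once, on first use, and records the process
// working directory at that moment (falling back to "/" on failure).
inline std::string& default_wd_storage() {
    static std::string wd = [] {
        std::vector<char> buf(4096);
        return std::string(getcwd(buf.data(), buf.size()) ? buf.data() : "/");
    }();
    return wd;
}

// Query the stored default; later chdir() calls do not affect it.
inline const std::string& default_working_directory() { return default_wd_storage(); }

// Explicitly replace the stored default.
inline void default_working_directory(const std::string& dir) { default_wd_storage() = dir; }

// Complete a relative path against the stored default, not getcwd().
inline std::string complete(const std::string& rel) {
    const std::string& wd = default_working_directory();
    if (!wd.empty() && wd.back() == '/') return wd + rel;
    return wd + "/" + rel;
}

}  // namespace demo
```

Because the value is cached in a function-local static, a later `chdir()` leaves it untouched. Only the explicit setter changes it, which matches the "no automagic changes" requirement.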

...

BTW the behaviour you're asking for was required by bcp - all the paths are relative to some root (the boost installation path) - that path may be relative or absolute; and whenever you need a path relative to some root, one can just use:

my_root / my_relative_path

so again, you can do what you want right now.

I think I can implement my proposal on top of the current boost::filesystem yes. But that doesn't mean that boost::filesystem is robust, or supports coding in a portable way.

...

...
The average application will work with relative paths, relative to some (native) base directory, and next to that have some arbitrary, complete and thus native directories (i.e., read from environment variables).

But in case more than one 'working' directory seems needed, then we can add support for that too by allowing paths to be constructed with a reference to the (complete/native) working directory. I.e.,

  fs::path homedir(g_getenv("HOME"), native);
  fs::path rcdir(homedir / "edragon/rc", native);
  fs::path tmpdir(current_path().root_path() / "tmp");
  // ...
  fs::path runtime_rcfile(rcdir);  // Set 'runtime_rcfile' to be relative to 'rcdir'.
  // ...
  runtime_rcfile = "config/runtimerc";

which is then relative to `rcdir' instead of a single, global 'working directory' (as now returned by fs::current_path()).

I'm sorry, but that looks way more complicated to me than the current design: if you want a path to be relative to a specific base, then use "my_base/my_path", it's easy to use, works, and it's clear what you mean as well.

Right now you can call all the operations that actually pass a path to some system call - and UNLESS that path is complete - it will not be non-portable. THAT means that boost::filesystem has a structural design error. You are now saying that I can complete all paths just prior to calling any operation that passes paths to system calls. Would you mind even listing those for me? Why not change boost::filesystem so that it takes care of completion automatically when needed? That way you cannot make errors.

As an example: I developed my application first on GNU/Linux, with the only demand being that it had to be portable to Cygwin. Therefore, I used UNIX paths. Testing whether /usr/src/edragon/edragon/build/src/gui/gui.la existed worked fine on Linux. It did NOT work on Cygwin. Because I was stupid? No, I think that was because boost::filesystem is not actively supporting portable programming.

-- Carlo Wood <carlo@alinoe.com>
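Here is a sketch of an operation that completes its argument itself before the system call, so the caller cannot forget to. It assumes POSIX `stat()` and the POSIX view that a leading '/' means "complete". On Windows/Cygwin native paths that test would be insufficient, which is the gap the Cygwin example above runs into. `completed()` and `exists_completed()` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>  // POSIX stat(); assumes a POSIX system

// Complete p against base unless it already starts with '/'.
std::string completed(const std::string& p, const std::string& base) {
    if (!p.empty() && p[0] == '/') return p;
    if (base.empty() || base.back() == '/') return base + p;
    return base + "/" + p;
}

// The operation completes the path itself before reaching the
// system call, instead of relying on the caller to do it.
bool exists_completed(const std::string& p, const std::string& base) {
    struct stat st;
    return ::stat(completed(p, base).c_str(), &st) == 0;
}
```

With this shape, every operation that forwards a path to the OS can share the same completion step, so the list of functions the caller must remember shrinks to zero.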

David Abrahams

11:39 a.m.

Carlo Wood <carlo@alinoe.com> writes:

...

I propose the following design. The aim of boost::filesystem should be to support the following coding idiom:

* The programmer should take care to only handle two types of paths in his application:

1) Complete paths 2) Relative paths

* The programmer will have to specifically tell the library when a constructed path is 'native' and when not. A native path is accepted according to native rules and never gives any problem (exceptions) further on. A non-native path is checked according to the existing rules, which basically means that the programmer can set a default check routine that will in effect determine how portable the application will be.

I propose two design changes:

When proposing design changes, first you have to describe the problems your changes are designed to solve, or nobody will understand the motivation. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Carlo Wood

8:13 p.m.

On Sat, Aug 21, 2004 at 07:39:43AM -0400, David Abrahams wrote:

...

When proposing design changes, first you have to describe the problems your changes are designed to solve, or nobody will understand the motivation.

If you are not aware of the problems with boost::filesystem, then I just give up immediately, instead of wasting my time by trying to convince you about the need for a redesign. I've seen several people already agreeing that there is a design problem with the current boost::filesystem. -- Carlo Wood <carlo@alinoe.com>

Jonathan Turkanis

8:41 p.m.

"Carlo Wood" <carlo@alinoe.com> wrote in message news:20040821201331.GA23261@alinoe.com...

...

On Sat, Aug 21, 2004 at 07:39:43AM -0400, David Abrahams wrote:

...
When proposing design changes, first you have to describe the problems your changes are designed to solve, or nobody will understand the motivation.

If you are not aware of the problems with boost::filesystem, then I just give up immediately, instead of wasting my time by trying to convince you about the need for a redesign. I've seen several people already agreeing that there is a design problem with the current boost::filesystem.

This is a good way to have your future suggestions ignored. Since filesystem already has many users, you'll need to convince more than 'several people' that there is a problem before your proposal is accepted. Furthermore, how do you know that those people see the same problems as you do, if you're not willing to spell out in detail exactly what the problems are? Jonathan

Carlo Wood

8:54 p.m.

On Sat, Aug 21, 2004 at 02:41:00PM -0600, Jonathan Turkanis wrote:

...

This is a good way to have your future suggestions ignored.

Since filesystem already has many users, you'll need to convince more than 'several people' that there is a problem before your proposal is accepted. Furthermore, how do you know that those people see the same problems as you do, if you're not willing to spell out in detail exactly what the problems are?

Jonathan

Ok, then I won't waste any more of your or my time. It's fine with me that I just write this code for my own use. -- Carlo Wood <carlo@alinoe.com>

David Abrahams

10:36 p.m.

Carlo Wood <carlo@alinoe.com> writes:

...

On Sat, Aug 21, 2004 at 02:41:00PM -0600, Jonathan Turkanis wrote:

...
This is a good way to have your future suggestions ignored.

Since filesystem already has many users, you'll need to convince more than 'several people' that there is a problem before your proposal is accepted. Furthermore, how do you know that those people see the same problems as you do, if you're not willing to spell out in detail exactly what the problems are?

Jonathan

Ok, then I won't waste any more of your or my time. It's fine with me that I just write this code for my own use.

What a pity. Rationale is an important part of the Boost process. Supplying rationale for changes to a library, it seems to me, is not only reasonable but necessary. If you ever change your mind and decide you can tolerate a request for rationale without storming off, please post it to this list, because I for one am certainly interested in possible changes to filesystem. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Jody Hagins

11:50 p.m.

On Sat, 21 Aug 2004 18:36:58 -0400 David Abrahams <dave@boost-consulting.com> wrote:

...

Rationale is an important part of the Boost process. Supplying rationale for changes to a library, it seems to me, is not only reasonable but necessary. If you ever change your mind and decide you can tolerate a request for rationale without storming off, please post it to this list, because I for one am certainly interested in possible changes to filesystem.

Requiring rationale is not just a boost-ism. All professional development in which I have ever been involved has required rationale before initial development, and especially changing interfaces and fundamental designs after those interfaces have been placed into use. Asking for it is not a slam, but merely a request to provide supporting documentation.

David Abrahams

10:33 p.m.

Carlo Wood <carlo@alinoe.com> writes:

...

On Sat, Aug 21, 2004 at 07:39:43AM -0400, David Abrahams wrote:

...
When proposing design changes, first you have to describe the problems your changes are designed to solve, or nobody will understand the motivation.

If you are not aware of the problems with boost::filesystem

I am aware of some problems I had, but I'm not sure those are the same ones you're trying to solve.

...

then I just give up immediately, instead of wasting my time by trying to convince you about the need for a redesign.

Easy, big fella! I'm long since convinced that something should be changed, but until I know what you're trying to address with your changes I'll have a hard time deciding whether the problems are important and whether your proposal effectively addresses them.

...

I've seen several people already agreeing that there is a design problem with the current boost::filesystem.

I'm among them. However, I have a hard time articulating what I found difficult about the library, other than that the path portability checks seemed to get in my way. If there's something deeper going on, I'd like to hear your opinion of what it is. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Carlo Wood

11:27 p.m.

Let me start with an apology for the rather unfriendly tone of my previous message. Fortunately, people on IRC pointed this out to me; I didn't intend the post to be read the way they read it, but I was wrong nevertheless. The point is that I just don't have time for (nor am I interested in) long discussions about design issues, and I think that actually I shouldn't even have started this thread ;). Due to past experiences, mostly on other mailing lists, I have become oversensitive to remarks like a short "no" and "that doesn't work" and the like, without back-ups, examples or constructive remarks. I think I was in error to see David's request in this way (as if he was making my life unnecessarily hard), while in fact he was just asking for, well, examples :).

On Sat, Aug 21, 2004 at 06:33:10PM -0400, David Abrahams wrote:

...

Carlo Wood <carlo@alinoe.com> writes:

...
On Sat, Aug 21, 2004 at 07:39:43AM -0400, David Abrahams wrote:

...
When proposing design changes, first you have to describe the problems your changes are designed to solve, or nobody will understand the motivation.

If you are not aware of the problems with boost::filesystem

I am aware of some problems I had, but I'm not sure those are the same ones you're trying to solve.

...
then I just give up immediately, instead of wasting my time by trying to convince you about the need for a redesign.

Easy, big fella! I'm long since convinced that something should be changed, but until I know what you're trying to address with your changes I'll have a hard time deciding whether the problems are important and whether your proposal effectively addresses them.

...
I've seen several people already agreeing that there is a design problem with the current boost::filesystem.

I'm among them. However, I have a hard time articulating what I found difficult about the library, other than that the path portability checks seemed to get in my way. If there's something deeper going on, I'd like to hear your opinion of what it is.

I am afraid that I too have a hard time pointing out any error that is so clear that everyone will immediately agree that it is an error. This thing is hard to grasp, and THAT is the reason that I think that it will be extraordinarily hard to convince enough people on this list that a change is not only needed, but that a particular change will be the one that is needed. If not impossible. I therefore, unfortunately, will have to stick to my opinion that it is too unlikely that this discussion is going to lead to an actual change of boost::filesystem, to put more time into it.

The only approach that I can think of that might work is a renewed analysis of *a* boost::filesystem, as if without looking at the current implementation. Then, if we go step by step, we might come to an agreement. However, that works only if everyone would participate from the beginning, and in practice that will not happen. A few people would agree, one would write an implementation, and THEN 10 others will disagree because <argumentation using the current implementation> blah blah. Brrr. Ok, I might be afraid of something that wouldn't happen, but I think that I can only TRY it if I can see boost::filesystem as one of my projects - and really, it is not; I am LOADED with other work as it is already.

I already gave a few ideas in other posts. If _anyone_ likes them, then feel free to defend them towards others and go through the process of discussion and refinement. If any additional clarifications are needed, then of course I will be happy to give them.

-- Carlo Wood <carlo@alinoe.com>

David Abrahams

22 Aug

3:42 p.m.

Carlo Wood <carlo@alinoe.com> writes:

...

...
I'm among them. However, I have a hard time articulating what I found difficult about the library, other than that the path portability checks seemed to get in my way. If there's something deeper going on, I'd like to hear your opinion of what it is.

I am afraid that I, too, have a hard time pointing out an error so clear that everyone will immediately agree it is an error.

The goal isn't to get everyone to agree; it's to explain what you're trying to fix. As you mentioned in the text I snipped, this is a simple request for examples of things that don't work.

...

This thing is hard to grasp, and THAT is why I think it will be extraordinarily hard, if not impossible, to convince enough people on this list not only that a change is needed, but that one particular change is the right one.

I therefore, unfortunately, have to stick to my opinion that this discussion is too unlikely to lead to an actual change of boost::filesystem for me to put more time into it.

No offense intended, but I find this extremist attitude hard to grasp. Writing a rationale should hardly take you a quarter of the energy you've already invested in writing up proposals, and is all that's required for the group to even get started understanding what you're proposing. A pity. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Patrick Bennett

7:56 p.m.

David Abrahams wrote:

...

Carlo Wood <carlo@alinoe.com> writes:

...
I've seen several people already agreeing that there is a design problem with the current boost::filesystem.

I'm among them. However, I have a hard time articulating what I found difficult about the library, other than that the path portability checks seemed to get in my way. If there's something deeper going on, I'd like to hear your opinion of what it is.

The fact that boost::filesystem has zero support for internationalization kills it for me. Ideally, it would just use UTF-8 for everything. It could fairly easily be changed to expect and return UTF-8 strings and then convert (Win32...) or pass them through (Linux...) as appropriate for the platform. As it is currently written, it can't be used for a Windows application in any country that requires a non-Latin character set. If I'm writing an application that will run in Japan or China, for example, the file paths and filenames *require* full Unicode support. Boost::filesystem completely ignores this problem, so I've never been able to use it. It may have other weaknesses, but considering that it fails right out of the gate for anyone who has to write internationalized applications, I'd say that's a fairly big problem. Patrick Bennett
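As an illustration of the conversion Patrick describes (UTF-8 at the interface, converted only at the Win32 boundary), here is a minimal sketch of the decoding step. The helper name utf8_to_utf16 is hypothetical and not part of boost::filesystem; it assumes well-formed input and, unlike a production decoder, does not reject overlong forms or encoded surrogates. On Windows itself one would more likely call MultiByteToWideChar with CP_UTF8.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a boost::filesystem API): decodes UTF-8 bytes
// into UTF-16 code units, the form the wide-character Win32 file APIs use.
// Sketch only: assumes well-formed input and does not reject overlong
// encodings or encoded surrogates.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)              { cp = lead;        len = 1; } // ASCII
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; len = 2; } // 110xxxxx
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; len = 3; } // 1110xxxx
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; } // 11110xxx
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80)                               // must be 10xxxxxx
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        i += len;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));                // BMP: one unit
        } else {
            cp -= 0x10000;                                           // surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The UTF-16 result is what the wide-character Win32 file APIs expect (after copying into a wchar_t buffer, which is 16 bits on Windows); on Linux the UTF-8 string would simply be passed through unchanged.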

Jonathan Turkanis

8:22 p.m.

"Patrick Bennett" <patrick.bennett@inin.com> wrote in message news:4128FA68.3000108@inin.com...

...

The fact that boost::filesystem has zero support for internationalization kills it for me. Ideally, it would just use UTF-8 for everything.

wpath support seems to be in the works. See http://lists.boost.org/MailArchives/boost/msg60423.php. Jonathan

Patrick Bennett

23 Aug

2:44 a.m.

Jonathan Turkanis wrote:

...

"Patrick Bennett" <patrick.bennett@inin.com> wrote in message news:4128FA68.3000108@inin.com...

...
The fact that boost::filesystem has zero support for internationalization kills it for me. Ideally, it would just use UTF-8 for everything.

wpath support seems to be in the works. See http://lists.boost.org/MailArchives/boost/msg60423.php.

What is described in that posting wouldn't be correct for Windows. Win32 is UCS-2 natively, but having filesystem generically expect wchar_t* UCS-2 strings for all APIs on Windows and char* UTF-8 strings on Linux isn't acceptable. (At least, that's how I interpreted the post, since it mentioned using the same external representation as the internal representation.) It should be char* (and std::string) UTF-8 strings throughout for all platforms - passing as-is for platforms like Linux, and converting to/from UCS-2 on Windows. I can't speak for other platforms as I'm most familiar with Windows and Linux. Patrick Bennett
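For the "from" half of that round trip, a sketch of turning UTF-16 text, as returned by the wide Win32 APIs, back into UTF-8. It treats the input as UTF-16 rather than strict UCS-2, so a character outside the Basic Multilingual Plane arrives as a surrogate pair and is combined into one code point; strict UCS-2 cannot represent such characters at all. The name utf16_to_utf8 is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: encodes UTF-16 code units into a UTF-8 byte string.
// Surrogate pairs are combined into one code point; an unpaired surrogate
// is rejected rather than silently passed through.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {                     // high surrogate
            if (i + 1 == in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;                                                // consumed two units
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }

        if (cp < 0x80) {                                        // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                                // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                              // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                                // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```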

Stefan Seefeld

2:55 a.m.

Patrick Bennett wrote:

...

It should be char* (and std::string) UTF-8 strings throughout for all platforms - passing as-is for platforms like Linux, and converting to/from UCS-2 on Windows. I can't speak for other platforms as I'm most familiar with Windows and Linux.

Isn't it abusive to force UTF-8 into a std::string? While it is technically possible, the semantics aren't quite the same: operator[](size_t i) wouldn't return the i'th character any more, at least not for characters outside the ASCII range. Regards, Stefan
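Stefan's point about operator[] is easy to see directly: for a UTF-8 std::string, size() and operator[] work in bytes, not characters. A small illustration; the helper utf8_code_points is hypothetical and assumes well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by skipping continuation bytes
// (bit pattern 10xxxxxx). std::string::size() counts bytes instead.
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t n = 0;
    for (char c : s)
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80)
            ++n;
    return n;
}
```

For the UTF-8 spelling of "café", size() reports 5 while the text has 4 characters, and name[3] is only the first byte (0xC3) of the two-byte "é", not a character in its own right.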

Rainer Deyke

3:27 a.m.

Stefan Seefeld wrote:

...

Isn't it abusive to force UTF-8 into a std::string? While it is technically possible, the semantics aren't quite the same: operator[](size_t i) wouldn't return the i'th character any more, at least not for characters outside the ASCII range.

I don't think so. In C and C++, the type 'char' represents not only a text character, but also a byte of binary data. For example, when using the iostream library to read binary files, the data will be read as a sequence of 'char's. 'std::string' is the most useful container for blocks of binary data, largely because of the stringstream classes. UTF-8 is just a special case of binary data. -- Rainer Deyke - rainerd@eldwood.com - http://eldwood.com
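Rainer's position can be illustrated with unformatted output: write() on an std::ostringstream stores arbitrary bytes, zero bytes included, and str() hands them back as a std::string. A minimal sketch; make_binary_block and its byte values are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds a block of raw bytes (including a zero byte and bytes >= 0x80)
// with an std::ostringstream and returns it as a std::string.
std::string make_binary_block()
{
    std::ostringstream out;
    const char bytes[] = { 'A', '\0', '\x7F', '\x80', '\xFF' };
    out.write(bytes, sizeof bytes);   // unformatted output: bytes copied as-is
    return out.str();                 // std::string keeps the embedded '\0'
}
```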

Stefan Seefeld

3:57 a.m.

Rainer Deyke wrote:

...

Stefan Seefeld wrote:

...
Isn't it abusive to force UTF-8 into a std::string? While it is technically possible, the semantics aren't quite the same: operator[](size_t i) wouldn't return the i'th character any more, at least not for characters outside the ASCII range.

I don't think so. In C and C++, the type 'char' represents not only a text character, but also a byte of binary data. For example, when using the iostream library to read binary files, the data will be read as a sequence of 'char's. 'std::string' is the most useful container for blocks of binary data, largely because of the stringstream classes.

I don't agree: When I want to read binary data, i.e. a sequence of 'char's, I don't use std::istream to begin with, as iostreams are all about formatting. Instead I would look at std::streambuf, as that's what encapsulates the device to read from. So, I don't see at all why I would read in binary data from a std::stringstream into a std::string. Regards, Stefan
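Stefan's alternative, reading bytes through the std::streambuf rather than through formatted std::istream extraction, might look like the sketch below. read_all_bytes is a hypothetical helper, and using std::vector<char> as the destination is an assumption, not something either poster specified.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <streambuf>
#include <vector>

// Reads every remaining byte from a stream buffer. istreambuf_iterator
// talks to the std::streambuf directly, so no formatting or whitespace
// skipping is involved (unlike operator>> on an std::istream).
std::vector<char> read_all_bytes(std::streambuf& buf)
{
    return std::vector<char>(std::istreambuf_iterator<char>(&buf),
                             std::istreambuf_iterator<char>());
}
```

With a real file, one would open an std::ifstream in std::ios::binary mode and pass *file.rdbuf().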

Patrick Bennett

2:47 p.m.

Stefan Seefeld wrote:

...

Patrick Bennett wrote:

...
It should be char* (and std::string) UTF-8 strings throughout for all platforms - passing as-is for platforms like Linux, and converting to/from UCS-2 on Windows. I can't speak for other platforms as I'm most familiar with Windows and Linux.

Isn't it abusive to force UTF-8 into a std::string?

Abuse is a relative term here. ;)

...

While it is technically possible, the semantics aren't quite the same: operator[](size_t i) wouldn't return the i'th character any more, at least not for characters outside the ASCII range.

Correct (kind of), but I'd far prefer that std::string be used rather than some completely new type being defined. For users of boost::filesystem, I can't personally think of a time when a user would need to iterate the paths or files a character at a time. Because of UTF-8's nature, even if a user were to search for something like '/', it would still work with find(), operator[], etc. UTF-8 maps to std::string extremely well. I think there are also a fair number of precedents already set for using UTF-8 internally with std::string as the storage mechanism. UTF-8 strings don't contain embedded NULs (std::string would still work if they did), ASCII characters remain ASCII characters, and you can tell if you're in the middle of a multi-byte sequence. Since we're talking about filesystem's inability to be used in internationalized applications, and you don't think UTF-8/std::string is the way to do it, what is your recommendation? Cheers... Patrick Bennett
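Patrick's two claims, that a byte-wise search for '/' stays correct on UTF-8 and that a byte inside a multi-byte sequence is recognizable, can be sketched as follows. The helpers is_utf8_continuation and split_path are hypothetical illustrations, not boost::filesystem functions, and assume well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True if byte c is a UTF-8 continuation byte (10xxxxxx), i.e. it sits in
// the middle of a multi-byte sequence rather than starting a character.
bool is_utf8_continuation(char c)
{
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Splits a UTF-8 path on '/'. Searching raw bytes is safe because, in
// well-formed UTF-8, every byte of a multi-byte sequence is >= 0x80 and so
// can never equal '/' (0x2F).
std::vector<std::string> split_path(const std::string& path)
{
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;) {
        const std::size_t slash = path.find('/', start);
        if (slash == std::string::npos) {
            parts.push_back(path.substr(start));
            return parts;
        }
        parts.push_back(path.substr(start, slash - start));
        start = slash + 1;
    }
}
```

For example, the UTF-8 path "日本/a.txt" splits into its two components even though the first one consists entirely of multi-byte characters.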

Carlo Wood

3:27 p.m.

On Mon, Aug 23, 2004 at 09:47:52AM -0500, Patrick Bennett wrote:

...

Correct (kind of), but I'd far prefer that std::string be used rather than some completely new type being defined.

std::vector<char> comes to mind.

...

For users of boost::filesystem, I can't personally think of a time when a user would need to iterate the paths or files a character at a time.

Bzzzzt. Everything is, somewhere, always needed :p I can't think of such a case either, but the times that I thought something wasn't needed, or that I needed something the original author had said would never be needed, are countless.

...

Because of UTF-8's nature, even if a user were to search for something like '/', it would still work with find(), operator[], etc. UTF-8 maps to std::string extremely well. I think there are also a fair number of precedents already set for using UTF-8 internally with std::string as the storage mechanism.

People associate a std::string with something printable and will be tempted to use std::string::c_str() to pass the 'string' to some function. I think that using a std::string of binary data is not a good idea when it can contain '\0' in the middle.
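Carlo's concern about c_str() can be shown in a few lines: a std::string keeps bytes after an embedded '\0', but any C-style API that receives c_str() stops at the first zero byte. bytes_seen_by_c_api is a hypothetical helper used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Returns how many bytes a C API would see if handed s.c_str():
// everything up to the first '\0', which may be less than s.size().
std::size_t bytes_seen_by_c_api(const std::string& s)
{
    return std::strlen(s.c_str());
}
```

Well-formed UTF-8 produces a 0x00 byte only when encoding U+0000 itself, so ordinary UTF-8 path names are not truncated this way.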

...

UTF-8 strings don't contain embedded NULs (std::string would still work if they did), ASCII characters remain ASCII characters, and you can tell if you're in the middle of a multi-byte sequence.

Ah, I retract my objection ;). If UTF-8 is guaranteed not to have embedded NULs, then there is no reason not to use std::string for it, imho. -- Carlo Wood <carlo@alinoe.com>

Patrick Bennett

4:59 p.m.

Carlo Wood wrote:

...

On Mon, Aug 23, 2004 at 09:47:52AM -0500, Patrick Bennett wrote:

...
UTF-8 strings don't contain embedded NULs (std::string would still work if they did), ASCII characters remain ASCII characters, and you can tell if you're in the middle of a multi-byte sequence.

Ah, I retract my objection ;). If UTF-8 is guaranteed not to have embedded NULs, then there is no reason not to use std::string for it, imho.

Exactly! ;) At least that's how *I* feel about it. Cheers... Patrick Bennett

Loïc Joly

4:34 p.m.

Patrick Bennett wrote:

...

Stefan Seefeld wrote:

...
Patrick Bennett wrote:

...
It should be char* (and std::string) UTF-8 strings throughout for all platforms - passing as-is for platforms like Linux, and converting to/from UCS-2 on Windows. I can't speak for other platforms as I'm most familiar with Windows and Linux.

Isn't it abusive to force UTF-8 into a std::string?

Abuse is a relative term here. ;)

...
While it is technically possible, the semantics aren't quite the same: operator[](size_t i) wouldn't return the i'th character any more, at least not for characters outside the ASCII range.

Correct (kind of), but I'd far prefer that std::string be used rather than some completely new type being defined.

I don't know anything about i18n, but I believed that something like basic_string<SomeOtherCharacterSet> was the way to go. -- Loïc

Stefan Seefeld

6:22 p.m.

Loïc Joly wrote:

...

I don't know anything about i18n, but I believed that something like basic_string<SomeOtherCharacterSet> was the way to go.

Not if that implies a fixed character size, as that excludes encodings such as UTF-8. (Besides, the minimum character size for Unicode keeps growing, so even with a fixed character size, wchar_t no longer seems to be enough either.) Regards, Stefan
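Stefan's objection can be made concrete with the string types C++11 later standardized (they postdate this thread): std::u32string is std::basic_string<char32_t>, a fixed-width basic_string with one element per code point, while UTF-8 in std::string and UTF-16 in std::u16string are variable-width. The variable names below are illustrative only; the text is "a" followed by U+1F600, a character outside the Basic Multilingual Plane.

```cpp
#include <cassert>
#include <string>

// The same two-character text in three encodings. Only the fixed-width
// UTF-32 form has one element per character. A 16-bit wchar_t (as on
// Windows) needs two elements for U+1F600, just like the UTF-16 case.
const std::string    utf8_text  = "a\xF0\x9F\x98\x80";  // 1 + 4 bytes
const std::u16string utf16_text = u"a\U0001F600";       // 1 unit + surrogate pair
const std::u32string utf32_text = U"a\U0001F600";       // 1 element per character
```

A fixed-width representation buys character indexing at the cost of four bytes per character, which is the trade-off against the UTF-8-in-std::string approach discussed above.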

Jonathan Turkanis

3:18 a.m.

"Patrick Bennett" <patrick.bennett@inin.com> wrote in message:

...

Jonathan Turkanis wrote:

...
"Patrick Bennett" <patrick.bennett@inin.com> wrote in message:

...

...
...
The fact that boost::filesystem has zero support for internationalization kills it for me. Ideally, it would just use UTF-8 for everything.

...

...
wpath support seems to be in the works. See http://lists.boost.org/MailArchives/boost/msg60423.php.

...

What is described in that posting wouldn't be correct for Windows.

...

It should be char* (and std::string) UTF-8 strings throughout for all platforms - passing as-is for platforms like Linux, and converting to/from UCS-2 on Windows. I can't speak for other platforms as I'm most familiar with Windows and Linux.

Sounds reasonable, but I'm not an expert, and I know there are a number of competing considerations. My main point was that internationalization is being addressed. I'm sure your input will be welcome. Jonathan

Jonathan Turkanis

3:39 a.m.

"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:cgbn5o$dl4$1@sea.gmane.org...

...

"Patrick Bennett" <patrick.bennett@inin.com> wrote in message:

...

...
It should be char* (and std::string) UTF-8 strings throughout for all platforms - passing as-is for platforms like Linux, and converting to/from UCS-2 on Windows. I can't speak for other platforms as I'm most familiar with Windows and Linux.

Sounds reasonable

Okay, Stefan convinced me otherwise. :-) Jonathan

26 comments

participants (10)

Carlo Wood
David Abrahams
Jody Hagins
John Maddock
Jonathan Turkanis
Keith Burton
Loïc Joly
Patrick Bennett
Rainer Deyke
Stefan Seefeld