New design proposal for boost::filesystem

Just a thought on the filesystem implementation.

Why not separate the path from the filesystem?

Let the path be a container for all kinds of paths that follow the generic syntax explained in the boost::filesystem documentation. Then use implementation classes that handle the root name, conversion to native strings, etc.

The benefits would be:

1. The path can be used for things other than filesystems (XPath, Win32 registry).

2. Operations can be implemented selectively for different types (no need for the current messy #ifdefs):

    bool exists( const path_base<posixfs<std::string> >& aPath);
    bool exists( const path_base<win32fs<std::wstring> >& aPath);
    bool exists( const path_base<win32reg<boost::fixed_string<100> > >& aPath);

3. It is easy to expand the functionality for specific tasks without touching the boost headers.

The implementation could look something like:

    template <class ImplT>
    class path_base : private ImplT
    {
    public:
        typedef typename ImplT::string_type string_type;
        path_base() : ImplT() {}
        // all path operations as in current implementation
        ...
    private:
        string_type m_path;
    };

    //
    // implementation for win32 filesystem
    //
    template <class StringT>
    class win32fs
    {
        // root-name ::= root-drive | root-share | root-device
        // root-drive ::= char ":"
        // root-share ::= "//" name
        // root-device ::= name ":"
    protected:
        typedef StringT string_type;
        win32fs() { }
        string_type make_generic(const typename string_type::value_type* src);
        string_type native_file_string(const string_type& m_path) const
        {
            return algorithm::replace_all_copy(m_root + m_path, "/", "\\");
        }
        ...
    private:
        string_type m_root;
    };

    //
    // implementation of posix filesystem
    //
    template <class StringT>
    class posixfs
    {
    protected:
        typedef StringT string_type;
        posixfs() { }
        string_type make_generic(const typename string_type::value_type* src);
        string_type native_file_string(const string_type& m_path) const
        {
            return m_path;
        }
        ...
    };

    #ifdef BOOST_POSIX
    typedef path_base<posixfs<std::string> > path;
    #else
    ...
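A minimal, compilable sketch of the policy idea in the proposal above, under simplifying assumptions. Only the names path_base, posixfs, win32fs and native_file_string come from the proposal. The constructor taking a generic string, the string() accessor, the omission of m_root, the hand-written slash replacement (used here in place of algorithm::replace_all_copy) and the demo function are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical POSIX policy: no root name, generic form is already native.
template <class StringT>
class posixfs {
protected:
    typedef StringT string_type;
    string_type native_file_string(const string_type& p) const { return p; }
};

// Hypothetical Win32 policy: only converts separators (root handling omitted).
template <class StringT>
class win32fs {
protected:
    typedef StringT string_type;
    string_type native_file_string(const string_type& p) const {
        string_type out(p);
        for (typename string_type::size_type i = 0; i < out.size(); ++i)
            if (out[i] == '/') out[i] = '\\';
        return out;
    }
};

// The path container delegates native conversion to its policy base class.
template <class ImplT>
class path_base : private ImplT {
public:
    typedef typename ImplT::string_type string_type;

    path_base() {}
    explicit path_base(const string_type& generic) : m_path(generic) {}

    const string_type& string() const { return m_path; }
    string_type native_file_string() const {
        return ImplT::native_file_string(m_path);
    }

private:
    string_type m_path;  // always stored in the generic '/'-separated form
};

// Both instantiations can coexist in one program, whatever the platform.
void demo_policy_paths() {
    path_base<posixfs<std::string> > p("usr/local/lib");
    path_base<win32fs<std::string> > w("usr/local/lib");
    assert(p.native_file_string() == "usr/local/lib");
    assert(w.native_file_string() == "usr\\local\\lib");
}
```

Because the two instantiations are distinct types, a free function such as exists() could be overloaded on each of them, which is the point of benefit 2.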

I agree that boost::filesystem needs to be redesigned. Your interface looks like a great start -- I like it.

On Fri, Sep 03, 2004 at 07:54:29AM +0000, Martin wrote:
Just a thought on the filesystem implementation.
Why not separate the path from the filesystem?
Isn't it already separated? fs::path just holds 'values', and then you have the 'operations' as an addition, separated from it.
Let the path be a container for all kind of paths that follow the generic syntax explained in the boost::filesystem documentation.
Then use implementation classes that handles root name and conversion to native strings etc.
The benefits would be: 1. The path can be used for other things than filesystems (XPath, Win32 registry)
So can std::string. I am not sure that this is a benefit.
2. Operations can be implemented selectivly for different types (No need for the current messy #ifdefs) bool exists( const path_base<posixfs<std::string> >& aPath); bool exists( const path_base<win32fs<std::wstring> >& aPath); bool exists( const path_base<win32reg<boost::fixed_string<100> > >& aPath);
That would move the #ifdefs to the application. I would rather have two types, 'native' and non-native (relative), than have to write code that explicitly uses a different type for every OS that I want to support.
3. Easy to expand the functionality for specific tasks without touching the boost headers.
The implementation could look something like:
template <class ImplT> class path_base : private ImplT { public: typedef ImplT::string_type string_type; path_base() : ImplT() {} // all path operations as in current implementation ... private: string_type m_path; };
// // implementation for win32 filesystem // template <class StringT> class win32fs { // root-name ::= root-drive | root-share | root-device // root-drive ::= char ":" // root-share ::= "//" name // root-device ::= name ":" protected: typedef StringT string_type; win32fs() { } string_type make_generic(const string_type::value_type* src); string_type native_file_string(const string_type& m_path) const { return algorithm::replace_all_copy(m_root + m_path, "/", "\\"); } ... private: string_type m_root }; // // implementation of posix filesystem // template <class StringT> class posixfs { win32fs() { } string_type make_generic(const string_type::value_type* src); string_type native_file_string(const string_type& m_path) const { return m_path; } ... }
#ifdef BOOST_POSIX typedef path_base<posixfs<std::string>> path; #else ...
I fail to see how all this would address the problems that I tried to address by introducing explicit native and relative path types, the latter of which would allow automatic completion by tying it to a (native) absolute path.

-- 
Carlo Wood <carlo@alinoe.com>

Martin wrote:
Just a thought on the filesystem implementation.
Why not separate the path from the filesystem?
Let the path be a container for all kind of paths that follow the generic syntax explained in the boost::filesystem documentation.
That's already the case, no?
Then use implementation classes that handles root name and conversion to native strings etc.
The benefits would be: 1. The path can be used for other things than filesystems (XPath, Win32 registry)
I think your XPath example is a bit bogus. How can you represent descendant::section[@id=foobar]/title with boost::path?
2. Operations can be implemented selectivly for different types (No need for the current messy #ifdefs) bool exists( const path_base<posixfs<std::string> >& aPath); bool exists( const path_base<win32fs<std::wstring> >& aPath);
Aren't those two mutually exclusive? You have the first on POSIX systems and the second on win32 systems. If so, I don't understand how I can write portable code -- I will have to call different functions on different platforms, something like:

    #ifdef BOOST_WINDOWS
    if (exists(win32path(p)))
    #else
    if (exists(posix_path(p)))
    #endif

which means the "messy #ifdefs" are just lifted to the user code.
bool exists( const path_base<win32reg<boost::fixed_string<100> > >& aPath);
This can be handled by

    class registry { bool exists(const fs::path&); };

Actually, I think it is very desirable to have boost::path usable in more contexts than the filesystem. For example

    fs::path base("http://my_site.org");
    .....
    get_file(base / "foo" / "bar")
    .....

would be great. But I'm just not sure this requires such drastic changes.

- Volodya

Vladimir Prus <ghost <at> cs.msu.su> writes:
Let the path be a container for all kind of paths that follow the generic syntax explained in the boost::filesystem documentation.
That's already the case, no?

No, the current implementation only allows you to have a single path "type" for everything, since the handling of root etc. is built into the path class (via #ifdefs for windows and posix). If you want to add another path type (e.g. url) you would need to keep track of the type inside the class and then do "if (type==url) ... else ..." in all operations.
The benefits would be: 1. The path can be used for other things than filesystems (XPath, Win32 registry)
I think your XPath example is a bit bogus. How can you represent
descendant::section[@id=foobar]/title
with boost::path?

I didn't think about all cases in detail. XPath was a bad example, but it might still be useful to do things like:

    path_base<xpath> foo("descendant::section[@id=foobar]/title");
    xml.find(foo.branch_path() / "author");
    if (foo.has_root()) ...
2. Operations can be implemented selectivly for different types (No need for the current messy #ifdefs) bool exists( const path_base<posixfs<std::string> >& aPath); bool exists( const path_base<win32fs<std::wstring> >& aPath);
Aren't those two mutually exclusive? You have the first on POSIX systems and the second on the win32 systems.
Yes, posixfs and win32fs are mutually exclusive, but that doesn't mean that all path types are. There might still be a need to have more than one path type in a single application. With the current implementation it would look like this:

    bool exists(const path& aPath)
    {
        if (aPath.type == path::filesystem)
    #ifdef BOOST_POSIX
            ...
    #elif BOOST_WINDOWS
            ...
        else if (aPath.type == path::registry)
            ...
    #endif
        else if (aPath.type == ...
    }

It is possible to do it like that, but the problem I see is that when you add a new path type you will need to add another "if (aPath.type ==" to all operations instead of just adding another overload.
If so, I don't understand how I can write portable code -- I will have to call different function on different pratforms, something like:
#ifdef BOOST_WINDOWS if (exists(win32path(p)) #else if (exists(posix_path(p)) #endif
It is easy to do a conditional typedef in path.hpp so that you can use "path" portably on both systems.

    #ifdef BOOST_WINDOWS
    #include "win32fs.hpp"
    typedef path_base<win32fs> path;
    #elif BOOST_POSIX
    #include "posixfs.hpp"
    typedef path_base<posixfs> path;
    #endif
    ...
    if (exists(path(p)))
bool exists( const path_base<win32reg<boost::fixed_string<100> > >& aPath);
This can be handled by
class registry { bool exists(const fs::path&); };
Don't understand what you mean. The current path implementation doesn't allow you to handle more than one type (i.e. winfs or posixfs). You can't store a registrypath in a fs::path since it can't handle a hkey.. root.
Actually, I think it is very desired to have boost::path usable in more contexts than the filesystem. For example
fs::path base("http://my_site.org"); ..... get_file(base / "foo" / "bar") .....
would be great. But I'm just don't sure this requires all that drastic changes.
Why do you think it is a drastic change? Changing the operations will not take much time, since it only means moving the things inside "#ifdef BOOST_POSIX" and "#ifdef BOOST_WINDOWS" into separate overloads. The documentation stays exactly the same (even if I personally would like to remove the part about "portable paths").

posixfs doesn't have a root, so it is basically an empty implementation class. win32fs requires some work, but all the code is in the current path implementation, so it is mostly copy and paste. Creating path_base probably requires some thinking to make it easy to add new types, but to handle just filesystems it wouldn't take long.

Martin wrote:
Vladimir Prus <ghost <at> cs.msu.su> writes:
Let the path be a container for all kind of paths that follow the generic syntax explained in the boost::filesystem documentation.
That's already the case, no? No, the current implementation only allows you to have a single path "type" for everything since the handling of root etc is built into the path class (via #ifdefs for windows and posix).
I would say that most of the #ifdefs inside path.cpp exist to handle the root path on Windows. The path is stored as a string, and there are many instances of code which look at the first characters to determine whether there is a <letter> ':' '/' pattern. I think if path explicitly stored

- a bool telling whether the path has a root
- the length of the root

all this logic would be much simplified.
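One possible reading of this suggestion, as a small sketch rather than anything taken from path.cpp. The class name rooted_path and its member functions are hypothetical, and only the <letter> ':' drive pattern is recognised. Storing the root length alone is enough here, since a length of zero already means "no root".

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical class: the root is detected once, in the constructor.
class rooted_path {
public:
    explicit rooted_path(const std::string& s)
        : m_path(s), m_root_len(parse_root_length(s)) {}

    // No string inspection here: the answer was computed at construction.
    bool has_root_name() const { return m_root_len != 0; }

    std::string root_name() const { return m_path.substr(0, m_root_len); }
    std::string relative_part() const { return m_path.substr(m_root_len); }

private:
    // Assumption: only a drive root of the form <letter> ':' is recognised.
    static std::string::size_type parse_root_length(const std::string& s) {
        if (s.size() >= 2 &&
            std::isalpha(static_cast<unsigned char>(s[0])) && s[1] == ':')
            return 2;
        return 0;
    }

    std::string m_path;
    std::string::size_type m_root_len;
};

void demo_rooted_path() {
    rooted_path win("c:/boost/libs");
    assert(win.has_root_name());
    assert(win.root_name() == "c:");
    assert(win.relative_part() == "/boost/libs");

    rooted_path rel("usr/lib");
    assert(!rel.has_root_name());
    assert(rel.relative_part() == "usr/lib");
}
```

The trade-off is one extra size_type per path object in exchange for not re-scanning the string in every query.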
If you want to add another path type (e.g. url) you would need to keep track of the type inside the class and then do "if (type==url) ... else ..." in all operations.
Not sure. Why appending "foo" path element to a URL is different from adding the same "foo" to a regular filesystem path? with ad
2. Operations can be implemented selectivly for different types (No need for the current messy #ifdefs) bool exists( const path_base<posixfs<std::string> >& aPath); bool exists( const path_base<win32fs<std::wstring> >& aPath);
Aren't those two mutually exclusive? You have the first on POSIX systems and the second on the win32 systems.
Yes posixfs and win32fs are mutually exclusive but that doesn't mean that all path types are. There might still be a need to have more than one path type in a single application.
If we take the approach that path is "root" + relative path, where "root" is an arbitrary string, then almost all kinds of path can be stored in boost::path.
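The "root + relative path" idea could be sketched roughly as follows. The class generic_path and its members are hypothetical names used only for illustration. The root is kept as an opaque string and never parsed, which is the assumption that lets a URL host or a registry hive serve as a root.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical class: an arbitrary root string plus relative elements.
class generic_path {
public:
    explicit generic_path(const std::string& root) : m_root(root) {}

    // Returns a copy with one more element appended; the root is untouched.
    generic_path operator/(const std::string& element) const {
        generic_path result(*this);
        result.m_elements.push_back(element);
        return result;
    }

    const std::string& root() const { return m_root; }

    // Joins the root and the elements with '/' separators.
    std::string string() const {
        std::string out(m_root);
        for (std::vector<std::string>::size_type i = 0;
             i < m_elements.size(); ++i) {
            out += '/';
            out += m_elements[i];
        }
        return out;
    }

private:
    std::string m_root;
    std::vector<std::string> m_elements;
};

void demo_generic_path() {
    generic_path base("http://my_site.org");
    assert((base / "foo" / "bar").string() == "http://my_site.org/foo/bar");

    generic_path hive("HKEY_CURRENT_USER");
    assert((hive / "Software").string() == "HKEY_CURRENT_USER/Software");
    assert(hive.root() == "HKEY_CURRENT_USER");
}
```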
With the current implementation it would look like this:
bool exists(const path& aPath) { if (aPath.type == path::filesystem) #ifdef BOOST_POSIX ... #elif BOOST_WINDOWS ... else if (aPath.type == path::registry) ... #endif else if (aPath.type == ... }
It is possible to do it like that but the problem I see is that when you add a new path type you will need to add another "if (aPath.type ==" to all operations instead of just adding another overload.
Ok, let me ask another way: why is exists(registry_path(p)) better than exists_in_registry(p)? Here 'p' is of type boost::fs::path and registry_path is the path_base<win32reg<boost::fixed_string<100> > > that you've proposed.
If so, I don't understand how I can write portable code -- I will have to call different function on different pratforms, something like:
#ifdef BOOST_WINDOWS if (exists(win32path(p)) #else if (exists(posix_path(p)) #endif
It is easy to do a conditional typedef in the path.hpp so that you can use "path" portable on both systems.
#ifdef BOOST_WINDOWS #include "win32fs.hpp" typedef path_base<win32fs> path; #elif BOOST_POSIX #include "posixfs.hpp" typedef path_base<posixfs> path; #endif
...
if (exists(path(p))
Yes, it's possible. You essentially lift the #ifdefs: they are inside the 'path' methods now, and they would then be around the definition of 'path'. That will work, but I still don't understand why it's good, given that win32 and posix are mutually exclusive. Do you think it would be good to compile both win32fs and posixfs on all systems?
bool exists( const path_base<win32reg<boost::fixed_string<100> > >& aPath);
This can be handled by
class registry { bool exists(const fs::path&); };
Don't understand what you mean.
I mean that the object you want to operate at (filesystem, registry), can be encoded not only in argument type of 'exists', but also in the name of the function you call (e.g. exists_in_registry, or registry::exists).
The current path implementation doesn't allow you to handle more than one type (i.e. winfs or posixfs). You can't store a registrypath in a fs::path since it can't handle a hkey.. root.
Why? Can't I put literal "HKEY_CURRENT_USER" in boost::path?
Actually, I think it is very desired to have boost::path usable in more contexts than the filesystem. For example
fs::path base("http://my_site.org"); ..... get_file(base / "foo" / "bar") .....
would be great. But I'm just don't sure this requires all that drastic changes.
Why do you think it is a drastic change? Changing operations will not take much time since it is only to move the things inside "#ifdef BOOST_POSIX" and "#ifdef BOOST_WINDOWS" into separate overloads.
And #ifdef the overloads ;-) I don't think Win32 API calls can be compiled on Linux.

I really think that the win32/posix distinction should stay as it is. For other types of path, I'm not sure. I *think* that boost::fs::path can accommodate all of them. Of course, if you stick a URL into boost::fs::path, the name 'native_file_path' becomes questionable ;-)

- Volodya

If you want to add another path type (e.g. url) you would need to keep track of the type inside the class and then do "if (type==url) ... else ..." in all operations.
Not sure. Why appending "foo" path element to a URL is different from adding the same "foo" to a regular filesystem path? with ad
I was talking about operations. The filesystem operations don't necessarily make sense for non-filesystem paths. If only one path type is used, you can't get a compile-time check that the operation works for the path type.
If we take the approach that path is "root" + relative path, where "root" is an arbitrary string, then almost all kinds of path can be stored in boost::path.
agree but as someone else said, you can do it with std::string as well.
Ok, let me ask another way: why
exists(registry_path(p))
is better than
exists_in_registry(p);
type safety. There is no risk of sending a filesystem path to a registry operation or the other way around.
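The type-safety argument can be illustrated with a small sketch. Everything here is hypothetical: the tag types, tagged_path, and the two exists() overloads, whose bodies are stand-ins that compare against one hard-coded entry instead of touching a real filesystem or registry.

```cpp
#include <cassert>
#include <string>

struct fs_tag {};
struct registry_tag {};

// Hypothetical path type whose only job is to carry a distinct static type.
template <class Tag>
class tagged_path {
public:
    explicit tagged_path(const std::string& s) : m_path(s) {}
    const std::string& string() const { return m_path; }
private:
    std::string m_path;
};

typedef tagged_path<fs_tag> fs_path;
typedef tagged_path<registry_tag> registry_path;

// Stand-in for a filesystem query: knows exactly one file.
bool exists(const fs_path& p) {
    return p.string() == "/etc/hosts";
}

// Stand-in for a registry query: knows exactly one key.
bool exists(const registry_path& p) {
    return p.string() == "HKEY_CURRENT_USER/Software";
}

void demo_type_safe_exists() {
    assert(exists(fs_path("/etc/hosts")));
    assert(exists(registry_path("HKEY_CURRENT_USER/Software")));
    // The same text, wrapped in the other type, reaches the other overload.
    assert(!exists(registry_path("/etc/hosts")));
    // exists(std::string("/etc/hosts")) would not compile: the constructors
    // are explicit, so the caller has to state which kind of path is meant.
}
```

With a single path type and differently named functions, passing a filesystem path to exists_in_registry() would compile and fail only at run time.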
#ifdef BOOST_WINDOWS #include "win32fs.hpp" typedef path_base<win32fs> path; #elif BOOST_POSIX #include "posixfs.hpp" typedef path_base<posixfs> path; #endif
...
if (exists(path(p))
Yes, it's possible. You essentially lift the #ifdefs: they are inside 'path' methods now and now they are around definition of 'path'. That will work, but I still don't understand why it's good given that win32 and posix are mutually exclusive. Do you think it would be good to compile both win32fs adn posixfs on all systems?
Of course not. You compile one or the other. When I talked about messy #ifdefs I mainly meant the ones in the path class, but they are not the main reason for the proposal. It would be interesting to see how the current implementation extends to Unicode. If it continues in the same way, there would be an "#ifdef UNICODE" in the path class and in the operations.
The current path implementation doesn't allow you to handle more than one type (i.e. winfs or posixfs). You can't store a registrypath in a fs::path since it can't handle a hkey.. root.
Why? Can't I put literal "HKEY_CURRENT_USER" in boost::path?
You can store it as a root if you use a trailing ":". It will be treated as a windows device, but for "HKEY_CURRENT_USER:software/xxx" has_root_path() will return false (the path class doesn't expect relative paths on devices, so some methods work and others don't).
I really think that win32/posix distinction should say as it is. For other types of path, I'm not sure. I *think* that boost::fs::path can accomodate all of them. Of course, if you stick URL into boost::fs::path, the name 'native_file_path' becomes questionable
My idea was to have a layout like:

    path.hpp          - path_base class
    fspath.hpp        - path class for filesystems. Conditionally includes win32fs.hpp or posixfs.hpp
    fsoperations      - filesystem operations. Conditionally includes win32oper.hpp or posixoper.hpp
    urlpath.hpp       - path class for urls
    urloperations.hpp - operations on url paths.

Hi Martin, [BTW, may I gently ask you to put your full name on your messages?]
If you want to add another path type (e.g. url) you would need to keep track of the type inside the class and then do "if (type==url) ... else ..." in all operations.
Not sure. Why appending "foo" path element to a URL is different from adding the same "foo" to a regular filesystem path? with ad
I was talking about operations. The filesystem operations doesn't necessary make sense for non-filesystem paths. If only one path type is used you can't get a compile time check that the operation works for the path type.
I think we probably need to distinguish between two cases.

The first is URL paths:

    file:///foo
    http://boost.org
    man:strcmp

These are not currently supported. I think a URL should be represented with one class (which can even be fs::path). If a URL can be constructed from user input, and if new protocols can be added dynamically, it does not make sense to hardcode the protocol.

The second case is 'generic paths' -- just a sequence of elements, not related to the filesystem. The example you give is the registry. I don't know what other cases there are, so it is hard to tell how important this is. Yes, with a separate "registry_path" you'll get more typechecking, but to figure out how much overlap this class will have with 'path', you'd need to write it.

BTW, you don't need to use path_base<registry_path>, you can use:

    class registry_path : private path_base { };

which I find a bit simpler.

Another BTW: program_options will have registry support sooner or later, so maybe this use case is not important ;-)
If we take the approach that path is "root" + relative path, where "root" is an arbitrary string, then almost all kinds of path can be stored in boost::path.
agree but as someone else said, you can do it with std::string as well.
Ok, let me ask another way: why
exists(registry_path(p))
is better than
exists_in_registry(p);
type safety. There is no risk of sending a filesystem path to a registry operation or the other way around.
Ok, for registry vs. filesystem, I think it makes sense. For different kinds of URL -- likely not.
#ifdef BOOST_WINDOWS #include "win32fs.hpp" typedef path_base<win32fs> path; #elif BOOST_POSIX #include "posixfs.hpp" typedef path_base<posixfs> path; #endif
...
if (exists(path(p))
Yes, it's possible. You essentially lift the #ifdefs: they are inside 'path' methods now and now they are around definition of 'path'. That will work, but I still don't understand why it's good given that win32 and posix are mutually exclusive. Do you think it would be good to compile both win32fs adn posixfs on all systems?
ofcourse not. You compile one or the other.
When I talked about messy #ifdefs I mainly meant the ones in the path class but they are not the main reason for the proposal.
It would be interesting to see how the current implementation extends into unicode. If it follows on the same way it would be "#ifdef UNICODE" in the path class and in the operations.
Everybody has a different opinion on this. I'd prefer:

    class path {
    public:
        path(const char*);
        path(const wchar_t*);
        string native_file_string();
        wstring native_file_wstring();
    };

since some parts of the code might prefer Unicode, and others ASCII.
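A rough sketch of how such a dual-interface class might behave. The class name dual_path, the choice to store the path internally as a wide string, and the conversion helpers are all assumptions. The conversions are deliberately naive (each narrow character is widened as a single code unit, and anything outside 7-bit ASCII is narrowed to '?'), so this is not a real encoding strategy.

```cpp
#include <cassert>
#include <string>

// Hypothetical class offering both narrow and wide entry points.
class dual_path {
public:
    dual_path(const char* s) : m_wide(widen(s)) {}
    dual_path(const wchar_t* s) : m_wide(s) {}

    std::string native_file_string() const { return narrow(m_wide); }
    std::wstring native_file_wstring() const { return m_wide; }

private:
    // Assumption: every narrow character maps to one wide character.
    static std::wstring widen(const std::string& s) {
        std::wstring out;
        for (std::string::size_type i = 0; i < s.size(); ++i)
            out += static_cast<wchar_t>(static_cast<unsigned char>(s[i]));
        return out;
    }

    // Assumption: characters outside 7-bit ASCII are replaced by '?'.
    static std::string narrow(const std::wstring& w) {
        std::string out;
        for (std::wstring::size_type i = 0; i < w.size(); ++i)
            out += (w[i] >= 0 && w[i] < 128) ? static_cast<char>(w[i]) : '?';
        return out;
    }

    std::wstring m_wide;
};

void demo_dual_path() {
    dual_path from_narrow("a/b");
    assert(from_narrow.native_file_wstring() == L"a/b");

    dual_path from_wide(L"c/d");
    assert(from_wide.native_file_string() == "c/d");
}
```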
The current path implementation doesn't allow you to handle more than one type (i.e. winfs or posixfs). You can't store a registrypath in a fs::path since it can't handle a hkey.. root.
Why? Can't I put literal "HKEY_CURRENT_USER" in boost::path?
You can store it as a root if you use a trailing ":". It will be treated as a windows device but for "HKEY_CURRENT_USER:software/xxx" has_root_path() will return false (the path class doens't expect relative paths on devices so some methods work others don't)
Yes, I suggested in a previous email that the root handling code should be refactored a bit. Instead of looking at the string, has_root_path() should return a flag that was initialized during construction.
I really think that win32/posix distinction should say as it is. For other types of path, I'm not sure. I *think* that boost::fs::path can accomodate all of them. Of course, if you stick URL into boost::fs::path, the name 'native_file_path' becomes questionable
My idea was to have a layout like:
path.hpp - path_base class fspath.hpp - path class for filesystems. Conditionally includes win32fs.hpp or posixfs.hpp fsoperations - filesystem operations. Conditionally includes win32oper.hpp or posixoper.hpp urlpath.hpp - path class for urls urloperations.hpp - operations on url paths.
I think that one of the primary issues with URL paths is the need for dynamically pluggable protocol handlers. Something like in KDE:

    http://www.heise.de/ct/english/01/05/242/
    http://developer.kde.org/documentation/library/cvs-api/kio/html/namespaceKIO...
    http://developer.kde.org/documentation/library/cvs-api/kio/html/classKIO_1_1...

Hmm... maybe I should find more time for Boost.Plugin, which I started to draft some time ago.

- Volodya
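To make "dynamically pluggable protocol handlers" concrete, here is a small sketch of a run-time handler table. It is not modelled on the KDE API or on Boost.Plugin; handler_registry, protocol_handler and describe_man_page are hypothetical names, and the handler merely returns a description string instead of fetching anything.

```cpp
#include <cassert>
#include <map>
#include <string>

// A handler receives everything after the "scheme:" prefix.
typedef std::string (*protocol_handler)(const std::string& rest);

// Hypothetical registry: schemes can be added while the program runs.
class handler_registry {
public:
    void add(const std::string& scheme, protocol_handler h) {
        m_handlers[scheme] = h;
    }

    // Splits the URL at the first ':' and calls the matching handler.
    // Returns false if there is no ':' or no handler for the scheme.
    bool dispatch(const std::string& url, std::string& result) const {
        std::string::size_type colon = url.find(':');
        if (colon == std::string::npos) return false;
        std::map<std::string, protocol_handler>::const_iterator it =
            m_handlers.find(url.substr(0, colon));
        if (it == m_handlers.end()) return false;
        result = it->second(url.substr(colon + 1));
        return true;
    }

private:
    std::map<std::string, protocol_handler> m_handlers;
};

// Example handler for the man: scheme (stand-in behaviour only).
std::string describe_man_page(const std::string& rest) {
    return "manual page for " + rest;
}

void demo_handler_registry() {
    handler_registry registry;
    registry.add("man", &describe_man_page);

    std::string out;
    assert(registry.dispatch("man:strcmp", out));
    assert(out == "manual page for strcmp");
    // No handler was registered for http, so dispatch reports failure.
    assert(!registry.dispatch("http://boost.org", out));
}
```

Since the scheme is looked up at run time, the path type itself does not need to know which protocols exist, which matches the argument against hardcoding the protocol in the type.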

At 03:01 AM 9/6/2004, Vladimir Prus wrote:
... I would say that most of #ifdefs inside path.cpp exist to handle root path on Windows. The path is stored as a string and there are many instances of code which looks at first characters to determine if there <letter> ':' '/' pattern. I think if path explicitly store
- a bool telling if path has a root
- the length of the root
all this logic would be much simplified.
That may be true, but it adds runtime cost. The benefit of somewhat simplified implementation code doesn't seem worth the cost.

The current implementation code isn't all that bad. It has not been a source of bugs or other problems that would indicate it needs replacing. Even if it did need replacing, that could be done by refactoring without adding runtime costs.

Am I missing something?

--Beman
participants (5)
- Beman Dawes
- Carlo Wood
- Martin
- tom brinkman
- Vladimir Prus