[filesystem] recursive iteration

Hi Beman, I just had a look at boost::filesystem, and it felt really good to use. There is one common idiom that might be worth including: recursive traversal. So I would like some like recursive_directory_iterator i( path( "foo" ) ) , end; for( i.set_traversal_mode( files_only ); i != end; ++ i ) // print all file names recursively, default could be both dirs and files cout << i->leaf(); My motivation for this is that 1. the manual implementation of the recursion can require a bit tweaking to get right 2. other languages seem to provide such features, eg. os.walk() in python. If people like the idea, I can provide a starting point implementation. br Thorsten

Making it handle symbolic links properly is certainly not trivial, and cannot be done efficiently with the current version of the filesystem library (assuming no additional platform-specific code is added). As has been discussed already in relation to the implementation of the `equivalent' function which Beman is adding, the common implementation is to store the inode numbers of every directory with a link count greater than one in a hash set, and then make sure that they are traversed only once. This implementation is not perfect, however, because most platforms, including Windows and Linux, do not guarantee that inode numbers are unique for files for which there is not an open handle. Thus, the way to remedy this would be to keep an open handle to each directory whose inode number is being stored; but this creates the problem of the traversal being limited by the number of open file handles/descriptors a single process can have. -- Jeremy Maitin-Shepard

"Jeremy Maitin-Shepard" <jbms@attbi.com> wrote in message news:878yi09w29.fsf@jbms.ath.cx... the the
problem of the traversal being limited by the number of open file handles/descriptors a single process can have.
I understand the reluctance to add a feature which raises such thorny issues. However, I think the fact that recursive directory iteration is hard to implement properly is an excellent reason to put it in a library. I would imagine that a lot of existing code doesn't handle all these issues correctly. Limitations of the implementation can be explained in the documentation. Jonathan

Given though that as the filesystem library is now, this visitor algorithm would require platform-specific implementations, thus making the function somewhat of a black box, it perhaps makes sense to wait until there is link count and symbolic link support in the main library before adding this visitor algorithm. As far as the filesystems on which inodes are not unique, I believe it is primarily only a problem on network filesystems like AFS and Coda FS, in which the inode numbers are only valid for files which are cached locally or a to which a handle is open, because the files are actually identified by a 128-bit identifier which is not exposed (completely) by the (Linux) kernel. -- Jeremy Maitin-Shepard

"Jeremy Maitin-Shepard" <jbms@attbi.com> wrote in message news:87llm08dyi.fsf@jbms.ath.cx... library
before adding this visitor algorithm.
I agree, if link count and symbolic link support is in the works (I haven't been following filesystem development very closely). I'm not against black boxes, though. Jonathan

"Janusz Piwowarski" <jpiw@go2.pl> wrote in message news:002601c40b5c$96b7c820$2900050a@januszpXP...
I like breadth_first_search and depth_first_search much more than files_only. If you want to only look at files, you can do recursive_directory_iterator i( path( "foo" ), breadth_first_search ) , end; for ( ; i != end; ++i) { if (is_directory(*i)) continue; ... } Joe Gottman Joe Gottman

At 06:51 AM 3/16/2004, Thorsten Ottosen wrote:
As you have probably figured out from the responses of others, there is a lot of interest but coming up with the right feature set would require some thought. Another desirable feature is filtering. (That is also a need for non-recursive iteration.) John Maddock and others did some work on filtering a while ago, but then we got busy with other things and didn't pursue it. Very early in the library's evolution there was an attempt to handle recursion (for copying, IIRC) via adding a runtime option or two. The number of options multiplied, and the approach was dropped as unwieldy. I wonder if a policy-based template approach might be more appropriate? --Beman

"Beman Dawes" <bdawes@acm.org> wrote in message news:4.3.2.7.2.20040318212145.02c55000@mailhost.esva.net...
Maybe. I merely want directory_iterator in a recursive variant. I agree that I should check for a directory myself, so my file_only etc is not important. The depth_first etc thing could be added later; however, as I understood it, we can not guarantee the traversal order. That is also fine, I just think of it as a utility to visit all files in a path. br Thorsten
participants (6)
-
Beman Dawes
-
Janusz Piwowarski
-
Jeremy Maitin-Shepard
-
Joe Gottman
-
Jonathan Turkanis
-
Thorsten Ottosen