New subject: [filesystem]Sorting based on equivalence information?

14 Nov 2005

      Hi all,

I have written a class around the glob_iterator class that W. Richard 
Johnson put into the file vault last year. My class finds all the paths 
and/or files matching a multi-level wildcard pattern. (i.e. the matching 
expressions may appear in more than one place, such as 
"/Users/*/MySource/??foo/*.cpp")

I need to detect loops caused by links between directories, and I also 
want to detect multiple paths to the same file, and only keep one of 
them. The only method that I have come up with so far is to keep the 
paths that I have found in a collection and test each new path against 
all of them by calling equivalent() once for each collection member. 
Needless to say, this is a little slow when the number of files or 
directories becomes large. It could be faster if I only tested symlinks, 
but I think that the existence of hard links forces me to test everything.

My first question: Can anyone suggest a better method of achieving the 
duplicates detection?

My second question (This is probably directed at Beman Dawes): I read 
the code for equivalent(), and it seems like I could derive a class from 
path that would implement operator< in such a way that it sorted by the 
equivalence information, rather than by the path string. If I could sort 
the paths by their equivalence information, then I could binary search 
the collection when I look for duplicates.

I would have to copy the code in equivalent(), which is an ugly thing to 
do, and I guess I would need to define all of the operators that are 
defined in path, since they are not virtual. Assuming that I did these 
things, though, does this sound like it's even valid? From the comments 
that I read in the equivalent() source, it seems that the file 
equivalence information can change, at least on a windows system. I 
guess I could create a handle_wrapper for each path that I add to the 
collection in order to keep the file from being moved while it's in the 
collection.

Of course, I can't be the first person who has gone down this path, or 
the library would probably already support this sort of sorting. Is this 
just a stupid idea?

Thanks,
Rush

[filesystem]Sorting based on equivalence information?

Rush Manbert

Paul Giaccone

troy d. straszheim

Rush Manbert

Rush Manbert

tags

participants (3)