interest in a glob_iterator? (directory_iterator with regex)
Folks--

Is there general interest in a "globbing" iterator? If so I've got one I'm willing to re-package for submission. (Or at least discuss with folks for improvement)

If you're interested, read on:

glob_iterator aggregates a directory_iterator and regex to provide shell-style "*", "?", "{....}", "[....]" and "[^...]" wildcarding.

boost components used:
    filesystem::directory_iterator
    filesystem::path
    filter_iterator
    reg_expression
    c_regex_traits

Usage example:

    //...do something to all .cpp and .c files
    glob_iterator start( "*.{c,cpp}" );
    glob_iterator end;
    while( start != end ){
        std::string filename( start->leaf() );
        //...do something with/to filename
        ++start;
    }
O.K. I'll do it. But my first task is to get a discussion going on the developer's list. Or, we could kick start the process if I sent you all the 100+ lines of code for an initial review and/or sanity check. How's that sound?

--rich

On Wednesday, January 14, 2004, at 05:40 AM, Angus Leeming wrote:
Rich Johnson wrote:
Folks-- Is there general interest in a "globbing" iterator?
I'm interested.
Rich Johnson <rjohnson@dogstar-interactive.com> writes:
O.K. I'll do it. But my first task is to get a discussion going on the developer's list.
I'd want to see it decomposed into the following components:

1. A function which translates glob patterns into regexes
2. A filter_iterator adaptor which uses a regex matching function to match the paths from a directory_iterator

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
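The first component in that list can be sketched roughly as follows. This is only an illustration under stated assumptions, not Rich's code: the name glob_to_regex is hypothetical, and std::regex (ECMAScript syntax) stands in for the Boost regex classes of the time. It handles "*", "?", "[...]", "[^...]" / "[!...]" and "{a,b}", and ignores details such as hidden files and backslash escapes.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not from the thread): translate a shell-style glob
// into an ECMAScript regex string suitable for std::regex_match.
// Malformed input such as an unclosed '{' is not diagnosed here; std::regex
// would throw when the resulting pattern is compiled.
std::string glob_to_regex(const std::string& glob)
{
    std::string out;
    int brace_depth = 0;  // > 0 while inside a {a,b,...} alternation
    for (std::size_t i = 0; i < glob.size(); ++i) {
        const char c = glob[i];
        switch (c) {
        case '*': out += ".*"; break;   // any run of characters
        case '?': out += '.'; break;    // exactly one character
        case '{': out += "(?:"; ++brace_depth; break;
        case '}':
            if (brace_depth > 0) { out += ')'; --brace_depth; }
            else out += "\\}";          // stray '}' is taken literally
            break;
        case ',':
            out += (brace_depth > 0) ? "|" : ",";
            break;
        case '[': {
            const std::size_t close = glob.find(']', i + 1);
            if (close == std::string::npos) { out += "\\["; break; }
            // Copy the bracket body through; shell "[!...]" becomes "[^...]".
            std::string body = glob.substr(i + 1, close - i - 1);
            if (!body.empty() && body[0] == '!') body[0] = '^';
            out += '[' + body + ']';
            i = close;                  // resume after the closing ']'
            break;
        }
        default:
            // Escape characters that are special in a regex but not in a glob.
            if (std::string(".^$+()|\\]").find(c) != std::string::npos) out += '\\';
            out += c;
        }
    }
    return out;
}
```

With this sketch, glob_to_regex("*.{c,cpp}") produces a pattern that std::regex_match accepts for "main.cpp" and "util.c" but not for "main.hpp".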
"David Abrahams" <dave@boost-consulting.com> wrote in message news:ubrp56g5q.fsf@boost-consulting.com...
Rich Johnson <rjohnson@dogstar-interactive.com> writes:
O.K. I'll do it. But my first task is to get a discussion going on the developer's list.
I'd want to see it decomposed into the following components:
1. A function which translates glob patterns into regexes
2. A filter_iterator adaptor which uses a regex matching function to match the paths from a directory_iterator
And a more descriptive, less jargon-ish name. ----------------- Jeff Flinn Applied Dynamics, International
Jeff Flinn wrote:
And a more descriptive, less jargon-ish name.
Why? Rich is proposing something that would iterate over the files returned by the unix 'glob' function. "*.abc" is a 'glob' just as "^.*\.abc$" is the equivalent 'regular expression'. Both are jargon and both are fine. It's just that you're used to the latter... -- Angus
"Angus Leeming" <angus.leeming@btopenworld.com> wrote in message news:bu62a4$c49$1@sea.gmane.org...
Jeff Flinn wrote:
And a more descriptive, less jargon-ish name.
Why? Rich is proposing something that would iterate over the files returned by the unix 'glob' function.
Other OS's have wildcard matching, with nary a mention of "glob". No doubt in the minority, I didn't have a clue what a "glob" was. My first impression was not to bother reading as it was a joke - based on "glob"'s definition in the (Merriam-Webster) dictionary - "a small drop" or "a large rounded mass". I must be showing my age - "Rock - that's not music!, now Benny Goodman's another thing...".
"*.abc" is a 'glob' just as "^.*\.abc$" is the equivalent 'regular expression'. Both are jargon and both are fine. It's just that you're used to the latter..
Then you might want this link in the documentation: http://info.astrian.net/jargon/terms/g.html#glob Jeff
Angus Leeming wrote:
Why? Rich is proposing something that would iterate over the files returned by the unix 'glob' function.
I'd never heard of a 'glob' either. But I have no problems with the name if it means something to someone. Cheers Russell
On Wednesday, January 14, 2004, at 10:32 PM, David Abrahams wrote:
Rich Johnson <rjohnson@dogstar-interactive.com> writes:
O.K. I'll do it. But my first task is to get a discussion going on the developer's list.
I'd want to see it decomposed into the following components:
This was the approach I took.
1. A function which translates glob patterns into regexes
This is the non-trivial part. I've opted to use a two-step process of:
- regex_traits to map shell-style meta chars to regex syntax wherever there's a 1-1 mapping.
- a minimal string transform to convert the glob pattern to a regex pattern corresponding to the above traits.
The specifics of this implementation are of course subject to debate.
2. A filter_iterator adaptor which uses a regex matching function to match the paths from a directory_iterator
This is the easy part--provide a predicate which encapsulates and invokes the regex produced above. --rich
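A rough sketch of that "easy part", for readers who want to see the shape of it. Everything here is an assumption made for illustration: regex_name_filter and filtering_iterator are hypothetical names, the small iterator class stands in for boost's filter_iterator, std::regex replaces the Boost regex classes, and a std::vector<std::string> of file names stands in for a directory_iterator so the example does not touch the file system.

```cpp
#include <iterator>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical predicate: encapsulates an already-translated regex.
struct regex_name_filter {
    std::regex re;
    bool operator()(const std::string& name) const {
        return std::regex_match(name, re);
    }
};

// Minimal forward filter iterator; a stand-in for boost's filter_iterator.
template <class Iter, class Pred>
class filtering_iterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = typename std::iterator_traits<Iter>::value_type;
    using difference_type = typename std::iterator_traits<Iter>::difference_type;
    using pointer = typename std::iterator_traits<Iter>::pointer;
    using reference = typename std::iterator_traits<Iter>::reference;

    filtering_iterator(Iter pos, Iter end, Pred pred)
        : pos_(pos), end_(end), pred_(std::move(pred)) { skip(); }

    reference operator*() const { return *pos_; }
    filtering_iterator& operator++() { ++pos_; skip(); return *this; }

    friend bool operator==(const filtering_iterator& a, const filtering_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const filtering_iterator& a, const filtering_iterator& b) {
        return !(a == b);
    }

private:
    // Advance past every element the predicate rejects.
    void skip() { while (pos_ != end_ && !pred_(*pos_)) ++pos_; }

    Iter pos_, end_;
    Pred pred_;
};

// Collect the names that match a regex pattern (already in regex syntax).
std::vector<std::string> filter_names(const std::vector<std::string>& names,
                                      const std::string& pattern)
{
    using It = filtering_iterator<std::vector<std::string>::const_iterator,
                                  regex_name_filter>;
    regex_name_filter pred{std::regex(pattern)};
    It first(names.begin(), names.end(), pred);
    It last(names.end(), names.end(), pred);
    return std::vector<std::string>(first, last);
}
```

In a real glob_iterator the underlying range would be a directory_iterator and the predicate would look at each entry's leaf name; the filtering logic itself would stay the same.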
1. A function which translates glob patterns into regexes
This is the non-trivial part. I've opted to use a two-step process of:
- regex_traits to map shell-style meta chars to regex syntax wherever there's a 1-1 mapping.
- a minimal string transform to convert the glob pattern to a regex pattern corresponding to the above traits.
The specifics of this implementation are of course subject to debate.
Can't we do the whole thing with a regex search and replace? This has the advantage that we don't need to instantiate a new basic_regex template instance (so less code bloat if the user is already using regex). I've been playing with this, and have attached a simple dos_wildcard predicate that works this way - you can use this with boost::filter_iterator_adapter right now, and it should work for both portable and native file paths (excluding one or two corner cases, like ":" in ReiserFS file names; even this can probably be worked around). How does this compare to yours? I admit I haven't tried with unix wildcards - although last time I looked at the standard, I was surprised by how complex (and subtle) these are - let me know if you want me to look for a unix-wildcard to regex transform.
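The attachment is not reproduced in this archive, so the following is not John's dos_wildcard predicate. It is only a minimal sketch of the search-and-replace idea, assuming std::regex in place of Boost.Regex and using the hypothetical names dos_wildcard_to_regex and dos_wildcard_match. It escapes regex metacharacters first, then rewrites "*" and "?", and matches case-insensitively; DOS quirks such as "*.*" matching names without a dot are not modelled.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: turn a DOS-style wildcard into a regex string using
// nothing but regex search-and-replace passes.
std::string dos_wildcard_to_regex(const std::string& wildcard)
{
    static const std::regex specials(R"([.\[\]{}()+^$|\\])");
    static const std::regex star(R"(\*)");
    static const std::regex question(R"(\?)");

    // Pass 1: backslash-escape characters that are special in a regex but
    // literal in a wildcard. "$&" in the format string is the matched text.
    std::string s = std::regex_replace(wildcard, specials, R"(\$&)");
    // Pass 2: "*" matches any run of characters.
    s = std::regex_replace(s, star, ".*");
    // Pass 3: "?" matches exactly one character.
    s = std::regex_replace(s, question, ".");
    return s;
}

// Hypothetical predicate body: case-insensitive whole-name match.
bool dos_wildcard_match(const std::string& name, const std::string& wildcard)
{
    const std::regex re(dos_wildcard_to_regex(wildcard),
                        std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(name, re);
}
```

The order of the passes matters: escaping has to happen before "*" and "?" are rewritten, otherwise the "." characters introduced by the later passes would themselves be escaped.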
2. A filter_iterator adaptor which uses a regex matching function to match the paths from a directory_iterator
This is the easy part--provide a predicate which encapsulates and invokes the regex produced above.
Yep,

typedef filter_iterator_adapter<directory_iterator, some_kind_of_wildcard>::type wildcard_iterator;

would almost do the trick, however I would really like:

* The ability to combine predicates together in logical operations and filter based on any predicate (the iterator should have the same type irrespective of predicate type). That way we can filter based on file time, file type, file name or whatever.
* The ability to search recursively if you want it (like unix find).
* The ability for recursive searches to "do the right thing" with links - which is to say follow them, without getting into endless loops if the directory structure is cyclic.
* The ability to expand shell-like wildcards, for example ~/boost/libs/*/build/*.mak

These might not all be the same iterator type of course :-)

John.
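One way the first wish (a single iterator type regardless of predicate type, with predicates combinable by logical operations) might be approached is type erasure. The sketch below is an assumption for illustration only: it uses std::function, which was not part of the standard library when this thread was written, predicates over plain file-name strings rather than paths, and hypothetical names (name_predicate, match_all, match_any, match_not, has_prefix, has_suffix).

```cpp
#include <functional>
#include <string>

// Type-erased predicate: any callable taking a file name fits, so an iterator
// parameterised on name_predicate has one type whatever the filter is.
using name_predicate = std::function<bool(const std::string&)>;

// Logical combinators; each returns another name_predicate.
name_predicate match_all(name_predicate a, name_predicate b)
{
    return [a, b](const std::string& s) { return a(s) && b(s); };
}

name_predicate match_any(name_predicate a, name_predicate b)
{
    return [a, b](const std::string& s) { return a(s) || b(s); };
}

name_predicate match_not(name_predicate a)
{
    return [a](const std::string& s) { return !a(s); };
}

// Two simple leaf predicates to combine.
name_predicate has_prefix(std::string prefix)
{
    return [prefix](const std::string& s) {
        return s.compare(0, prefix.size(), prefix) == 0;
    };
}

name_predicate has_suffix(std::string suffix)
{
    return [suffix](const std::string& s) {
        return s.size() >= suffix.size() &&
               s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
    };
}
```

For example, match_all(has_prefix("test_"), match_not(has_suffix(".bak"))) accepts "test_glob.cpp" and rejects both "test_glob.bak" and "glob.cpp". Predicates on file time or file type would need a path or directory entry as the argument instead of a string; the type-erasure cost is one indirect call per element.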
Sounds great to me!

Rich Johnson wrote:
Folks--
Is there general interest in a "globbing" iterator? If so I've got one I'm willing to re-package for submission. (Or at least discuss with folks for improvement)
If you're interested, read on:
glob_iterator aggregates a directory_iterator and regex to provide shell-style "*", "?", "{....}", "[....]" and "[^...]" wildcarding.
boost components used:
    filesystem::directory_iterator
    filesystem::path
    filter_iterator
    reg_expression
    c_regex_traits

Usage example:

    //...do something to all .cpp and .c files
    glob_iterator start( "*.{c,cpp}" );
    glob_iterator end;
    while( start != end ){
        std::string filename( start->leaf() );
        //...do something with/to filename
        ++start;
    }
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
-- D. Alan Stewart Senior Software Developer Layton Graphics, Inc. 155 Woolco Drive Marietta, GA 30065 Voice: 770/973-4312 Fax: 800/367-8192 http://www.layton-graphics.com
On Tue, 13 Jan 2004 18:45:23 -0500, Rich Johnson wrote
Is there general interest in a "globbing" iterator? If so I've got one I'm willing to re-package for submission. (Or at least discuss with folks for improvement)
I'd like to see this as well. Jeff
participants (8)
- Alan Stewart
- Angus Leeming
- David Abrahams
- Jeff Flinn
- Jeff Garland
- John Maddock
- Rich Johnson
- Russell Hind