
Martin wrote:
An interesting idea, and certainly much less work.
However, as I understand it, you're suggesting limiting the wildcards simply to ensure that the filtered_directory_iterator behaves the same on POSIX and Windows systems?
No. The main reason was to have a simple iterator for the simple (and, I think, most common) cases, one which also avoids the need to go via a list.
Those are two separate requirements:

  1. A simple iterator. By 'simple', you mean one using the underlying API. Right?
  2. Avoiding the list in
       list<path> glob(string const & pattern, path const & working_dir);

Actually, the second requirement contradicts the first, because glob()'s results must be stored internally for the iterator to then iterate over. No?
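To illustrate the storage point: an iterator layered over a list-returning glob() has to keep the whole result list alive while it walks through it. The sketch below is hypothetical (buffered_glob_iterator is not part of the proposed library) and uses std::string in place of filesystem::path so that it stays self-contained.

```cpp
#include <cstddef>
#include <iterator>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: an input iterator over the results of an eager glob().
// The full result list must live somewhere, so the iterator shares ownership
// of it and simply advances an index.
class buffered_glob_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = std::string;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const std::string*;
    using reference         = const std::string&;

    // A default-constructed iterator acts as the end iterator.
    buffered_glob_iterator() = default;

    // Takes ownership of a result list produced elsewhere (e.g. by glob()).
    explicit buffered_glob_iterator(std::vector<std::string> results)
        : results_(std::make_shared<std::vector<std::string>>(std::move(results))) {
        if (results_->empty()) results_.reset();  // nothing to visit: become end
    }

    reference operator*() const { return (*results_)[pos_]; }
    pointer operator->() const { return &(*results_)[pos_]; }

    buffered_glob_iterator& operator++() {
        if (++pos_ == results_->size()) {  // past the last entry: become end
            results_.reset();
            pos_ = 0;
        }
        return *this;
    }
    buffered_glob_iterator operator++(int) {
        buffered_glob_iterator tmp = *this;
        ++*this;
        return tmp;
    }

    friend bool operator==(const buffered_glob_iterator& a, const buffered_glob_iterator& b) {
        return a.results_ == b.results_ && a.pos_ == b.pos_;
    }
    friend bool operator!=(const buffered_glob_iterator& a, const buffered_glob_iterator& b) {
        return !(a == b);
    }

private:
    std::shared_ptr<std::vector<std::string>> results_;  // null for the end iterator
    std::size_t pos_ = 0;
};
```

Whatever the iterator's interface, the list is still built up front; the iterator only hides it.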
So did I, but I put it into a separate iterator where you can define the rules completely independently of the filesystem.
This is, in essence, what I am proposing. I have now reworked the interface following Gennadiy's suggestion. Here's a glob_iterator that can recurse down directories:

    class BOOST_GLOB_DECL glob_iterator
        : public iterator_facade<
              glob_iterator                 // Derived type
            , filesystem::path const        // value_type
            , single_pass_traversal_tag
          >
    {
    public:
        glob_iterator() {}
        glob_iterator(std::string const & pattern,
                      filesystem::path const & wd,
                      glob_flags flags);
    private:
        ...
    };

It works, but it is considerably slower than the function returning a list. No doubt profiling will help track down what I'm doing inefficiently.

    # A simple wrapper for the real glob()
    $ time ./real_glob_rls '*/*/*.hpp' '/home/angus/boost/cvs/' | wc -l
    934

    real    0m0.042s
    user    0m0.010s
    sys     0m0.010s

    # The glob() function I posted earlier in the week.
    $ time ./glob_fun_rls '*/*/*.hpp' '/home/angus/boost/cvs/' | wc -l
    934

    real    0m0.099s
    user    0m0.070s
    sys     0m0.010s

    # The new glob_iterator.
    $ time ./glib_it_rls '*/*/*.hpp' '/home/angus/boost/cvs/' | wc -l
    934

    real    0m0.236s
    user    0m0.200s
    sys     0m0.010s

I'm never sure whether to pay attention to the 'real' or to the 'user' times... Anyway, the ordering is the same either way, so there's a clear hierarchy at the moment: the real glob() is fastest, then the list-returning function, then the iterator.
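The source of the "simple wrapper for the real glob()" used in the first timing isn't shown in this message. A minimal version might look like the following, assuming a POSIX system with <glob.h>; glob_paths is a hypothetical name.

```cpp
#include <glob.h>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical minimal wrapper around POSIX glob(3). Returns the matching
// paths (sorted, since GLOB_NOSORT is not passed), or an empty vector when
// nothing matches. Throws on any other glob() error.
std::vector<std::string> glob_paths(const std::string& pattern) {
    glob_t g{};
    const int rc = ::glob(pattern.c_str(), 0, nullptr, &g);

    std::vector<std::string> result;
    if (rc == 0) {
        // Copy the C strings out before globfree() releases them.
        result.assign(g.gl_pathv, g.gl_pathv + g.gl_pathc);
    }
    ::globfree(&g);

    if (rc != 0 && rc != GLOB_NOMATCH)
        throw std::runtime_error("glob() failed for pattern: " + pattern);
    return result;
}
```

Usage would be along the lines of `glob_paths("/home/angus/boost/cvs/*/*/*.hpp")`. Note that POSIX glob() takes no separate working-directory argument, so a wrapper like this would have to prepend the directory to the pattern itself.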
Don't you ever search for things like "[a-d]*.{cxx,hpp}"?
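For reference, a pattern such as "[a-d]*.{cxx,hpp}" combines a bracket range with brace alternation. Braces are not part of POSIX pattern matching (GNU glob() offers them only through its GLOB_BRACE extension), so one plausible approach is to expand them first and then match each alternative. The helpers below (expand_braces, wildcard_match, glob_match) are hypothetical: they handle a single, non-nested brace group, and no escaping or "[!...]" negation.

```cpp
#include <string>
#include <vector>

// Expands one non-nested brace group: "x.{cxx,hpp}" -> {"x.cxx", "x.hpp"}.
// A pattern without a complete "{...}" is returned unchanged.
std::vector<std::string> expand_braces(const std::string& pattern) {
    const auto open = pattern.find('{');
    const auto close = open == std::string::npos ? std::string::npos
                                                 : pattern.find('}', open);
    if (close == std::string::npos) return {pattern};

    const std::string prefix = pattern.substr(0, open);
    const std::string suffix = pattern.substr(close + 1);
    const std::string body = pattern.substr(open + 1, close - open - 1);

    std::vector<std::string> out;
    std::string::size_type start = 0;
    while (true) {
        const auto comma = body.find(',', start);
        out.push_back(prefix + body.substr(start, comma - start) + suffix);
        if (comma == std::string::npos) break;
        start = comma + 1;
    }
    return out;
}

// Matches s against p, supporting '*', '?' and bracket sets like "[a-d]".
bool wildcard_match(const char* p, const char* s) {
    for (; *p; ++p) {
        switch (*p) {
        case '*':
            // Try the rest of the pattern against every suffix of s.
            for (const char* t = s; ; ++t) {
                if (wildcard_match(p + 1, t)) return true;
                if (!*t) return false;
            }
        case '?':
            if (!*s) return false;
            ++s;
            break;
        case '[': {
            if (!*s) return false;
            const char* q = p + 1;
            bool matched = false;
            while (*q && *q != ']') {
                if (q[1] == '-' && q[2] && q[2] != ']') {  // range such as a-d
                    if (q[0] <= *s && *s <= q[2]) matched = true;
                    q += 3;
                } else {                                    // single character
                    if (*q == *s) matched = true;
                    ++q;
                }
            }
            if (!*q || !matched) return false;  // unterminated set, or no match
            p = q;  // the loop's ++p then skips the ']'
            ++s;
            break;
        }
        default:
            if (*p != *s) return false;
            ++s;
        }
    }
    return *s == '\0';
}

// True if name matches any brace-expanded alternative of pattern.
bool glob_match(const std::string& pattern, const std::string& name) {
    for (const auto& alt : expand_braces(pattern))
        if (wildcard_match(alt.c_str(), name.c_str())) return true;
    return false;
}
```

With this, `glob_match("[a-d]*.{cxx,hpp}", "b_test.hpp")` is true, while "e.cxx" and "a.h" are rejected.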
I do it in the shell, but I have never needed to do it inside an application. I'm sure such applications exist, though.
Here's one. Qt (QProcess), gtk (gspawn*) and ACE (ACE_Process) all enable the user to spawn a child process in a portable way. What they all lack, however, is a *powerful* way to initialise their data from a string containing a "command-line like" syntax. (And, no, passing an arbitrary "ls `rm -f *` foo.cpp" to system() isn't a viable alternative.)

I've been playing around writing something that can parse a subset of the Bourne shell: enough to make it easy and safe to launch a single process from a string. "parse_pseudo_command_line" fills a "spawn_data" variable, and it's then simple to ascertain whether the request is safe or not. Now *this* is a function that would benefit from a portable glob.

http://www.devel.lyx.org/~leeming/libs/child/doc/html/parse_pseudo_command_l...
Equivalent URL: http://tinyurl.com/4c4v9
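To make the safety argument concrete, here is a hypothetical tokeniser; it is not parse_pseudo_command_line, whose interface isn't shown here. It splits a string into words using Bourne-style single and double quoting, and rejects command substitution (backquotes and "$(") rather than ever handing the string to a shell. It ignores backslash escapes, redirections and globbing; the last of these is exactly where a portable glob would slot in.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// True if line[i] starts a command substitution: `...` or $(...).
static bool starts_substitution(const std::string& line, std::string::size_type i) {
    return line[i] == '`' ||
           (line[i] == '$' && i + 1 < line.size() && line[i + 1] == '(');
}

// Hypothetical sketch: split a command-line-like string into words.
std::vector<std::string> split_command_line(const std::string& line) {
    std::vector<std::string> words;
    std::string current;
    bool in_word = false;  // true once a word has started (even an empty '')
    char quote = '\0';     // active quote character, or '\0' when unquoted

    for (std::string::size_type i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (quote == '\'') {
            // Inside single quotes everything is literal, including ` and $.
            if (c == '\'') quote = '\0';
            else current += c;
        } else if (quote == '"') {
            // Inside double quotes a shell would still perform substitution.
            if (c == '"') quote = '\0';
            else if (starts_substitution(line, i))
                throw std::invalid_argument("command substitution is not allowed");
            else current += c;
        } else if (c == ' ' || c == '\t') {
            // Unquoted whitespace ends the current word, if any.
            if (in_word) {
                words.push_back(current);
                current.clear();
                in_word = false;
            }
        } else if (c == '\'' || c == '"') {
            quote = c;
            in_word = true;
        } else if (starts_substitution(line, i)) {
            throw std::invalid_argument("command substitution is not allowed");
        } else {
            current += c;
            in_word = true;
        }
    }
    if (quote != '\0') throw std::invalid_argument("unterminated quote");
    if (in_word) words.push_back(current);
    return words;
}
```

Given "ls 'a b' \"c d\" e" this yields the four words ls, a b, c d and e, whereas "ls `rm -f *` foo.cpp" is refused outright.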
Also, how do you limit the wildcards? I take it you don't, and that the underlying matchers (FindFirstFile, glob) will simply behave differently when given the same pattern.
The filesystems already behave differently, since one is case-sensitive and the other is not. Anyway, I think it is reasonable to limit the wildcards to some portable syntax, e.g. at most two '*' are allowed, and each must either be the last character or be followed by a '.'.
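A prescan enforcing exactly this rule could be quite small. The sketch below is hypothetical: check_portable_pattern throws std::invalid_argument on the first violation, and it deliberately ignores '?' and '[...]', which the rule as stated does not mention.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical prescan for the proposed portable syntax: at most two '*',
// each either the last character or immediately followed by '.'.
// Throws std::invalid_argument on the first violation found.
void check_portable_pattern(const std::string& pattern) {
    int stars = 0;
    for (std::string::size_type i = 0; i < pattern.size(); ++i) {
        if (pattern[i] != '*') continue;
        if (++stars > 2)
            throw std::invalid_argument("more than two '*' in pattern: " + pattern);
        const bool last = (i + 1 == pattern.size());
        if (!last && pattern[i + 1] != '.')
            throw std::invalid_argument("'*' must be last or followed by '.': " + pattern);
    }
}
```

Read literally, the rule rejects '*/*/*.hpp' from the timings above, because the first '*' is followed by '/'. A real check would presumably apply the rule to each path component separately.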
Again, how do you *limit* them? That implies that you must prescan the pattern, presumably throwing once you've determined that it breaks your "reasonable" limits.

Regards,
Angus