
Taking a quick look at the docs, the regex you want is:

    "Resurfacing(.*?)Home"

Just a thought. Seems like quite the thread for a regex pattern.

And like John says, the original pattern should match from the first Resurfacing to the second Home. If it didn't, I'd be concerned. The * operator by itself is greedy: it makes matches as long as possible. Writing *? instead makes the repeat non-greedy, i.e. the match is made as short as possible.

http://www.boost.org/libs/regex/doc/syntax_perl.html

The section under the heading 'Non greedy repeats' pretty much explains things. (Note: this applies to the Perl-style regex syntax; I'm not entirely sure about the behavior of the other syntaxes.)

Cheers,
Paul

On 8/30/06, John Maddock <john@johnmaddock.co.uk> wrote:
kiran wrote:
Why is the second one not picked? This was my question.
It is picked for me: I modified your sample program (see below) so that it actually compiled and didn't rely on external files, and I see exactly the output expected: everything from the first "Resurfacing" to the last "Home".

#include "boost/regex.hpp"
using namespace boost;
using namespace std;
#include<fcntl.h>
#include<sys/types.h>
#include <iostream>

int main()
{
    char buf[10000];
    //int fd = open("glass.htm", O_RDONLY);
    //int size = read(fd, buf, 10000);
    string line =
        "<!-- saved from url=(0022)http://internet.e-mail -->\n"
        "<html><head>\n"
        "<title>UGlassIt Fibre-Shelkote Pool Resurfacing for Swimming Pools</title>\n"
        "<meta name=\"robots\" content=\"index,follow\">Home\n"
        "<meta name=\"keywords\" content=\"pool Resurfacing,uglassit,fibre-shelkote,Uglassit,Fibre-shelkote,swimming pool resurfacing\">Home";
    //close(fd);
    regex expr("Resurfacing(.|\n)*Home", boost::regex::icase | boost::regex::perl);
    try
    {
        sregex_iterator itr(line.begin(), line.end(), expr, boost::match_not_dot_newline);
        sregex_iterator i;
        while(itr != i)
        {
            cout<<string((*itr)[0].first, (*itr)[0].second)<<" "<<(*itr).position(0)<<endl;
            itr++;
        }
    }
    catch(std::runtime_error e)
    {
        cout<<e.what()<<endl<<flush;
    }
}

_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users