Taking a quick look at the docs, the regex you want is:
"Resurfacing(.*?)Home"
Just a thought; this seems like quite a long thread for one regex pattern.
And as John says, it should match from the first "Resurfacing" to the second (last) "Home". If it didn't, I'd be concerned.
The * operator by itself is greedy: it makes each match as long as
possible. Writing *? instead makes the repeat non-greedy, i.e. it makes
the match as short as possible.
http://www.boost.org/libs/regex/doc/syntax_perl.html
The section headed 'Non greedy repeats' pretty much explains things.
(Note: This applies to the Perl-style regex syntax; I'm not entirely sure
how the other syntax options behave.)
Cheers,
Paul
On 8/30/06, John Maddock
kiran wrote:
Why is the second one not picked? This was my question.
It is picked for me: I modified your sample program (see below) so that it actually compiled and didn't rely on external files, and I see exactly the expected output: everything from the first "Resurfacing" to the last "Home".
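#include <boost/regex.hpp>
#include <iostream>
#include <string>

using namespace std;

int main()
{
    // Reading from glass.htm replaced by an inline string:
    //char buf[10000];
    //int fd = open("glass.htm", O_RDONLY);
    //int size = read(fd, buf, 10000);
    string line =
        "<!-- saved from url=(0022)http://internet.e-mail -->\n"
        "<html><head>\n"
        "<title>UGlassIt Fibre-Shelkote Pool Resurfacing for Swimming Pools</title>\n"
        "Home\n"
        "Home";
    //close(fd);

    // Greedy (.|\n)*: the match runs on to the last "Home".
    boost::regex expr("Resurfacing(.|\n)*Home", boost::regex::icase | boost::regex::perl);
    try
    {
        // match_not_dot_newline stops '.' matching '\n'; the \n alternative covers line breaks.
        boost::sregex_iterator itr(line.begin(), line.end(), expr, boost::match_not_dot_newline);
        boost::sregex_iterator end;
        while (itr != end)
        {
            cout << (*itr)[0] << endl;  // print the whole match
            ++itr;
        }
    }
    catch (const std::exception& e)
    {
        cerr << e.what() << endl;
    }
    return 0;
}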
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users