[regex] Strange behavior with greediness operators?

All:

I think I'm seeing strange behavior with the greediness operators in Boost.Regex (1.33.1, Mac OS 10.4.6), but was hoping someone could confirm that I'm using them correctly. For example, the following two calls:

    find_matches(".*a href=(.*?)>", "<html><head><title>test</title></head><body>this is a test<br/><a href=testlink.html><br/>more text</body></html>");
    find_matches(".*a href=(.*)>", "<html><head><title>test</title></head><body>this is a test<br/><a href=testlink.html><br/>more text</body></html>");

produce the same output:

    Expression: ".*a href=(.*?)>"
    Text: "<html><head><title>test</title></head><body>this is a test<br/><a href=testlink.html><br/>more text</body></html>"
    ** Match found **
     Sub-Expressions:
     $0 = "<html><head><title>test</title></head><body>this is a test<br/><a href=testlink.html><br/>more text</body></html>"
     $1 = "testlink.html><br/>more text</body></html"
     Captures:
     $0 = { "<html><head><title>test</title></head><body>this is a test<br/><a href=testlink.html><br/>more text</body></html>" }
     $1 = { "testlink.html><br/>more text</body></html" }

    Expression: ".*a href=(.*)>"
    Text: "<html><head><title>test</title></head><body>this is a test<br/><a href=testlink.html><br/>more text</body></html>"
    ** Match found **
     Sub-Expressions:
     $0 = "<html><head><title>test</title></head><body>this is a test<br/><a href=testlink.html><br/>more text</body></html>"
     $1 = "testlink.html><br/>more text</body></html"
     Captures:
     $0 = { "<html><head><title>test</title></head><body>this is a test<br/><a href=testlink.html><br/>more text</body></html>" }
     $1 = { "testlink.html><br/>more text</body></html" }

It seems like the (.*?) expression should match only "testlink.html", since the first ">" character after it terminates the pattern and a non-greedy quantifier should consume as little as possible. (The same expression works as written in Perl.)
The find_matches function looks like:

    #include <iostream>
    #include <string>
    #include <boost/regex.hpp>

    // what.captures() is only available when Boost.Regex is configured
    // with BOOST_REGEX_MATCH_EXTRA defined.
    void find_matches(const std::string& regx, const std::string& text)
    {
        boost::regex e(regx);
        boost::smatch what;
        std::cout << "Expression: \"" << regx << "\"\n";
        std::cout << "Text: \"" << text << "\"\n";
        if(boost::regex_match(text, what, e, boost::match_extra | boost::match_partial))
        {
            unsigned i, j;
            std::cout << "** Match found **\n Sub-Expressions:\n";
            // Print each marked sub-expression's final match.
            for(i = 0; i < what.size(); ++i)
                std::cout << " $" << i << " = \"" << what[i] << "\"\n";
            std::cout << " Captures:\n";
            // Print every capture recorded for each sub-expression.
            for(i = 0; i < what.size(); ++i)
            {
                std::cout << " $" << i << " = {";
                for(j = 0; j < what.captures(i).size(); ++j)
                {
                    if(j)
                        std::cout << ", ";
                    else
                        std::cout << " ";
                    std::cout << "\"" << what.captures(i)[j] << "\"";
                }
                std::cout << " }\n";
            }
        }
        else
        {
            std::cout << "** No Match found **\n";
        }
    }

Am I missing something in the usage, or is this a bug? Any guidance is appreciated.

Thanks,
Chris Hart

Christopher Hart wrote:
> All:
> I think I'm seeing strange behavior with the greediness operators in Boost.Regex (1.33.1, Mac OS 10.4.6), but was hoping someone could confirm that I'm using them correctly. For example, the following two calls:

No, you're calling regex_match when you should be calling regex_search: regex_match will only find matches that match ALL OF THE INPUT STRING. If you want a subset of the input to match, then you need to use regex_search.

John.
participants (2): Christopher Hart, John Maddock