[regex] "Invalid content of repeat range" when using greediness and capture operators?

I've run into a problem where I'm getting an "Invalid content of repeat range" error when attempting to match a string using greediness and capture operators. I'm using the following function (basically taken from the "print captures" example in the regex docs):

    void find_matches(const std::string& regx, const std::string& text)
    {
        boost::regex e(regx, boost::regex::perl);
        boost::smatch what;
        std::cout << "Expression: \"" << regx << "\"\n";
        std::cout << "Text: \"" << text << "\"\n";
        if(boost::regex_match(text, what, e)) //, boost::match_extra))
        {
            ... [snip] ...
        }
        else
        {
            ...
        }
    }

... invoked by:

    find_matches(html, "<p class=\"priceBest\"><a href=(.*?)>");

The pattern I'm trying to match works in a Perl program (e.g. if ($html =~ m|<p class="priceBest"><a href=(.*?)>|gs) {...} ), so it seems like the pattern should be syntactically correct. I'm building on Mac OS X 10.4 (Intel) with Boost 1.33.1.

I'm new to Boost and probably spoiled by Perl pattern matching, so I'm guessing I'm missing something simple. Any help or suggestions are appreciated!

Thanks,
Chris Hart

On 5/21/06, Christopher Hart wrote:
> void find_matches(const std::string& regx, const std::string& text)
> <snip>
> ... invoked by:
> find_matches(html, "<p class=\"priceBest\"><a href=(.*?)>");
The arguments here do not seem to agree with the formal parameter names. Did you inadvertently reverse them?

I'm also wondering about that ".*?"; my limited experience with regexing has taught me never to follow a . with a *. But I'll leave further discussion on that to someone who has used boost::regex.

Dale
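On the ".*?" question: in Perl syntax, which Boost.Regex's perl mode follows, `*?` is the non-greedy (lazy) form of `*`, so `(.*?)>` stops at the first `>` after `href=`, while a plain `(.*)>` runs on to the last `>` in the text. The sketch below shows the difference. It uses C++11 `std::regex` (whose default ECMAScript grammar also supports `*?`) rather than Boost so that it compiles without extra libraries, and `capture1` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns capture group 1 of the first match of
// `pattern` anywhere in `text`, or "" if there is no match.
// std::regex is used here as a stand-in for boost::regex.
std::string capture1(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern);  // ECMAScript grammar: supports lazy *?
    std::smatch m;
    return std::regex_search(text, m, re) ? m[1].str() : std::string();
}
```

With the text `<a href=one>x</a> <a href=two>y</a>`, the lazy pattern `<a href=(.*?)>` captures `one`, whereas the greedy `<a href=(.*)>` captures `one>x</a> <a href=two>y</a`, because `.*` first swallows the rest of the line and only gives back the final `>`.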

Christopher Hart wrote:
> I've run into a problem where I'm getting an "Invalid content of repeat range" when attempting to match a string using greediness and capture operators. I'm using the following function (basically taken from the "print captures" example in the regex docs):
> find_matches(html, "<p class=\"priceBest\"><a href=(.*?)>");
Your arguments are the wrong way around there; the expression works just fine for me otherwise.
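A minimal sketch of the corrected call, assuming the goal is to pull the href value out of a page. It uses C++11 `std::regex`, `std::smatch` and `std::regex_search` in place of `boost::regex`, `boost::smatch` and `boost::regex_search`, and `first_capture` and `fails_to_compile` are hypothetical helper names. It illustrates two points. First, with the arguments reversed the page itself is compiled as the expression, and a `{` anywhere in it (inline CSS or JavaScript, say) is read as the start of a `{n,m}` repeat range; that is most likely where "Invalid content of repeat range" came from. Second, `regex_match` only succeeds if the whole text matches the pattern, so if `text` is an entire page rather than just the fragment, the equivalent of Perl's `=~ m|...|` is `regex_search`.

```cpp
#include <cstddef>
#include <iostream>
#include <regex>
#include <string>

// Hypothetical "print captures" helper with the same (pattern, text)
// parameter order as find_matches. std::regex stands in for boost::regex.
// Prints every sub-match and returns capture group 1, or "" if no match.
std::string first_capture(const std::string& regx, const std::string& text)
{
    const std::regex e(regx);  // ECMAScript grammar by default
    std::smatch what;
    std::cout << "Expression: \"" << regx << "\"\n";
    std::cout << "Text: \"" << text << "\"\n";
    // regex_search finds the pattern anywhere in text (like Perl's =~);
    // regex_match would require the entire text to match.
    if (std::regex_search(text, what, e)) {
        for (std::size_t i = 0; i < what.size(); ++i)
            std::cout << "  $" << i << " = \"" << what[i].str() << "\"\n";
        return what.size() > 1 ? what[1].str() : std::string();
    }
    std::cout << "** No match found **\n";
    return std::string();
}

// Hypothetical helper: true if `s` is rejected when compiled as a pattern.
// A '{' not followed by a valid {n,m} count is one way to trigger this.
bool fails_to_compile(const std::string& s)
{
    try {
        const std::regex e(s);
        (void)e;
        return false;
    } catch (const std::regex_error&) {
        return true;
    }
}
```

For example, `fails_to_compile("<style>p {color: red}</style>")` is true, which is the kind of failure reversed arguments produce, while the intended pattern compiles and `first_capture` returns the quoted href value.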
> The pattern I'm trying to match works in a Perl program (e.g. if ($html =~ m|<p class="priceBest"><a href=(.*?)>|gs) {...} ), so it seems like the pattern should be syntactically correct. I'm building on Mac OS X 10.4 (Intel) with Boost 1.33.1.
You might want to make the expression case insensitive in case the data has <P> rather than <p>, and insert a \s* in between the <p> and the <a> just in case.

Just a couple of random thoughts,
John.
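John's two suggestions can be sketched as follows, again with `std::regex` standing in for `boost::regex`. The `icase` flag (in Boost, `boost::regex::perl | boost::regex::icase`) lets `<P CLASS=...>` match as well, and `\s*` between the two tags tolerates spaces or a line break there. `find_price_link` is a hypothetical name. The raw string literal is C++11; with an older compiler the same pattern would be written as an ordinary literal with `\\s*` and escaped quotes.

```cpp
#include <regex>
#include <string>

// Hypothetical helper applying both suggestions:
//  - std::regex::icase: match <P CLASS=...> as well as <p class=...>
//  - \s* between the <p ...> and <a ...> tags: allow whitespace/newlines
// Returns capture group 1 (the href value) or "" if there is no match.
std::string find_price_link(const std::string& html)
{
    static const std::regex e(
        R"(<p class="priceBest">\s*<a href=(.*?)>)",
        std::regex::ECMAScript | std::regex::icase);
    std::smatch what;
    if (std::regex_search(html, what, e))
        return what[1].str();
    return std::string();
}
```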

Wow, what a simple, silly mistake. Once corrected, it works fine.
Thanks for the suggestions on case insensitivity and spacing, too.
Now I'm off to get iterators working!
Thanks,
Chris
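For the iterator step, a sketch that collects every href rather than just the first, roughly what a Perl `while (... =~ m|...|g)` loop visits. It uses `std::sregex_iterator`; `boost::sregex_iterator` is used in essentially the same way with a `boost::regex`. `all_captures` is a hypothetical helper, and the pattern is compiled case-insensitively per John's suggestion.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: capture group 1 of every non-overlapping match of
// `pattern` in `text`, in order. std::sregex_iterator stands in for
// boost::sregex_iterator.
std::vector<std::string> all_captures(const std::string& pattern,
                                      const std::string& text)
{
    const std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
    std::vector<std::string> out;
    // A default-constructed iterator marks the end of the sequence.
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it)
        out.push_back((*it)[1].str());
    return out;
}
```

The regex object has to outlive the iterator, since the iterator refers back to it; that is why `re` is a named local here rather than a temporary passed straight to the iterator's constructor.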
On 5/22/06, John Maddock
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
participants (3)
- Christopher Hart
- Dale McCoy
- John Maddock