From hartct@gmail.com Sun May 21 15:57:35 2006
From: Christopher Hart
To: boost-users@lists.preview.boost.org
Subject: [Boost-users] [regex] "Invalid content of repeat range" when using greediness and capture operators?
Date: Sun, 21 May 2006 15:57:32 -0400
Message-ID: <8212c19d0605211257i47278f39q5601bc1c134770e9@mail.gmail.com>

I've run into a problem where I'm getting an "Invalid content of
repeat range" when attempting to match a string using greediness and
capture operators. I'm using the following function (basically taken
from the "print captures" example in the regex docs):

    void find_matches(const std::string& regx, const std::string& text)
    {
        boost::regex e(regx, boost::regex::perl);
        boost::smatch what;
        std::cout << "Expression: \"" << regx << "\"\n";
        std::cout << "Text: \"" << text << "\"\n";
        if(boost::regex_match(text, what, e)) //, boost::match_extra))
        {
            ... [snip] ...
        }
        else
        {
            ...
        }
    }

... invoked by:

    find_matches(html, "

");

The pattern I'm trying to match is working in Perl program (e.g. if
($html =~ m|

|gs) {...} ), so it
seems like the pattern should be syntactically correct. I'm building
on Mac OS X 10.4 (Intel) with Boost 1.33.1.

I'm new to Boost and probably spoiled by Perl pattern matching, so I'm
guessing I'm missing something simple. Any help or suggestions are
appreciated!

Thanks,
Chris Hart

From dalestan@gmail.com Sun May 21 21:06:42 2006
From: Dale McCoy
To: boost-users@lists.preview.boost.org
Subject: Re: [Boost-users] [regex] "Invalid content of repeat range" when using greediness and capture operators?
Date: Sun, 21 May 2006 21:06:39 -0400
Message-ID: <1e3fa2f50605211806q3a6abae6rf85a6cb8349f2c0e@mail.gmail.com>
In-Reply-To: <8212c19d0605211257i47278f39q5601bc1c134770e9@mail.gmail.com>

On 5/21/06, Christopher Hart wrote:
> void find_matches(const std::string& regx, const std::string& text)
>
> ... invoked by:
>
> find_matches(html, "

> ");

The arguments here do not seem to agree with the formal parameter
names. Did you inadvertently reverse them?

I'm also wondering about that ".*?"; my limited experience with
regexing has taught me never to follow a . with a *. But I'll leave
further discussion on that to someone who has used boost::regex.

Dale

From john@johnmaddock.co.uk Mon May 22 05:27:29 2006
From: John Maddock
To: boost-users@lists.preview.boost.org
Subject: Re: [Boost-users] [regex] "Invalid content of repeat range" when using greediness and capture operators?
Date: Mon, 22 May 2006 10:27:13 +0100
Message-ID: <014f01c67d81$ee980070$5e300252@fuji>
In-Reply-To: <8212c19d0605211257i47278f39q5601bc1c134770e9@mail.gmail.com>

Christopher Hart wrote:
> I've run into a problem where I'm getting an "Invalid content of
> repeat range" when attempting to match a string using greediness and
> capture operators. I'm using the following function (basically taken
> from the "print captures" example in the regex docs):

> find_matches(html, "

> ");

Your arguments are the wrong way around there, the expression works just
fine for me otherwise.

> The pattern I'm trying to match is working in Perl program (e.g. if
> ($html =~ m|

|gs) {...} ), so it
> seems like the pattern should be syntactically correct. I'm building
> on Mac OS X 10.4 (Intel) with Boost 1.33.1.

You might want to make the expression case insensitive in case the data has

rather than

, and insert a \s* in between the

and the just in
case....

Just a couple of random thoughts,

John.

From hartct@gmail.com Mon May 22 07:54:59 2006
From: Christopher Hart
To: boost-users@lists.preview.boost.org
Subject: Re: [Boost-users] [regex] "Invalid content of repeat range" when using greediness and capture operators?
Date: Mon, 22 May 2006 07:54:57 -0400
Message-ID: <8212c19d0605220454j3683becfle1555db6f6c405d2@mail.gmail.com>
In-Reply-To: <014f01c67d81$ee980070$5e300252@fuji>

Wow, what a simple, silly mistake. Once corrected, it works fine.
Thanks for the suggestions on case insensitivity and spacing, too.
Now I'm off to get iterators working!

Thanks,
Chris

On 5/22/06, John Maddock wrote:
> Christopher Hart wrote:
> > I've run into a problem where I'm getting an "Invalid content of
> > repeat range" when attempting to match a string using greediness and
> > capture operators. I'm using the following function (basically taken
> > from the "print captures" example in the regex docs):
>
> > find_matches(html, "

> > ");
>
> Your arguments are the wrong way around there, the expression works just
> fine for me otherwise.
>
> > The pattern I'm trying to match is working in Perl program (e.g. if
> > ($html =~ m|

|gs) {...} ), so it
> > seems like the pattern should be syntactically correct. I'm building
> > on Mac OS X 10.4 (Intel) with Boost 1.33.1.
>
> You might want to make the expression case insensitive in case the data has
>

rather than

, and insert a \s* in between the

and the just in
> case....
>
> Just a couple of random thoughts,
>
> John.
>
> _______________________________________________
> Boost-users mailing list
> Boost-users@lists.boost.org
> http://lists.boost.org/mailman/listinfo.cgi/boost-users