
Hum,

Consider the following application (running on Linux with GCC):

=== BEGIN CODE
#include <boost/regex.hpp>
#include <iostream>
#include <exception>
#include <typeinfo>

int main()
{
    try
    {
        std::string rx_string = "^([[:alpha:]][-[:alnum:]]*[[:space:]]*)+$";
        boost::regex rx(rx_string, boost::regex_constants::char_classes |
                                   boost::regex_constants::intervals);
        std::string test = "GlobalMSG HelpServ DevNull";

        if (boost::regex_match(test, rx))
            std::cout << "Matched." << std::endl;
        else
            std::cout << "Not Matched." << std::endl;
    }
    catch (const std::exception &e)
    {
        std::cout << "Exception " << typeid(e).name() << ": " << e.what() << std::endl;
    }

    return 0;
}
=== END CODE

The result of running this is:

Exception N5boost14bad_expressionE: Memory exhausted

My question is, why? The regex is simple enough, and the source string is quite short. Please note, the same regex with the test string of:

"OperServ Magick-1"

runs and completes successfully without a problem.

Any ideas on how to fix this?

-- 
PreZ :)
Founder. The Neuromancy Society (http://www.neuromancy.net)

I suspect that [[:alphanum:]][-[:alphanum:]]* is what is killing you... I'm not even sure what you are trying to match...

BTW: do you need to use capturing (ie parenthesis)? capturing can use lots of memory as well...

Mathew

----- Original Message -----
From: "Preston A. Elder" <prez@neuromancy.net>
To: <boost@lists.boost.org>
Sent: Thursday, February 03, 2005 2:13 PM
Subject: [boost] Simple regex problem

On Thu, 03 Feb 2005 14:40:20 +1100, Mathew Robertson wrote:
> I suspect that [[:alphanum:]][-[:alphanum:]]*

Uhh, the docs say to use [[:alnum:]], see: http://www.boost.org/libs/regex/doc/syntax.html (scroll down to 'sets').

> is what is killing you... I'm not even sure what you are trying to match...

The test string was in the application.

What I'm trying to match are words, separated by spaces, where the first character can be alphanumeric, and any subsequent character can be alphanumeric or a hyphen.

> BTW: do you need to use capturing (ie parenthesis)? capturing can use lots of memory as well...

Obviously, I do if I want to match multiple words, and ensure that the entire string completely matches the regex (no matter if it matches once or 100 times).
Trust me, the regex works. I changed my flags to boost::regex_constants::normal, and it all worked just fine. However, that does not explain why, when I was using just the flags boost::regex_constants::char_classes | boost::regex_constants::intervals, it refused to match more than two words.

Of course, I could just keep the flags at normal, but I want to be minimalistic about flags. As I said, what I had worked, but only for two words before apparently running out of memory, which is odd.

-- 
PreZ :)
Founder. The Neuromancy Society (http://www.neuromancy.net)

> Trust me, the regex works. I changed my flags to boost::regex_constants::normal, and it all worked just fine, however it does not explain why when I was using just the flags boost::regex_constants::char_classes | boost::regex_constants::intervals, it refused to match more than two words.

The problem is that you need to specify what kind of regular expression it is (basic, extended, Perl, etc.). If you don't specify anything, it defaults to something like POSIX-Basic semantics, which leads to leftmost-longest matching being selected.

Now onto your expression. The problem here is the [[:space:]]* part: because this can match zero times, your expression could be reduced down to something equivalent to:

"([[:alnum:]]+)+"

*in the worst case*, and this is the classic "may take forever to match" example. It works when you use Perl matching semantics because the matcher stops as soon as a match is found; if no match is found, it may well thrash indefinitely (leading to an exception eventually). When POSIX matching semantics are selected, the leftmost-longest rule causes the matcher to thrash looking for the "best" possible match, again leading to the exception.

So to conclude: specify that you want Perl-style regexes (unless you really want POSIX leftmost-longest rules), and change your expression to something like:

"^([[:alpha:]][-[:alnum:]]*(?:[[:space:]]+|$))+$"

Hope this helps,

John.
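
For anyone who wants to try the revised expression quickly, here is a minimal sketch. It assumes std::regex with its default ECMAScript grammar as a Perl-like stand-in, so that it builds without Boost; this is a substitution for illustration, not the Boost.Regex call from the thread. The function name is_word_list is hypothetical. The point it illustrates is that each word must now be followed by whitespace or by the end of the input, so there is only one way to split the text into words.

```cpp
#include <regex>
#include <string>

// Sketch only: std::regex (ECMAScript grammar) stands in for Perl-style
// matching here. is_word_list is a made-up helper name for illustration.
inline bool is_word_list(const std::string& text)
{
    // A word is one alphabetic character followed by alphanumerics or
    // hyphens. The non-capturing group (?:[[:space:]]+|$) forces every
    // word to end at whitespace or at the end of the input, which removes
    // the ambiguity that made the original expression thrash.
    static const std::regex rx(
        "^([[:alpha:]][-[:alnum:]]*(?:[[:space:]]+|$))+$");
    return std::regex_match(text, rx);
}
```

Calling is_word_list("GlobalMSG HelpServ DevNull") and is_word_list("OperServ Magick-1") should both report a match, while a string such as "1abc" should not. With Boost.Regex itself, the same pattern would presumably be compiled with a Perl-style syntax flag such as boost::regex_constants::perl (or normal, as mentioned earlier in the thread) rather than char_classes | intervals alone.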
participants (3)
- John Maddock
- Mathew Robertson
- Preston A. Elder