
"John Maddock" <john@johnmaddock.co.uk> wrote in message news:014401c76c69$ef3f56e0$877c1f56@fuji...
Michael Goldshteyn wrote:
... this actually does highlight an apparent bug/deficiency. A much simpler example is given below:
--- boost::regex test_regex("(?:.[a-z0-9]+)+");
boost::regex_match("www.appwebsite1.mydomain.com/1/1/", test_regex); ---
This throws the same exception, at least in a VC 8.0 build using the DLL version of the reg-ex lib (debug), in perl_matcher_non_recursive.hpp at line 164 (raise_error(traits_inst, regex_constants::error_space);), because the state_count is 107429 and the max_state_count is 107425.
Right, the behaviour is deliberate and by design: it tries to protect you from pathological regexes that may take "forever" to try and match. Obviously the logic is heuristic based and therefore imperfect, but in this case there is still a bug in your regex: I assume you meant to use "(?:\\.[a-z0-9]+)+" to match a literal "." rather than the regex meaning of "any character". This would have been OK.
The reason your example is pathological is because given a long string of characters, then for each character the state machine can either take the inner repeat or the outer repeat and the number of states visited grows exponentially with the number of characters in the string: the expression is effectively the same as (.+)+ which is the classic patholgical test case for backtracking regexes.
HTH, John.
I understand the issues, now, but am wondering whether control over the "maximum number of states" should be given to the caller or if the issue is one of running out of memory? Michael Goldshteyn