Re: [Boost-users] Regex failure

6 Feb 2007

      Bob Kline wrote:
...
We have encountered a problem in the regex++ package, which reports
having exhausted memory after examining a very short string with a
regular expression of only modest complexity.  I realize that the
documentation for the package does not specify how much memory usage
is too much, but since the same combination of regular expression and
test string works without problems with a number of other programming
toolsets (e.g., Python, Perl, C#) I'm guessing that the maintainers of
the package would be interested in tracking down the problem (I would
if it were my software).  Here's a repro case which boils the problem
down to the tiniest example:
This is probably not what you want to hear, but the behaviour is by design: 
Perl-regexes are in the worst case NP-Hard problems to solve, so there is a 
built in state-count that tracks how many states in the machine are visited 
and throws an exception if the number of states visited looks too large 
compared to the size of the string.  The aim is to weed out "bad" regexes 
that may cause problems with certain texts and not others before they can do 
too much damage.  Obviously this is a heuristic that sometimes throws too 
early or too late.

When this issue occurs it's almost always a problem with the expression 
being ambiguous: each time a repeat or other decision has to be made then 
either alternative can be taken, so if no match is found the regex state 
machine has to thrash around trying ever possible combination.

Is there any reason why you can't simplify your expression to just 
"^[^\\s]+( ([^\\s]+)*)?$" which does the same thing much more efficiently?

HTH, John.

Re: [Boost-users] Regex failure

John Maddock