
Bob Kline wrote:
We have encountered a problem in the regex++ package, which reports having exhausted memory after examining a very short string with a regular expression of only modest complexity. I realize that the documentation for the package does not specify how much memory usage is too much, but since the same combination of regular expression and test string works without problems with a number of other programming toolsets (e.g., Python, Perl, C#) I'm guessing that the maintainers of the package would be interested in tracking down the problem (I would if it were my software). Here's a repro case which boils the problem down to the tiniest example:
This is probably not what you want to hear, but the behaviour is by design: Perl-regexes are in the worst case NP-Hard problems to solve, so there is a built in state-count that tracks how many states in the machine are visited and throws an exception if the number of states visited looks too large compared to the size of the string. The aim is to weed out "bad" regexes that may cause problems with certain texts and not others before they can do too much damage. Obviously this is a heuristic that sometimes throws too early or too late. When this issue occurs it's almost always a problem with the expression being ambiguous: each time a repeat or other decision has to be made then either alternative can be taken, so if no match is found the regex state machine has to thrash around trying ever possible combination. Is there any reason why you can't simplify your expression to just "^[^\\s]+( ([^\\s]+)*)?$" which does the same thing much more efficiently? HTH, John.