Regex failure

5 Feb 2007

We have encountered a problem in the regex++ package, which reports
having exhausted memory after examining a very short string with a
regular expression of only modest complexity.  I realize that the
documentation for the package does not specify how much memory usage is
too much, but since the same combination of regular expression and test
string works without problems with a number of other programming
toolsets (e.g., Python, Perl, C#) I'm guessing that the maintainers of
the package would be interested in tracking down the problem (I would if
it were my software).  Here's a repro case which boils the problem down
to the tiniest example:

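#include <boost/regex.hpp>
int main() {
    // No leading/trailing whitespace; internal whitespace is single blanks.
    boost::wregex  e(L"^[^\\s]( ?([^\\s]+ ?)*[^\\s])?$");
    boost::wcmatch m;
    // The trailing blank means this string should simply fail to match.
    boost::regex_match(L"codeine phosphate ", m, e);
    return 0;
}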

I have confirmed that the behavior is present in the most recent version
of the Boost code by retrieving and building the latest set of sources
from CVS this morning.  (I understand the usefulness of having the user
perform this check, but making this a requirement, as the web
instructions for submitting bug reports do, may be eliminating a
substantial number of valuable reports; it took a number of attempts,
with the connection to the CVS server hanging several times, before I
could even get to the very lengthy build step.)

The failure is not triggered if a version of regex_match is used which
does not take the match_results argument, but then of course we don't
get access to the match results.

I'm pretty sure that the expression is boiled down to the least
ambiguous form (without changing the semantics).  In plain English, it's
looking for strings that have no leading or trailing whitespace, and in
which every internal run of whitespace consists of exactly one blank
character.  That doesn't seem like a very esoteric pattern.
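To pin down the intended semantics independently of any regex engine,
here is a minimal hand-written sketch of the same check.  The function
name is_single_spaced is hypothetical (it is not part of Boost), and the
sketch assumes that std::iswspace classifies characters closely enough
to the \s class for the purposes of this example.

```cpp
#include <cwctype>
#include <string>

// Hypothetical helper, not part of Boost: returns true if s is non-empty,
// has no leading or trailing whitespace, and every internal run of
// whitespace is exactly one blank (L' ').
bool is_single_spaced(const std::wstring& s) {
    if (s.empty()) return false;
    // Start as if the previous character were whitespace, so that a
    // leading blank is rejected by the same test as a doubled blank.
    bool prev_space = true;
    for (wchar_t c : s) {
        if (std::iswspace(c)) {
            // Reject any whitespace other than a blank, and any blank
            // that directly follows another whitespace character.
            if (c != L' ' || prev_space) return false;
            prev_space = true;
        } else {
            prev_space = false;
        }
    }
    // A trailing blank leaves prev_space set, so it is rejected here.
    return !prev_space;
}
```

Under this reading, "codeine phosphate" is accepted and the test string
from the repro case, "codeine phosphate " with its trailing blank, is
rejected, so the expected outcome of the regex_match call is a plain
non-match rather than an error.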

We have reproduced the behavior on Linux and on Windows.

Hope this is useful.  Feel free to contact me if you need any further
information.

Bob Kline
