
On Tue, Feb 12, 2008 at 08:57:05AM -0000, John Maddock wrote:
It's much worse than that: Perl style regular expressions are NP-Hard to match in general, and the worst case complexity is probably nearer to O(N!).
...
That's not supported by Boost.Regex or as far as I know Boost.Xpressive: Perl style regex matchers need to be able to backtrack over the input in order to do their stuff. In order to get O(N) behaviour you would need a DFA based regex engine, but would then loose:
* Marked subexpressions. * Backreferences * All the Perl extensions, non-greedy repeats, and (? constructs etc.
Can you not cache blocks of input rather than reading one character at a time (there's an example for searching large files that does that)? Of course even then, it is in principal possible to devise a regex that requires backtracking over the *whole* of the text before it can determine whether there is a match or not :-(
Sorry for not helping much!
Not at all, because what you said is actually good to know... Since all I need to support is just the POSIX ERE and not the Perl-style RE, I should probably just use the pcre_dfa_exec() of the PCRE library instead; its documentation specifically states that the algorithm does not backtrack. (Funny, I have to use the PCRE library in order /not/ to use the PCRE syntax. :-p) Thank you, Eugene