
The current "incremental search" code needs to resume search where the a possible match starts, and for certain expressions the worst-case time (i.e. input bytes come in one at a time) complexity becomes O(n^2) where n is the number of characters that match the expression:
It's much worse than that: Perl style regular expressions are NP-Hard to match in general, and the worst case complexity is probably nearer to O(N!).
I am looking for a way to feed the input bytes only once, in order to reduce the time complexity to approximately O(n). Of course I will lose the ability to recover submatches, but the ability is unnecessary for this particular application that I am considering. I cannot break the regex into two parts ("omg" and "wtf") either because it is specified as an input to the program.
That's not supported by Boost.Regex or as far as I know Boost.Xpressive: Perl style regex matchers need to be able to backtrack over the input in order to do their stuff. In order to get O(N) behaviour you would need a DFA based regex engine, but would then loose: * Marked subexpressions. * Backreferences * All the Perl extensions, non-greedy repeats, and (? constructs etc. Can you not cache blocks of input rather than reading one character at a time (there's an example for searching large files that does that)? Of course even then, it is in principal possible to devise a regex that requires backtracking over the *whole* of the text before it can determine whether there is a match or not :-( Sorry for not helping much! John.