Re: [Boost-users] [Regex] "Truly" incremental search?

12 Feb 2008

      ...
The current "incremental search" code needs to resume search where the a
possible match starts, and for certain expressions the worst-case time
(i.e. input bytes come in one at a time) complexity becomes O(n^2) where
n is the number of characters that match the expression:
It's much worse than that: Perl style regular expressions are NP-Hard to 
match in general, and the worst case complexity is probably nearer to O(N!).
...
I am looking for a way to feed the input bytes only once, in order to
reduce the time complexity to approximately O(n).  Of course I will lose
the ability to recover submatches, but the ability is unnecessary for
this particular application that I am considering.  I cannot break the
regex into two parts ("omg" and "wtf") either because it is specified as
an input to the program.
That's not supported by Boost.Regex or as far as I know Boost.Xpressive: 
Perl style regex matchers need to be able to backtrack over the input in 
order to do their stuff.  In order to get O(N) behaviour you would need a 
DFA based regex engine, but would then loose:

* Marked subexpressions.
* Backreferences
* All the Perl extensions, non-greedy repeats, and (? constructs etc.

Can you not cache blocks of input rather than reading one character at a 
time (there's an example for searching large files that does that)?  Of 
course even then, it is in principal possible to devise a regex that 
requires backtracking over the *whole* of the text before it can determine 
whether there is a match or not :-(

Sorry for not helping much!

John.

Re: [Boost-users] [Regex] "Truly" incremental search?

John Maddock