[Boost-users] [Regex] "Truly" incremental search?

11 Feb 2008

      Hello,

The current "incremental search" code needs to resume search where the a 
possible match starts, and for certain expressions the worst-case time 
(i.e. input bytes come in one at a time) complexity becomes O(n^2) where 
n is the number of characters that match the expression:

Regex: omg.*wtf

Pass 1 input: o
Pass 1 input to matching: o
Pass 1 verdict: Partial, resume at offset 0.
Pass 2 input: m
Pass 2 input to matching: om
Pass 2 verdict: Partial, resume at offset 0.
Pass 3 input: g
Pass 3 input to matching: omg
Pass 3 verdict: Partial, resume at offset 0.

...

Pass n-1 input: t
Pass n-1 input to matching: omg...wt (length n-1)
Pass n-1 verdict: Partial, resume at offset 0.
Pass n input: f
Pass n input to matching: omg...wtf (length n)
Pass n verdict: Match found and starts at offset 0.

Total number of byte matches = n*(n+1)/2, i.e. O(n^2)

I am looking for a way to feed the input bytes only once, in order to 
reduce the time complexity to approximately O(n).  Of course I will lose 
the ability to recover submatches, but the ability is unnecessary for 
this particular application that I am considering.  I cannot break the 
regex into two parts ("omg" and "wtf") either because it is specified as 
an input to the program.

Eugene

[Boost-users] [Regex] "Truly" incremental search?

Eugene M. Kim