[Regex] "Truly" incremental search?

Hello,

The current "incremental search" code needs to resume the search where a possible match starts, and for certain expressions the worst-case time complexity (i.e. when input bytes arrive one at a time) becomes O(n^2), where n is the number of characters that match the expression:

Regex: omg.*wtf

Pass 1 input: o
Pass 1 input to matching: o
Pass 1 verdict: Partial, resume at offset 0.

Pass 2 input: m
Pass 2 input to matching: om
Pass 2 verdict: Partial, resume at offset 0.

Pass 3 input: g
Pass 3 input to matching: omg
Pass 3 verdict: Partial, resume at offset 0.

...

Pass n-1 input: t
Pass n-1 input to matching: omg...wt (length n-1)
Pass n-1 verdict: Partial, resume at offset 0.

Pass n input: f
Pass n input to matching: omg...wtf (length n)
Pass n verdict: Match found, starting at offset 0.

Total number of bytes matched = n*(n+1)/2, i.e. O(n^2).

I am looking for a way to feed the input bytes only once, in order to reduce the time complexity to approximately O(n). Of course I will lose the ability to recover submatches, but the ability is unnecessary for this particular application that I am considering. I cannot break the regex into two parts ("omg" and "wtf") either because it is specified as an input to the program.

Eugene
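For illustration, here is a minimal sketch (not code from the thread; the sample input and the buffer handling are assumptions) of the byte-at-a-time restart behaviour described above, using Boost.Regex's partial-match flag. The re-scan from the start of the possible match on every new byte is what produces the O(n^2) total work:

    #include <boost/regex.hpp>
    #include <iostream>
    #include <string>

    int main()
    {
        boost::regex re("omg.*wtf");
        std::string buffered;                    // bytes we have to keep around
        const std::string incoming = "omg..filler..wtf";

        for (char c : incoming)                  // bytes arrive one at a time
        {
            buffered += c;
            boost::smatch what;
            // match_partial: report a hit even if the match could only be
            // completed by characters that have not arrived yet.
            bool hit = boost::regex_search(buffered.cbegin(), buffered.cend(),
                                           what, re,
                                           boost::match_default | boost::match_partial);
            if (!hit)
            {
                buffered.clear();                // no possible match in flight
            }
            else if (what[0].matched)
            {
                std::cout << "full match: " << what[0] << "\n";
                break;
            }
            else
            {
                // Partial match: keep everything from the possible start
                // onward and re-scan it when the next byte arrives.  This
                // repeated re-scanning is the O(n^2) behaviour.
                buffered.erase(0, what[0].first - buffered.cbegin());
            }
        }
    }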

Eugene M. Kim wrote:
I am looking for a way to feed the input bytes only once, in order to reduce the time complexity to approximately O(n). Of course I will lose the ability to recover submatches, but the ability is unnecessary for this particular application that I am considering. I cannot break the regex into two parts ("omg" and "wtf") either because it is specified as an input to the program.
Hi,

It may sound silly, but what if you pre-process the input, splitting it by ".*", and do an AND match for all sub-strings?

input = "omg.*wtf" => input[] = "omg", "wtf";
return (regexp_match(input[0]) && regexp_match(input[1]));

cheers,
--renato

PS: of course the code above is pseudo-code and of course you'll take as many input parameters as available and not hard-coded like this.
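A rough sketch of that idea (the helper names are made up for illustration; it treats the pieces as independent patterns and ignores their ordering, which is part of what makes it only an approximation):

    #include <boost/regex.hpp>
    #include <string>
    #include <vector>

    // Split a pattern on each literal occurrence of ".*".
    std::vector<std::string> split_on_dot_star(const std::string& pattern)
    {
        std::vector<std::string> pieces;
        std::string::size_type pos = 0, next;
        while ((next = pattern.find(".*", pos)) != std::string::npos)
        {
            pieces.push_back(pattern.substr(pos, next - pos));
            pos = next + 2;
        }
        pieces.push_back(pattern.substr(pos));
        return pieces;
    }

    // AND together a regex_search for every piece.
    bool matches_all_pieces(const std::string& text, const std::string& pattern)
    {
        for (const std::string& piece : split_on_dot_star(pattern))
            if (!boost::regex_search(text, boost::regex(piece)))
                return false;
        return true;
    }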

Renato golin wrote:
Eugene M. Kim wrote:
I am looking for a way to feed the input bytes only once, in order to reduce the time complexity to approximately O(n). Of course I will lose the ability to recover submatches, but the ability is unnecessary for this particular application that I am considering. I cannot break the regex into two parts ("omg" and "wtf") either because it is specified as an input to the program.
Hi,
It may sound silly but what if you pre-process the input splitting it by ".*" and do an AND match for all sub-strings?
input = "omg.*wtf" => input[] = "omg", "wtf"; return (regexp_match(input[0]) && regexp_match(input[1]));
cheers, --renato
PS: of course the code above is pseudo-code and of course you'll take as many input parameters as available and not hard-coded like this.

That would work for simple cases where the two sub-patterns are separated by ".*", but fail for other non-trivial expressions such as "omg(bbq|cakes)*wtf", which the program must also be able to accept. ;_;
Eugene

Eugene M. Kim wrote:
PS: of course the code above is pseudo-code and of course you'll take as many input parameters as available and not hard-coded like this.

That would work for simple cases where the two sub-patterns are separated by ".*", but fail for other non-trivial expressions such as "omg(bbq|cakes)*wtf", which the program must also be able to accept. ;_;
Yeah, if you're doing full-text search it'd be fine, but I know what you mean.

When I started with Perl I found out that regular expressions could do virtually everything... after a while I realised that they were severely limited and that sometimes two (or more) regular expressions were much faster than one complex one. There is no magic in regular expressions, and people like Larry Wall worked very hard to optimize them for the generic case, but if your case is different you may find a better way of doing it. The problem is, as John rightly said, you'll lose generality. So, at the end of the day, you'll still have to pre-process your regex to see which case it fits best, and even that has limited quality (i.e. you can't always tell the best case for any given string).

Both regular expressions you mentioned, "omg.*wtf" and "omg(bbq|cakes)*wtf", are fairly simple, and when run over a reasonable amount of data they perform really well against other approaches (like specific parsers and state machines). I'd say your problem is external: you shouldn't need (or shouldn't allow the user to) supply such complex needles or such a big haystack, and you should cap execution time so that exceptions like this don't bring your service down.

Hope that helps,
--renato

That would work for simple cases where the two sub-patterns are separated by ".*", but fail for other non-trivial expressions such as "omg(bbq|cakes)*wtf", which the program must also be able to accept. ;_;
The technique is obviously extendable. I'm doing a similar thing where I have Boost.Regex and GRETA available as program options, but also my own code specialized for certain odd searches. I generally run thousands of regexes against 10-100 strings of 10-500k characters each.

To make the speed reasonable, I can compile the regex into a simple intermediate form and record which features I will need during searching: if you find a '(' or '|' then you just need to take a slightly different strategy than for "omg.*wtf". The overhead of parsing a small regex is obviously lower than making a pointless check everywhere along the samples.

It may also be worth indexing the string (make something similar to a database index or table of contents) if you have a lot of regexes with single-character matches, or identifying the most unique part of the regex and looking for that first. If every regex has a four-character exact-match requirement, ...ABCD..., there is little point checking for the complicated stuff when it isn't near the "ABCD" part.

Mike Marchywka
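The "identify the most unique part of the regex and look for that first" idea could be sketched roughly like this (the window size and helper name are assumptions, and a match straddling a window edge would be missed, so this is only an illustration of the pre-filtering strategy):

    #include <boost/regex.hpp>
    #include <algorithm>
    #include <string>

    // Find cheap occurrences of a literal the pattern is known to require,
    // and only run the full regex over a bounded window around each one.
    bool prefiltered_search(const std::string& haystack,
                            const std::string& required_literal,
                            const boost::regex& full_pattern,
                            std::ptrdiff_t window = 4096)
    {
        auto it = haystack.begin();
        while ((it = std::search(it, haystack.end(),
                                 required_literal.begin(),
                                 required_literal.end())) != haystack.end())
        {
            auto first = (it - haystack.begin() > window) ? it - window
                                                          : haystack.begin();
            auto last  = (haystack.end() - it > window) ? it + window
                                                        : haystack.end();
            if (boost::regex_search(first, last, full_pattern))
                return true;
            ++it;    // keep looking for later occurrences of the literal
        }
        return false;
    }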

The current "incremental search" code needs to resume search where the a possible match starts, and for certain expressions the worst-case time (i.e. input bytes come in one at a time) complexity becomes O(n^2) where n is the number of characters that match the expression:
It's much worse than that: Perl-style regular expressions are NP-hard to match in general, and the worst-case complexity is probably nearer to O(N!).
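To make that concrete (the pattern and input here are made up for illustration, not taken from the thread): a pattern with nested quantifiers can force a backtracking matcher to try an exponential number of ways to carve up the input before it can report failure. Boost.Regex may cut such a runaway search short by throwing an exception rather than running it to completion:

    #include <boost/regex.hpp>
    #include <exception>
    #include <iostream>
    #include <string>

    int main()
    {
        // Nested quantifiers: every way of grouping the run of 'a's into
        // one-or-more chunks is retried before the overall match can fail.
        boost::regex re("(a+)+x");
        std::string text(30, 'a');               // no 'x', so no match exists

        try
        {
            std::cout << std::boolalpha
                      << boost::regex_search(text, re) << "\n";
        }
        catch (const std::exception& e)
        {
            // Boost.Regex may abort an overly complex match with an
            // exception instead of letting it grind on.
            std::cout << "gave up: " << e.what() << "\n";
        }
    }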
I am looking for a way to feed the input bytes only once, in order to reduce the time complexity to approximately O(n). Of course I will lose the ability to recover submatches, but the ability is unnecessary for this particular application that I am considering. I cannot break the regex into two parts ("omg" and "wtf") either because it is specified as an input to the program.
That's not supported by Boost.Regex or, as far as I know, Boost.Xpressive: Perl-style regex matchers need to be able to backtrack over the input in order to do their stuff. In order to get O(N) behaviour you would need a DFA-based regex engine, but would then lose:

* Marked subexpressions.
* Backreferences.
* All the Perl extensions: non-greedy repeats, the (? constructs, etc.

Can you not cache blocks of input rather than reading one character at a time (there's an example for searching large files that does that)? Of course, even then it is in principle possible to devise a regex that requires backtracking over the *whole* of the text before it can determine whether there is a match or not :-(

Sorry for not helping much!

John.
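For reference, the block-buffered variant might look something like this sketch (reading whole lines here stands in for reading large blocks; the buffer handling is an assumption). It amortises the cost per call but does not change the worst case, since a partial match can still force the unfinished tail to be kept and re-scanned:

    #include <boost/regex.hpp>
    #include <iostream>
    #include <string>

    int main()
    {
        boost::regex re("omg.*wtf");
        std::string buffer;
        std::string block;

        while (std::getline(std::cin, block))    // stand-in for reading big blocks
        {
            buffer += block;
            boost::smatch what;
            if (!boost::regex_search(buffer.cbegin(), buffer.cend(), what, re,
                                     boost::match_default | boost::match_partial))
            {
                buffer.clear();                  // nothing in progress
            }
            else if (what[0].matched)
            {
                std::cout << "match found\n";
                return 0;
            }
            else
            {
                // Partial match: carry only the unfinished tail into the
                // next block instead of the whole input seen so far.
                buffer.erase(0, what[0].first - buffer.cbegin());
            }
        }
        std::cout << "no match\n";
    }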

On Tue, Feb 12, 2008 at 08:57:05AM -0000, John Maddock wrote:
It's much worse than that: Perl-style regular expressions are NP-hard to match in general, and the worst-case complexity is probably nearer to O(N!).
...
That's not supported by Boost.Regex or, as far as I know, Boost.Xpressive: Perl-style regex matchers need to be able to backtrack over the input in order to do their stuff. In order to get O(N) behaviour you would need a DFA-based regex engine, but would then lose:

* Marked subexpressions.
* Backreferences.
* All the Perl extensions: non-greedy repeats, the (? constructs, etc.

Can you not cache blocks of input rather than reading one character at a time (there's an example for searching large files that does that)? Of course, even then it is in principle possible to devise a regex that requires backtracking over the *whole* of the text before it can determine whether there is a match or not :-(
Sorry for not helping much!
Not at all, because what you said is actually good to know... Since all I need to support is POSIX ERE and not Perl-style REs, I should probably just use pcre_dfa_exec() from the PCRE library instead; its documentation specifically states that the algorithm does not backtrack. (Funny, I have to use the PCRE library in order /not/ to use the PCRE syntax. :-p)

Thank you,
Eugene
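For completeness, a hedged sketch of how that restartable DFA matching is described in the original PCRE ("PCRE1") API documentation: pcre_dfa_exec() can report a partial match and then be called again with only the new bytes, the PCRE_DFA_RESTART option, and the same workspace, so no byte has to be examined twice. The chunking and buffer sizes here are illustrative, not prescriptive:

    #include <pcre.h>
    #include <cstdio>
    #include <cstring>

    int main()
    {
        const char* err = 0;
        int erroffset = 0;
        pcre* re = pcre_compile("omg.*wtf", 0, &err, &erroffset, 0);
        if (!re)
        {
            std::fprintf(stderr, "compile failed: %s\n", err);
            return 1;
        }

        int workspace[100];                      // keeps DFA state between calls
        int ovector[2];
        const char* chunks[] = { "omg..", "..filler..", "..wtf" };
        int options = PCRE_PARTIAL;              // first call: allow a partial match

        for (int i = 0; i < 3; ++i)
        {
            int rc = pcre_dfa_exec(re, 0, chunks[i],
                                   static_cast<int>(std::strlen(chunks[i])), 0,
                                   options, ovector, 2, workspace, 100);
            if (rc >= 0)
            {
                std::puts("match completed");
                break;
            }
            if (rc == PCRE_ERROR_PARTIAL)
            {
                // Restart with only the new data; the match state lives in
                // the workspace, so earlier bytes are never re-scanned.
                options = PCRE_PARTIAL | PCRE_DFA_RESTART;
                continue;
            }
            std::puts("no match possible");
            break;
        }
        pcre_free(re);
    }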
participants (4)
-
Eugene M. Kim
-
John Maddock
-
Mike Marchywka
-
Renato Golin