
I read the article pointed out by Phil Endecott: An O(NP) Sequence Comparison Algorithm where P=D/2 when M=N and D is the actual edit distance, as said in the abstract: Let A and B be two sequences of length M and N respectively, where without loss of generality N > M, and let D be the length of a shortest edit script between them. A parameter related to D is the number of deletions in such a script, P = D/2 − (N −M)/2. We present an algorithm for finding a shortest edit distance of A and B whose worst case running time is O(NP) and whose expected running time is O(N + PD). The algorithm is simple and is very efficient whenever A is similar to a subsequence of B. It is nearly twice as fast as the O(ND) algorithm of Myers [9], and much more efficient when A and B differ substantially in length. the space required is O(M+N) the pseudocode is enougth simple and it's almost straightforward to translate it in C++ Algorithm Compare Begin fp[−(M+1)..(N+1)] := −1; p := −1; Repeat Begin p := p + 1; For k := −p to D−1 do fp[k] := snake( k, max (fp[k−1] + 1, fp[k+1]) ); For k := D + p downto D + 1 by −1 do fp[k] := snake( k, max (fp[k−1] + 1, fp[k+1]) ); fp[D] := snake( D, max (fp[D−1] + 1, fp[D+1]) ); End Until fp[D] = N; Write "The edit distance is: " D+2p; End Function snake( k, y: int) : int Begin x := y − k; While x < M and y < N and A[x+1] = B[y+1] do Begin x := x + 1; y := y + 1; End snake := y; End FWIW, the article provides same performance comparison against the Miller's algorithm (O(ND)), the one implemented by Phil I guess. number of edit number of comparisons execution time M N deletions distance O(NP) O(ND) O(NP) O(ND) 4000 5000 10 1020 21564 526506 0.13 2.67 4000 5000 50 1100 59520 614391 0.41 3.12 4000 5000 100 1200 121635 737748 0.83 4.02 4000 5000 200 1400 255157 1004952 1.62 5.87 4000 5000 400 1800 600216 1693377 3.68 8.94 4000 5000 600 2200 1016433 2523687 6.41 14.85 5000 5000 200 400 49202 93139 0.33 0.61 5000 5000 600 1200 398499 791815 2.34 4.65 In the attached file there is a draft but working implementation. this is a link to the article: http://www.cs.arizona.edu/~gene/PAPERS/np_diff.ps in reading it pay attention that the x-y coordinates are swapped and this is a bit annoying. Best Regards, Marco -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/