----- Original Message -----
- It would be great to see a benchmark against GNU diff!
So I cooked up an implementation of unix diff:
https://github.com/erikerlandson/algorithm/blob/edit_distance/sequence/examp...

It produces standard default unix diff output:
http://en.wikipedia.org/wiki/Diff#Usage

That is to say, edit_distance_diff_example will reproduce the output of diff, at least if you just give both programs two files as arguments:

$ edit_distance_diff_example foo.txt bar.txt > ed.out
$ diff foo.txt bar.txt > diff.out
$ diff ed.out diff.out
$

I generated a few benchmarking data sets by editing /usr/share/dict/words. On my machine (F18), the 'words' file has 479,828 lines (one word per line) and is 4,953,680 bytes. So, a decent-sized file in both line count and bytes.

I ran a couple of different experiments: one comparing against a variation that differed by around 150 lines, and another against a variation that differed by around 1200 lines. I also varied the input size by duplicating the contents, so the result was twice the length and size.

Across the variations of input size and file difference that I tested, edit_distance_diff_example ran consistently 40% to 60% faster than unix diff. So, roughly speaking, it's performing about twice as fast. And in less than 150 lines of code!
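For anyone who wants to try a similar comparison, here is a rough Python sketch of how such data sets and timings could be produced. It is not the procedure used above: the helper names (make_variant, duplicate, time_command) are hypothetical, and editing lines by appending a suffix is an assumption, since the message does not say whether the edits were substitutions, insertions, or deletions.

```python
import random
import subprocess
import time


def make_variant(lines, n_edits, seed=0):
    """Return a copy of `lines` in which `n_edits` distinct lines are altered.

    Assumption: an "edit" here is a simple in-place substitution.
    """
    rng = random.Random(seed)
    out = list(lines)
    for i in rng.sample(range(len(out)), n_edits):
        out[i] = out[i] + "_edited"
    return out


def duplicate(lines, times=2):
    """Grow the input by repeating its contents `times` times."""
    return list(lines) * times


def time_command(argv, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` runs of argv.

    Output is discarded. The exit status is ignored, because diff exits
    with status 1 whenever the two files differ.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=False)
        best = min(best, time.perf_counter() - start)
    return best
```

With foo.txt and bar.txt written out from the original and the variant, time_command(["diff", "foo.txt", "bar.txt"]) and time_command(["edit_distance_diff_example", "foo.txt", "bar.txt"]) give the two numbers to compare. Taking the best of several runs is one way to reduce noise from file caching.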