Erik Erlandson
- I think it should be possible to relax the requirement for random access iterators. Either continue to store integers and then use std::advance, or store iterators. I'm considering, for example, comparing UTF-8 with an iterator adaptor than changes a random-access iterator over the underlying bytes into a bidirectional iterator over the characters.
I briefly toyed with this idea earlier on, but it seemed (imo) like it would undermine the virtues of the algorithm. I guess what I mean by that is, Myers does stuff like storing only one index in the working vector 'V' for memory efficiency, and so we have "j2 = j1-k", which either requires index manipulation, or if you wanted to do it via iterators, it is only efficient for random-access iterators.
OK, I've had a proper look at this. If sizeof(iterator) == sizeof(size_t), then I think you have to store a pair of iterators which doubles the memory requirements. I was hoping to find that you could continue to store a single integer and use std::advance and keep O(ND), but I think it probably becomes O(N D^2). So probably not worth doing.
- Some might prefer a version that doesn't use Boost.Parameter.
I'm not sure how that would look: would it just take all 8 parameters all the time?
Yes, with defaults.
- It would be great to see a benchmark against GNU diff!
Also a good idea, although my impression of that code (which I poked around while doing this) is that it isn't factored in a way that would make dropping in a replacement very practical. I suppose the other option is to write a version of diff using edit_distance
Yes, that's exactly the sort of thing I had in mind; something that would work like diff, though without most of its myriad options. You could use this as a tutorial, as suggested in another post. Another tutorial idea is an approximate match / spell check replacement word suggester, i.e. scan /usr/dict/words and output words closest to the input by edit distance. You could use the max_cost feature. Regards, Phil.