----- Original Message -----
From: "Yu Jiang"
To: boost@lists.boost.org
Sent: Tuesday, April 23, 2013 1:26:35 AM
Subject: [boost] [gsoc-2013]Approximate string matching
I have seen in a previous discussion, another student proposed to implement
some basic matching functions such as "edit distance" for boost. He said by
using dynamic programming, the time complexity is O(m*n). However, if we
give a threshold T and only report the exact edit distance if it's no
larger than T, the complexity can be O(T * min(m,n)). I think sometimes we
just need to decide whether the edit distance is small enough.
....
So, is it compatible for gsoc 2013 and for boost? Looking forward to your
opinions.
I can't speak to gsoc 2013, but as a user of edit distance algorithms in previous lives, I know I'd be interested in a boost edit-distance library.
How one would design the interface for edit_distance<> is an interesting question. In the basic general case, edit_distance<> is a function of two sequences, and a cost matrix for insertion/deletion/substitution/equality. In boost/template world, the "equality" test would of course be a template parameter, with the expected default. These days, I'd expect to be able to provide sequences as either begin/end iterators, or ranges. The cost matrix could be either provided as numeric values or functionals, with defaults to: 1/1/1/0.
I would want to support a variation that limited the search to a diagonal band, either as a separate function or via some parameter.
I imagine that it would be nice to provide specialization for strings, and perhaps vectors, since those are common sequence inputs and it would be convenient to supply them directly.