
----- Original Message ----- From: "Yu Jiang" <sunlightt07@gmail.com> To: boost@lists.boost.org Sent: Tuesday, April 23, 2013 1:26:35 AM Subject: [boost] [gsoc-2013]Approximate string matching I have seen in a previous discussion, another student proposed to implement some basic matching functions such as "edit distance" for boost. He said by using dynamic programming, the time complexity is O(m*n). However, if we give a threshold T and only report the exact edit distance if it's no larger than T, the complexity can be O(T * min(m,n)). I think sometimes we just need to decide whether the edit distance is small enough. .... So, is it compatible for gsoc 2013 and for boost? Looking forward to your opinions. I can't speak to gsoc 2013, but as a user of edit distance algorithms in previous lives, I know I'd be interested in a boost edit-distance library. How one would design the interface for edit_distance<> is an interesting question. In the basic general case, edit_distance<> is a function of two sequences, and a cost matrix for insertion/deletion/substitution/equality. In boost/template world, the "equality" test would of course be a template parameter, with the expected default. These days, I'd expect to be able to provide sequences as either begin/end iterators, or ranges. The cost matrix could be either provided as numeric values or functionals, with defaults to: 1/1/1/0. I would want to support a variation that limited the search to a diagonal band, either as a separate function or via some parameter. I imagine that it would be nice to provide specialization for strings, and perhaps vectors, since those are common sequence inputs and it would be convenient to supply them directly.