On 4/12/13 4:52 AM, Jan Strnad wrote:
Hi, I'm Jan Strnad, currently a Master's student at CTU in Prague, Czech Republic. This summer, I would like to participate in GSOC and I believe that approximate string matching is a good match for me, since my background i this area is quite strong.
Now more specifically. I would like to implement approximate string matching algorithm(s) based on the NFA. I would like to support various approximate distances, such as Hamming distance [1], Levenstein distance [2], Damerau-Levenstein distance [3], Delta distance, Gamma distance and finally (delta, gamma) distance. Sorry no explanation of the last three found on the Web, but I can provide one if interested.
In the matter of NFA implementation, I would probably choose dynamic programming approach, but I'm thinking of shift-or algorithm as well.
Can I ask you for any kind of thoughts or feedback regarding this topic?
Best regards, Jan Strnad
[1] http://en.wikipedia.org/wiki/Hamming_distance [2] http://en.wikipedia.org/wiki/Levenstein_distance [3] http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
Sounds very useful! I guess this would fit in the string algorithm library? What would the interface look like? Is it as simple as: namespace boost { namespace algorithm { template< typename Range1T, typename Range2T > std::size_t levenstein_distance( Range1T first, Range2T second ); } }