Dne 15.4.2013 15:11, Adam D. Walling napsal(a):
Jan Strnad
writes: Dne 12.4.2013 19:50, Michael Marcin napsal(a):
On 4/12/13 4:52 AM, Jan Strnad wrote:
...
Now more specifically. I would like to implement approximate string matching algorithm(s) based on the NFA. I would like to support various approximate distances, such as Hamming distance [1], Levenstein distance [2], Damerau-Levenstein distance [3], Delta distance, Gamma distance and finally (delta, gamma) distance. ...
Sounds very useful!
I guess this would fit in the string algorithm library?
That seems to me reasonable as well.
What would the interface look like?
Is it as simple as:
namespace boost { namespace algorithm {
template< typename Range1T, typename Range2T > std::size_t levenstein_distance( Range1T first, Range2T second );
} }
Sure, that would be possible, but I was originally aiming at something little bit different. Basically, I want to design an algorithm that would find all (or just the first) occurrences of a given pattern with given distance in some string.
So to your question, simple interface could look like this: namespace boost { namespace algorithm { template <typename T> class approximate_distance;
template <typename TRange> find_iterator<TRange> approximate_find (TRange text, approximate_distance<TRange>& model); } } ...
I would recommend starting off with the basic implementations of these distance algorithms if possible. There are several situations where I currently use my own Levenshtein distance implementation, and could use other alternative 'distance' measures, but the approximate_find is useless here.
Of course it could be useful in other situations, but I would suggest exposing the underlying distance algorithms first, so they can then be adapted into further utilities like approximate_find or used directly as necessary.
-Adam D. Walling
OK, I should definitely include these "word" distance functions as well. Jan Strnad