2013/4/28 Erik Erlandson
I agree to modify the return type for function (4). So now I think your suggestion for using a class to supply these 3 functions together is a better choice. In case (1)(2)(3), I think return size_type seems to be better since the algorithm itself can only output an integer as the result?
I think in most cases the result would be integer. I did once implement a variation where I used (1.5) for the cost of substitution, and (1.0) for insertion/deletion. I no longer remember the details, but I do recall that using (2.0) didn't work as well, it wanted to be (1.5). At any rate, the resulting distance in such a case would be floating point, not integer. If it's reasonably easy to support configurable compute/return types, including non-integers, I think it might prove useful.
Yes, I totally agree with you. In case (1)(2)(3), there isn't a parameter for user to change the default cost(1/1/1/0). So maybe return an integer value is enough. In case (4), the user can change the return type if he likes.
I know that you are going to be focusing on actual strings for your applications, but there are other kinds of edit distance application, for example a use case like the unix diff command, where you are doing edit-distance-like computations on two sequences of lines of text, and in fact the edit cost between any two lines may itself be computed as an edit distance. So I think designing the routines to be as general as possible (within reason) will pay dividends for the boost community, and improve their adoption.
I may not want to focus on actual strings for Part A. similarity functions, since here actual strings and a container are similar to deal with. I also think it's easy for me to compute the sequence for the actions.
Maybe the new interface can be like this:
Part A, similarity metrics(use edit distance as an example):
template....
(1) (size_type or unsigned) edit_distance(const Range1T&, const Range2T&);
(2) (size_type or unsigned) edit_distance(const Range1T&, const Range2T&,
(size_type or unsigned) threshold);
(3) (size_type or unsigned) edit_distance(const Range1T&, const Range2T&,
equal_func);
(4) (user-defined type) edit_distance(const Range1T&, const Range2T&, cost);
the cost struct may be like this:
template
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost