[math] RFC: comparison function objects

Hi Everybody,

in a discussion on comp.std.c++ it was proposed to include into boost some facility to "fuzzy" or, in other words, to "properly" compare floating point numbers. Something like this, for example:

template <typename T>
struct compare_absolute_error : std::binary_function<T, T, bool>
{
    T eps;
    compare_absolute_error(T eps_) : eps(eps_) {}
    bool operator()(T x, T y) const { return abs(x - y) < eps; }
};

This kind of operator occurs frequently in algorithms and is sometimes error-prone to write, so having a ready-made and well-tested component can certainly be an advantage. Of course, we could (and should) provide a bunch of different comparison algorithms: absolute and relative in the first place, but there may be others. I believe it would be a small but useful addition to the math library.

Main questions are:

1) Is there interest for this?
2) What are the comparison algorithms to include?

Thanks for your comments.

Ganesh

template <typename T>
struct compare_absolute_error : std::binary_function<T, T, bool>
{
    T eps;
    compare_absolute_error(T eps_) : eps(eps_) {}
    bool operator()(T x, T y) const { return abs(x - y) < eps; }
};
This kind of operator occurs frequently in algorithms and is sometimes error-prone to write, so having a ready-made and well-tested component can certainly be an advantage.
Of course, we could (and should) provide a bunch of different comparison algorithms: absolute and relative in the first place, but there may be others. I believe it would be a small but useful addition to the math library.
Main questions are:
1) Is there interest for this?
Yes, please.
2) What are the comparison algorithms to include?
I've already made a start (well, not really a start, just an "I needed this functionality to move on" throw-it-together sort of start) here:

http://freespace.virgin.net/boost.regex/toolkit/libs/math/doc/html/math_tool...

Relative error in particular is a tricky one, you have to deal with:

One or both values may be zero.
One or both values may be denorms.
One or both values may be infinities.

So I guess if you wanted a really super-duper handle-it-all version you would make it policy based:

* Values below a certain threshold are regarded as zero (so all denorms are equivalent, for example).
* Values above a threshold are regarded as effectively infinity.
* Absolute errors are used if the values are small enough.

Which of these you actually want to use in your application depends on the QOI of the component you're testing, I guess. Personally I treat all denorms as zeros, because if the result is a denorm there probably isn't enough information content throughout the calculation that led to the denorm to get an accurate value (if you see what I mean).

Anyway, just my 2c worth, please do feel free to jump in with something better if you want!

John.
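[Editor's sketch] The thresholds John lists can be put together roughly like this. The function name, parameter names, and default thresholds below are hypothetical illustrations of the bullet points, not code from John's toolkit:

```cpp
#include <cmath>
#include <limits>
#include <algorithm>

// Sketch: values below `zero_thresh` count as zero (so all denorms are
// equivalent), values above `inf_thresh` count as effectively infinity,
// and an absolute test is used when both values are small.
template <class T>
bool close_enough(T x, T y, T rel_eps, T abs_eps,
                  T zero_thresh = std::numeric_limits<T>::min(),
                  T inf_thresh  = std::numeric_limits<T>::max() / 2)
{
    using std::abs;
    // Treat denorms (and anything below the threshold) as zero.
    if (abs(x) < zero_thresh) x = T(0);
    if (abs(y) < zero_thresh) y = T(0);
    // Treat huge values (and infinities) as equivalent "infinity",
    // provided they have the same sign.
    if (abs(x) > inf_thresh || abs(y) > inf_thresh)
        return (x > 0) == (y > 0) && abs(x) > inf_thresh && abs(y) > inf_thresh;
    // Fall back to an absolute test when both values are small enough.
    if (abs(x) < abs_eps && abs(y) < abs_eps)
        return abs(x - y) < abs_eps;
    // Symmetric relative error otherwise.
    return abs(x - y) <= rel_eps * std::max(abs(x), abs(y));
}
```

A policy-based version would make each of the three thresholds a separate customization point; the sketch just hard-wires them as defaulted parameters.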

John Maddock ha scritto:
2) What are the comparison algorithms to include?
I've already made a start, well not really a start, just a "I needed this functionality to move on" throw it together sort of start here: http://freespace.virgin.net/boost.regex/toolkit/libs/math/doc/html/math_tool...
Thanks a lot for the link. Very interesting.
Relative error in particular is a tricky one, you have to deal with:
One or both values may be zero. One or both values may be denorms. One or both values may be infinities.
Sure. The advantage of having a library solution is that it can handle corner cases better than a naive implementation.
So I guess if you wanted a really super-duper handle it all version you would make it policy based:
I agree that we should provide as much customization as possible, but not more. In particular, I would use policy classes only if necessary, preferring multiple functors to one single functor with multiple policies. The task is perceived by a lot of programmers as trivial (although it isn't), and if the class is too complicated people might get discouraged and won't use it.
* Values below a certain threshold are regarded as zero (so all denorms are equivalent for example).
Good point.
* Values above a threshold are regarded as effectively infinity.
Is this necessary? I mean, that might be useful if we want to really compute a meaningful estimate of the relative error, but in this case we simply want to check if the relative error is smaller than a (hopefully small) value.
* Absolute errors are used if the values are small enough.
That might be a different algorithm. I would keep both relative-error and relative-error-or-absolute-error-if-values-small-enough. They both sound right in different use cases.
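[Editor's sketch] For concreteness, the two variants mentioned here could be kept as separate functors along these lines (names illustrative, not a proposed interface):

```cpp
#include <cmath>
#include <algorithm>

// Pure relative-error comparison, symmetric in x and y.
template <class T>
struct compare_relative_error {
    T eps;
    explicit compare_relative_error(T eps_) : eps(eps_) {}
    bool operator()(T x, T y) const {
        using std::abs;
        return abs(x - y) <= eps * std::max(abs(x), abs(y));
    }
};

// Relative error, falling back to an absolute test when both values
// are small enough (the second algorithm discussed above).
template <class T>
struct compare_relative_or_absolute_error {
    T rel_eps, abs_eps;
    compare_relative_or_absolute_error(T rel, T abs_)
        : rel_eps(rel), abs_eps(abs_) {}
    bool operator()(T x, T y) const {
        using std::abs;
        // Absolute test for values near zero, relative test otherwise.
        if (abs(x) < abs_eps && abs(y) < abs_eps)
            return abs(x - y) < abs_eps;
        return compare_relative_error<T>(rel_eps)(x, y);
    }
};
```

Keeping them separate, rather than merging them behind a policy parameter, matches the "multiple functors instead of one functor with multiple policies" preference stated above.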
Which of these you actually want to use in your application depends on the QOI of the component you're testing, I guess.
Personally I treat all denorms as zeros because if the result is a denorm there probably isn't enough information content throughout the calculation that led to the denorm to get an accurate value (if you see what I mean).
I think I understand and I partially agree. However it's not the fact that denormals are probably already inaccurate that worries me (the library should not make assumptions on how the data has been obtained) but rather that the division by such a small number might produce an inaccurate, and therefore useless, result.
Anyway, just my 2c worth, please do feel free to jump in with something better if you want!
Thanks for your feedback, Ganesh

Sure. The advantage of having a library solution is that it can handle corner cases better than a naive implementation.
Yep, this is something lots of people write believing it to be trivial and then end up fixing all the corner cases.
So I guess if you wanted a really super-duper handle it all version you would make it policy based:
I agree that we should provide as much customization as possible, but not more. In particular, I would use policy classes only if necessary, preferring multiple functors to one single functor with multiple policies. The task is perceived by a lot of programmers as trivial (although it isn't), and if the class is too complicated people might get discouraged and won't use it.
True.
* Values above a threshold are regarded as effectively infinity.
Is this necessary? I mean, that might be useful if we want to really compute a meaningful estimate of the relative error, but in this case we simply want to check if the relative error is smaller than a (hopefully small) value.
Actually probably not :-)
* Absolute errors are used if the values are small enough.
That might be a different algorithm. I would keep both relative-error and relative-error-or-absolute-error-if-values-small-enough. They both sound right in different use cases.
Yep, and in fact once you have the basics sorted out it's trivial to create a thin wrapper that implements a policy like that if you need it.
Which of these you actually want to use in your application depends on the QOI of the component you're testing, I guess.
Personally I treat all denorms as zeros because if the result is a denorm there probably isn't enough information content throughout the calculation that led to the denorm to get an accurate value (if you see what I mean).
I think I understand and I partially agree. However it's not the fact that denormals are probably already inaccurate that worries me (the library should not make assumptions on how the data has been obtained) but rather that the division by such a small number might produce an inaccurate, and therefore useless, result.
Yep, good point.

BTW I notice in other messages that folks have been talking about a function object for this, but... what would that be used for? At present I can't see anything simpler and easier to use than just a free function:

template <class T> T relative_error(T, T);

Regards, John.

John Maddock ha scritto:
BTW I notice in other messages that folks have been talking about a function object for this, but... what would that be used for? At present I can't see anything simpler and easier to use than just a free function:
template <class T> T relative_error(T, T);
Interesting. The other posters are talking about function objects because I started the discussion with them. The main reason I did that is that function objects are more suitable to be passed as arguments to algorithms, because the compiler can inline them (more) easily. Having free functions could be a nice addition, but they shouldn't replace function objects entirely, IMHO.

BTW, there is a subtle but important fact to consider: a function like relative_error() is usually understood by mathematicians as a tool to compute the error of an estimated value w.r.t. an expected value. In particular, the two parameters have different roles and the algorithm is asymmetric:

template <class T>
T relative_error(T value, T estimate)
{ return abs((value - estimate) / value); } // asymmetric!

(see, for example, http://en.wikipedia.org/wiki/Relative_error)

However, our goal is to determine whether two values are simply near to each other. A programmer therefore could (and should) expect the algorithm to be symmetric in the two parameters. Of course, we could please both worlds by providing relative_error as defined above, plus this:

template <class T>
T relative_error_symmetric(T x, T y)
{ return std::max(relative_error(x, y), relative_error(y, x)); }

and then defining the comparison function object in terms of the latter. What do you think?

Ganesh
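[Editor's sketch] The asymmetry is easy to see with concrete numbers; this just restates the two functions from the post in compilable form:

```cpp
#include <cmath>
#include <algorithm>

// The asymmetric version: the first argument is the "true" value.
template <class T>
T relative_error(T value, T estimate)
{ return std::abs((value - estimate) / value); } // asymmetric!

// The symmetric version: the larger of the two asymmetric errors,
// so swapping the arguments cannot change the answer.
template <class T>
T relative_error_symmetric(T x, T y)
{ return std::max(relative_error(x, y), relative_error(y, x)); }

// relative_error(2.0, 1.0) is 0.5 while relative_error(1.0, 2.0) is 1.0:
// swapping the arguments changes the result. The symmetric version
// returns 1.0 either way.
```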

BTW I notice in other messages that folks have been talking about a function object for this, but... what would that be used for? At present I can't see anything simpler and easier to use than just a free function:
template <class T> T relative_error(T, T);
Interesting. The other posters are talking about function objects because I started the discussion with them. The main reason I did that is that function objects are more suitable to be passed as arguments to algorithms, because the compiler can inline them (more) easily. Having free functions could be a nice addition, but they shouldn't replace function objects entirely, IMHO.
I realise why they're used, but I'm struggling to think of an example where one would pass a function object that returns relative error to another function: in other words I'm looking for your motivation.
BTW, there is a subtle but important fact to consider: a function like relative_error() is usually understood by mathematicians as a tool to compute the error of an estimated value w.r.t. an expected value. In particular, the two parameters have different roles and the algorithm is asymmetric:
Yep, I realise that.
However our goal is to determine whether two values are simply near to each other. A programmer therefore could (and should) expect the algorithm to be symmetric in the two parameters.
Of course, we could please both worlds by providing relative_error as defined above, plus this:
template <class T>
T relative_error_symmetric(T x, T y)
{ return std::max(relative_error(x, y), relative_error(y, x)); }
and then defining the comparison function object in terms of the latter. What do you think?
Um, the problem I have with the asymmetric version is this: which parameter is the "true" value? Or, to put it another way, it's way too easy to screw up and pass the parameters the wrong way around, and no amount of testing will ever detect that mistake (especially if you're getting spurious test passes as a result of your mistake; unlikely, I realise, but possible).

That's why I always test my code with a symmetrical version, just called "relative_error". It's not quite what a mathematician would expect, but from a programming perspective it's more robust.

Just my 2c again. John.

John Maddock ha scritto:
BTW I notice in other messages that folks have been talking about a function object for this, but... what would that be used for? At present I can't see anything simpler and easier to use than just a free function:
template <class T> T relative_error(T, T);
Interesting. The other posters are talking about function objects because I started the discussion with them. The main reason I did that is that function objects are more suitable to be passed as arguments to algorithms, because the compiler can inline them (more) easily. Having free functions could be a nice addition, but they shouldn't replace function objects entirely, IMHO.
I realise why they're used, but I'm struggling to think of an example where one would pass a function object that returns relative error to another function: in other words I'm looking for your motivation.
You may have the same generic algorithm (written as a template, probably) that you want to use with different methods of computing errors. For instance:

double total_absolute_error = std::inner_product(
    values.begin(), values.end(), estimates.begin(), 0.0,
    std::plus<double>(), absolute_error<double>());

double total_relative_error = std::inner_product(
    values.begin(), values.end(), estimates.begin(), 0.0,
    std::plus<double>(), relative_error<double>());

Anyway, the whole point is about comparison, not calculation of the relative error. The comparison strategy is what is most probably going to be passed to algorithms, so it's more suitably implemented through function objects. I hope you agree up to this point. Well, now I see the relative_error computation as a policy that customizes a comparison object, and also in this case it's more suitably implemented through an object.

But enough words: I have uploaded a first implementation of what I have in mind to the boost vault, so you can see what I mean. You will find it in the "math - numerics" folder.
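[Editor's sketch] Filled out into a compilable example. The absolute_error and relative_error functors below are assumed helpers (the post does not show their definitions), and note that std::inner_product also needs an initial accumulator value, 0.0 here:

```cpp
#include <numeric>
#include <functional>
#include <vector>
#include <cmath>

// Assumed helper: absolute error between a value and its estimate.
template <class T>
struct absolute_error {
    T operator()(T value, T estimate) const
    { return std::abs(value - estimate); }
};

// Assumed helper: relative error (asymmetric, first argument is the value).
template <class T>
struct relative_error {
    T operator()(T value, T estimate) const
    { return std::abs((value - estimate) / value); }
};

// Sum of absolute errors over two parallel sequences, as in the post.
double total_absolute_error(const std::vector<double>& values,
                            const std::vector<double>& estimates)
{
    // inner_product(first1, last1, first2, init, sum_op, combine_op)
    return std::inner_product(values.begin(), values.end(), estimates.begin(),
                              0.0, std::plus<double>(), absolute_error<double>());
}
```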
Um, the problem I have with the asymmetric version, is this: which parameter is the "true" value?
The first one ;) Jokes apart, if the programmer knows which is the "true" value (it might happen, really!) he can pass it as the first parameter and use the asymmetric version; otherwise he'd better use the symmetric version.
Or to put it another way, it's way too easy to screw up and pass the parameters the wrong way around, and no amount of testing you do will ever detect that mistake (especially if you're getting spurious test passes as a result of your mistake, unlikely I realise but possible). That's why I always test my code with a symmetrical version, just called "relative_error". It's not quite what a mathematician would expect, but from a programming perspective it's more robust.
I know that most programmers will use the symmetric version, but that's not a reason for not providing the asymmetric one. The only thing is to choose good names, so that everyone is happy and (most importantly) everyone knows what he's doing. Ganesh

Um, the problem I have with the asymmetric version, is this: which parameter is the "true" value?
The first one ;)
OK, the one time I tried things that way, I made it the second parameter :-)
Jokes apart, if the programmer knows which is the "true" value (it might happen, really!)
Sure, it happens all the time if you're testing a special function for accuracy :-)
I know that most programmers will use the symmetric version, but that's not a reason for not providing the asymmetric one. The only thing is to choose good names, so that everyone is happy and (most importantly) everyone knows what he's doing.
OK, how about relative_error and relative_difference? The latter is obviously intended as the symmetric version, but if that wasn't obvious it's not a good name :-) John.

Alberto Ganesh Barbati wrote:
template <typename T> struct compare_absolute_error : std::binary_function<T, T, bool>
bool operator()(T x, T y) const { return abs(x - y) < eps; }

1) Is there interest for this?
I'm no math expert, but I certainly agree that such a utility would be useful. However, as some standard algorithms need equality comparison while containers need less-than comparison, it would be nice to provide at least both of them, and make the names obvious enough.
2) What are the comparison algorithms to include?
Good question. But if the comparison algorithm is a template parameter (policy class), we do not have to define them all in advance. B.

Bronek Kozicki ha scritto:
Alberto Ganesh Barbati wrote:
template <typename T> struct compare_absolute_error : std::binary_function<T, T, bool>
bool operator()(T x, T y) const { return abs(x - y) < eps; }

1) Is there interest for this?
I'm no math expert, but I certainly agree that such a utility would be useful. However, as some standard algorithms need equality comparison while containers need less-than comparison, it would be nice to provide at least both of them, and make the names obvious enough.
I believe less-than comparisons are not very useful, in fact. That's because whatever fuzzy algorithm you use, the induced relation won't be transitive, so it won't induce a strict weak ordering. It's true that fuzzy equality does not induce an equivalence relation either, for the same reason, yet fuzzy equality is still useful even in the absence of that requirement.
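[Editor's sketch] The transitivity point is easy to demonstrate with three numbers: with an absolute tolerance of 1.0 we get a ~ b and b ~ c but not a ~ c, so the relation cannot serve as the equivalence induced by a strict weak ordering:

```cpp
#include <cmath>

// Minimal demonstration that fuzzy equality is not transitive:
// with eps = 1.0, 0.0 ~ 0.8 and 0.8 ~ 1.6, but 0.0 is NOT ~ 1.6.
template <class T>
bool fuzzy_equal(T x, T y, T eps)
{ return std::abs(x - y) < eps; }
```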
2) What are the comparison algorithms to include?
Good question. But if the comparison algorithm is a template parameter (policy class), we do not have to define them all in advance.
Well... if you already had the comparison algorithm to put in a template parameter, then you won't need a... comparison algorithm! The point here is exactly to provide a library of ready-made algorithms. Of course some details might be fine-tunable via policy classes, as suggested by John Maddock, but not the entire algorithm itself. Ganesh

"Alberto Ganesh Barbati" wrote
bool operator()(T x, T y) const { return abs(x - y) < eps; }
Maybe it should be:

bool operator()(T x, T y) const { return abs(x - y) < abs(eps); }

Because I have been caught out when eps was negative ;-) Negative eps is easy to get if it is some function of x, y or both, which is common, e.g. percentages etc.

regards Andy little

Andy Little ha scritto:
"Alberto Ganesh Barbati" wrote
bool operator()(T x, T y) const { return abs(x - y) < eps; }
Maybe it should be:
bool operator()(T x, T y) const { return abs(x - y) < abs(eps); }
Because I have been caught out when eps was negative ;-) Negative eps is easy to get if it is some function of x, y or both, which is common, e.g. percentages etc.
Thanks for pointing that out. I would prefer taking the absolute value of eps once and for all in the constructor, in order to avoid repeated calls to abs() if the functor is invoked several times. eps could be made private to improve data hiding. However, I am inclined to require eps to be positive as a precondition, leaving to the programmer the responsibility to take the abs() if he deems it necessary. I bet most of the time eps will be positive, if not a compile-time constant, so this design would fit the "don't pay for what you don't use" paradigm. Ganesh
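[Editor's sketch] The design described here, with eps private and validated once in the constructor. The assert is purely illustrative; the thread leaves the exact precondition-violation behaviour open:

```cpp
#include <cassert>
#include <cmath>

// eps is checked once, in the constructor, so operator() stays a
// single cheap comparison ("don't pay for what you don't use").
template <class T>
struct compare_absolute_error {
    explicit compare_absolute_error(T eps_) : eps(eps_) {
        assert(eps > T(0) && "eps must be positive"); // precondition
    }
    bool operator()(T x, T y) const { return std::abs(x - y) < eps; }
private:
    T eps; // private, as suggested, to improve data hiding
};
```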

"Alberto Ganesh Barbati" <abarbati@iaanus.com> wrote in message news:e2h7lb$mtt$1@sea.gmane.org...
Andy Little ha scritto:
"Alberto Ganesh Barbati" wrote
bool operator()(T x, T y) const { return abs(x - y) < eps; }
Maybe it should be:
bool operator()(T x, T y) const { return abs(x - y) < abs(eps); }
Because I have been caught out when eps was negative ;-) Negative eps is easy to get if it is some function of x, y or both, which is common, e.g. percentages etc.
Thanks for pointing that out. I would prefer taking the absolute value of eps once and for all in the constructor, in order to avoid repeated calls to abs() if the functor is invoked several times. eps could be made private to improve data hiding.
However, I am inclined to require eps to be positive as a precondition, leaving to the programmer the responsibility to take the abs() if he deems it necessary. I bet most of the time eps will be positive, if not a compile-time constant, so this design would fit the "don't pay for what you don't use" paradigm.
Many times the "float error less than epsilon" test is used to terminate a successive-approximation routine, which will become an infinite loop if eps is negative. Does a negative eps trigger an exception? regards Andy little

Andy Little ha scritto:
Many times the "float error less than epsilon" test is used to terminate a successive-approximation routine, which will become an infinite loop if eps is negative. Does a negative eps trigger an exception?
I haven't decided yet. That's where your feedback is helpful to me. Possible options include (but are not limited to) triggering an exception or using a debug assert(). Ganesh
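[Editor's sketch] One of the options mentioned, sketched as a throwing constructor. The class name and exception type are illustrative; nothing was decided in the thread, and a debug-only assert() is the stated alternative:

```cpp
#include <cmath>
#include <stdexcept>

// Option: reject a non-positive eps with an exception, so a negative
// tolerance fails loudly instead of silently making every comparison false.
template <class T>
struct checked_compare_absolute_error {
    explicit checked_compare_absolute_error(T eps_) : eps(eps_) {
        if (!(eps > T(0)))
            throw std::invalid_argument("eps must be positive");
    }
    bool operator()(T x, T y) const { return std::abs(x - y) < eps; }
private:
    T eps;
};
```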

Alberto Ganesh Barbati <abarbati@iaanus.com> writes:
Hi Everybody,
in a discussion on comp.std.c++ it was proposed to include into boost some facility to "fuzzy" or, in other words, to "properly" compare floating point numbers. Something like this, for example:
template <typename T>
struct compare_absolute_error : std::binary_function<T, T, bool>
{
    T eps;
    compare_absolute_error(T eps_) : eps(eps_) {}
    bool operator()(T x, T y) const { return abs(x - y) < eps; }
};
This kind of operator occurs frequently in algorithms and is sometimes error-prone to write, so having a ready-made and well-tested component can certainly be an advantage.
There's something like this in the Boost.Test library. -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams ha scritto:
Alberto Ganesh Barbati <abarbati@iaanus.com> writes:
Hi Everybody,
in a discussion on comp.std.c++ it was proposed to include into boost some facility to "fuzzy" or, in other words, to "properly" compare floating point numbers. Something like this, for example:
template <typename T>
struct compare_absolute_error : std::binary_function<T, T, bool>
{
    T eps;
    compare_absolute_error(T eps_) : eps(eps_) {}
    bool operator()(T x, T y) const { return abs(x - y) < eps; }
};
This kind of operator occurs frequently in algorithms and is sometimes error-prone to write, so having a ready-made and well-tested component can certainly be an advantage.
There's something like this in the Boost.Test library.
Yes. That's exactly what I was thinking about. Thanks for pointing that out. Only I think such a facility can be useful also in use cases that are not related to testing, therefore my proposal is to provide more ways of comparison and put them all in boost.math. Ganesh

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of David Abrahams
Sent: 22 April 2006 19:03
To: boost@lists.boost.org
Subject: Re: [boost] [math] RFC: comparison function objects

> There's something like this in the Boost.Test library.

Indeed there is almost exactly that, although obfuscated by its context, and by dealing with so-called weak and strong comparisons ("tight and not so tight"), a distinction whose practical use I am less convinced about. See fp_comparisons.hpp. John Maddock is putting these in a more usable form.

There are several essential-reading references, from Knuth onwards, in the Boost.Test documentation, and there should be more detail in the 1.34 release. There are uses for both absolute and relative (fraction of a percent) comparisons, and for knowing the number of bits (eps) different. IMO packaging as described, or similar, would be useful.

Paul

--
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
Phone and SMS text +44 1539 561830, Mobile and SMS text +44 7714 330204
mailto: pbristow@hetp.u-net.com
http://www.hetp.u-net.com/index.html
http://www.hetp.u-net.com/Paul%20A%20Bristow%20info.html

"Alberto Ganesh Barbati" <abarbati@iaanus.com> wrote> template <typename T>
struct compare_absolute_error : std::binary_function<T, T, bool> { T eps; compare_absolute_error(T eps_) : eps(eps_) {} bool operator()(T x, T y) const { return abs(x - y) < eps; } };
BTW An alternate signature for the function:

template <typename T>
struct compare_absolute_error /* not derived from binary_function! */
{
    T eps;
    compare_absolute_error(T eps_) : eps(eps_) {}

    // Signature as thus to prevent too many conversions
    template <typename T1, typename T2>
    bool operator()(T1 const & x, T2 const & y) const
    { return abs(x - y) < eps; }
};

// ............. possible usage:
compare_absolute_error<int> comp(1);
int pixel_x = 100;
float current_pos_x = 100.5;
bool grabbed = comp(current_pos_x, pixel_x);

regards Andy Little

Andy Little ha scritto:
BTW An alternate signature for the function:
template <typename T>
struct compare_absolute_error /* not derived from binary_function! */
{
    T eps;
    compare_absolute_error(T eps_) : eps(eps_) {}
    // Signature as thus to prevent too many conversions
    template <typename T1, typename T2>
    bool operator()(T1 const & x, T2 const & y) const
    { return abs(x - y) < eps; }
};
// ............. possible usage:
compare_absolute_error<int> comp(1);
int pixel_x = 100;
float current_pos_x = 100.5;
bool grabbed = comp(current_pos_x, pixel_x);
The idea is nice, but in this particular case I think the "fixed" version is more suitable. First, the user can be sure about which approximation is going to be used for the comparison (in the realm of floating point, that's a big help). Second, knowing the type in advance allows you to deal correctly with corner cases (infinities, denorms, etc.). With the member-template approach that you suggest you would have to handle more cases, and it might easily get tedious. Ganesh

"Alberto Ganesh Barbati" wrote
Andy Little ha scritto:
// Signature as thus to prevent too many conversions
template <typename T1, typename T2>
bool operator()(T1 const & x, T2 const & y) const
{ return abs(x - y) < eps; }
};
// ............. possible usage:
compare_absolute_error<int> comp(1);
int pixel_x = 100;
float current_pos_x = 100.5;
bool grabbed = comp(current_pos_x, pixel_x);
The idea is nice, but in this particular case I think the "fixed" version is more suitable. First, the user can be sure about which approximation is going to be used for the comparison (in the realm of floating point, that's a big help). Second, knowing the type in advance allows you to deal correctly with corner cases (infinities, denorms, etc.). With the member-template approach that you suggest you would have to handle more cases, and it might easily get tedious.
There is still only one case. The type is the result_type of T1() - T2() - T(), which is evaluated at compile time. This is easy to find using Boost.Typeof, as shown below. The result using the two template parameters is superior because you actually have much more control over conversions. Because the types aren't clamped, you know that T1 and T2 are the exact argument types, rather than the result of a conversion to T. No narrowing conversions take place, though they can with the original. The result value of the two functions is different too, as shown below, but I think that the result of the second is the less surprising. You might argue that only floats should be allowed for T, but then I would have to ask what is wrong with using an int as shown?

#include <cmath>
#include <limits>
#include <boost/typeof/typeof.hpp> // Hopefully in next boost distro
#include <iostream>

template <typename T>
struct compare_absolute_error1 {
    T eps;
    compare_absolute_error1(T eps_) : eps(eps_) {}
    bool operator()(T x, T y) const { return abs(x - y) < eps; }
};

template <typename T>
struct compare_absolute_error2 {
    T eps;
    compare_absolute_error2(T eps_) : eps(eps_) {}

    // result_type of the comparison
    template <typename T1, typename T2>
    struct result {
        typedef BOOST_TYPEOF_TPL(T1() - T2() - T()) type;
    };

    // Signature as thus to prevent too many conversions
    template <typename T1, typename T2>
    bool operator()(T1 const & x, T2 const & y) const
    {
        // Demo that the result type is available
        typedef typename result<T1, T2>::type result_type;
        // whatever...
        result_type infinity = std::numeric_limits<result_type>::infinity();
        return abs(x - y) < eps;
    }
};

int main()
{
    int pixel_x = 100;
    float current_pos_x = 99.4F;

    // note: This version gives
    // warning 'argument' : conversion from 'float' to 'int', possible loss of data
    compare_absolute_error1<int> comp1(1);
    bool grabbed1 = comp1(current_pos_x, pixel_x);

    // this version has no warnings
    compare_absolute_error2<int> comp2(1);
    bool grabbed2 = comp2(current_pos_x, pixel_x);

    std::cout << "compare_absolute_error1 gives " << grabbed1 << '\n';
    std::cout << "compare_absolute_error2 gives " << grabbed2 << '\n';
}

regards Andy Little
participants (6)
-
Alberto Ganesh Barbati
-
Andy Little
-
Bronek Kozicki
-
David Abrahams
-
John Maddock
-
Paul A Bristow