
Hi,

Find attached a small test program that tries to highlight some of the "usual" floating point traps. The essence is the following small function that outputs the results of some "buggy" floating point logic. The rest is just setting up the environment so that the bugs actually get triggered.

    void buggy_function(double eps = 1e-13, int length = 3, double delta = 1 / 7.0)
    {
        using namespace std;
        double alpha = 1e-13;
        double beta = eps;
        double a = length * delta;
        double b = length * delta;
        vector<double> inverse;
        for (int i = 0; i < length; ++i)
            inverse.push_back(1.0 / (2*i + 1));
        vector<double> sorted;
        for (int i = 0; i < length; ++i)
            sorted.push_back(inverse[i] * delta);
        double t = -delta * sorted.size();
        double tt = -sorted.size() * delta;
        bool t_tt = (abs(t - tt) < eps);
        bool abs_t = (abs(t) == -t);
        double c = (1 - alpha) * a + alpha * b;
        double d = (1 - beta) * a + beta * b;
        bool a_c = (a == c);
        bool b_d = (b == d);
        vector<bool> equal;
        for (int i = 0; i < length; ++i)
            equal.push_back(inverse[i] * delta == sorted[i]);
        std::cout << "t_tt = " << t_tt << "\nabs_t = " << abs_t
                  << "\na_c = " << a_c << "\nb_d = " << b_d << "\n";
        std::cout << "equal = [";
        for (int i = 0; i < length; ++i)
            std::cout << " " << equal[i];
        std::cout << " ]\n";
    }

The hypothetical "naive" programmer might expect the output:

    t_tt = 1
    abs_t = 1
    a_c = 1
    b_d = 1
    equal = [ 1 1 1 ]

However, the actual output is not only different from this, but also depends on the compiler, the compiler settings, the current configuration of the floating point unit and the actual headers included directly or indirectly by the compilation unit.

g++ test_fp.cpp -g -Wall compiles without warning and executes with output:

    t_tt = 0
    abs_t = 1
    a_c = 1
    b_d = 1
    equal = [ 1 1 1 ]

g++ test_fp.cpp -DNDEBUG -O2 -msse2 -Wall compiles without warning and executes with output:

    t_tt = 0
    abs_t = 1
    a_c = 1
    b_d = 1
    equal = [ 1 0 0 ]

I created a new project with MSVC9. The debug version compiles with warnings for the things that go badly wrong and executes with output:

    t_tt = 0
    abs_t = 0
    a_c = 1
    b_d = 1
    equal = [ 1 1 1 ]

With the floating point unit set to 'high' precision:

    t_tt = 0
    abs_t = 0
    a_c = 1
    b_d = 1
    equal = [ 1 1 1 ]

The release version compiles with warnings for the things that go badly wrong and executes with output:

    t_tt = 0
    abs_t = 0
    a_c = 1
    b_d = 1
    equal = [ 1 1 1 ]

With the floating point unit set to 'high' precision:

    t_tt = 0
    abs_t = 0
    a_c = 0
    b_d = 1
    equal = [ 1 0 0 ]

In my case (see the background story below), it was actually the trap represented by the output "abs_t = 0" instead of "abs_t = 1" that caught me on linux64 but worked on win32. (It turned out to be difficult to reproduce this with gcc, so the test program reproduces this trap with MSVC, even though MSVC warns quite explicitly about it.) I think it is a funny exercise to debug the test program and find out what actually goes wrong (which is especially funny since most traps only go wrong with compiler optimization turned on).
Regards, Thomas

------------------------

The background story:

The recent reviews of [Constrained Value], [Polygon] and [GGL] all featured interesting disagreements over floating point issues:
http://lists.boost.org/Archives/boost/2008/12/145655.php
http://lists.boost.org/Archives/boost/2008/12/145788.php
http://lists.boost.org/Archives/boost/2009/03/149567.php
http://lists.boost.org/Archives/boost/2009/08/155767.php
http://lists.boost.org/Archives/boost/2009/11/158560.php
(These were long threads with many messages and many participants, so please don't misinterpret my choice of unrepresentative messages as any form of statement.)

Last Wednesday, I had to give an internal live demo of a feature I had just finished. I had done the development mostly on win32, but I wanted to run the demo on linux64, to be able to show examples of realistic size (full chip...). While preparing the presentation, I was surprised to find the accuracy of the results significantly worse than I remembered from my previous tests. I repeated my previous tests on linux64, and found out that there was indeed a problem with my code on linux64. I did the demo on win32, but was asked after the presentation to send the source code to somebody from another group. So I spent most of the night trying to find the problem in my code. When I found out that some code with floating point driven logic (I had written years ago) to analytically integrate radial symmetric functions over rectangular domains was showing unexpected behavior, I decided to postpone the floating point fights with the compiler to the next day.

I told this story the next day in a group meeting, and was confronted with much floating point FUD and unrealistic suggestions, basically boiling down to either not writing floating point driven logic in the first place, or commenting it in excessive detail. So I lost my temper and explained that it was difficult enough for me to write this code with floating point driven logic, and that anybody suggesting writing it without floating point driven logic should please do so, because I'm unable to do it.

I have to admit that I read "What Every Computer Scientist Should Know About Floating-Point Arithmetic" only superficially:
http://www.engrng.pitt.edu/hunsaker/3097/floatingpoint.pdf
http://docs.sun.com/source/806-3568/ncg_goldberg.html
I also have to admit that I write code with floating point driven logic from time to time in case it seems unavoidable to me, and that this code often has bugs that are only discovered months after the code is already finished. But I never found a bug in such code that would have been a fundamental problem with floating point, but always just the "usual" floating point traps I keep overlooking by mistake.

Now somebody will probably suggest that I should at least document the "usual" floating point traps then. So I created the attached small test program showing the "usual" floating point traps I could still remember having overlooked at least once. It was actually more difficult than expected to arrange the code in a way that the bugs are really triggered for typical compiler settings of MSVC and gcc.

On Sat, Jan 23, 2010 at 10:25 AM, Thomas Klimpel <Thomas.Klimpel@synopsys.com> wrote:
I have to admit that I read "What Every Computer Scientist Should Know About Floating-Point Arithmetic" only superficially: http://www.engrng.pitt.edu/hunsaker/3097/floatingpoint.pdf http://docs.sun.com/source/806-3568/ncg_goldberg.html
I also have to admit that I write code with floating point driven logic from time to time in case it seems unavoidable to me, and that this code often has bugs that are only discovered months after the code is already finished. But I never found a bug in such code that would have been a fundamental problem with floating point, but always just the "usual" floating point traps I keep overlooking by mistake.
I remember working with an app which used doubles for storing monetary amounts. The rounding errors associated with the precision of doubles didn't show up until we started aggregating enough money together, and accounting for the discrepancies which occurred was not a fun problem to solve. When I try to break in someone who is new to this problem, it's usually with the simple exercise:

    double dime = 0.1, ten_dimes = 1.0, pocket = 0.0;
    for (int i = 0; i < 10; i++)
        pocket += dime;
    assert(pocket == ten_dimes); // fails

And then I follow that up with a similar exercise involving 0.3333 (base 10) to help them understand why the prior example doesn't work. So far, when a guaranteed level of precision is needed (such as with money), we always end up working with integer types.
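For example, a minimal sketch of the integer-cents idea (the names and amounts are just for illustration):

    #include <cassert>
    #include <iostream>

    int main()
    {
        // Money as an integer count of cents: sums of dimes stay exact.
        long long dime = 10;        // ten cents
        long long one_dollar = 100;
        long long pocket = 0;
        for (int i = 0; i < 10; ++i)
            pocket += dime;
        assert(pocket == one_dollar);  // holds exactly

        // The same loop with doubles goes wrong, because 0.1 has no exact
        // binary representation and the rounding error accumulates.
        double d_pocket = 0.0;
        for (int i = 0; i < 10; ++i)
            d_pocket += 0.1;
        std::cout << (d_pocket == 1.0) << "\n"; // prints 0 with typical IEEE 754 doubles
        return 0;
    }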

Thomas Klimpel wrote:
Hi,
Find attached a small test program that tries to highlight some of the "usual" floating point traps. The essence is the following small function that outputs the results of some "buggy" floating point logic. The rest is just setting up the environment so that the bugs actually get triggered.
Hello Thomas,

I'm unable to reproduce, see below. Looking at your code, you might want to give std::fabs() a try over abs(). I remember having been bitten hard by that one-letter difference.

Cheers, Rutger

$ g++ test_fp.cpp -DNDEBUG -O2 -msse2 -Wall
$ ./a.out
t_tt = 0
abs_t = 1
a_c = 1
b_d = 1
equal = [ 1 1 1 ]

Rutger ter Borg wrote:
I'm unable to reproduce, see below.
Oh no, I should have typed

$ g++ test_fp.cpp -DNDEBUG -O2 -msse2 -mfpmath=sse -Wall

to force the compiler on my 32-bit system to really use SSE floating point instructions instead of the 387 instructions (I guess the default is -mfpmath=387 for 32-bit Intel systems and -mfpmath=sse for 64-bit Intel systems). I was already wondering why the bugs were so easy to trigger, because I remembered that some of the bugs in the test program were only rarely triggered when they occurred in real-life code. I guess it's at least 2048 times easier to trigger such bugs with 387 instructions than with "normal" floating point instructions (like SSE). It might well be that SSE floating point instructions don't allow some of the bugs to be triggered at all. So on a 64-bit Intel system, typing

$ g++ test_fp.cpp -DNDEBUG -O2 -mfpmath=387 -Wall

should trigger the bugs. I get different output when I use -O3 instead of -O2, but -O2 seems to trigger more bugs.
Looking at your code, you might want to give std::fabs() a try over abs(). I remember having been bitten hard by that one-letter difference.
You spotted one of the bugs, very good. This bug is actually one of my favorites. I tend to fix this bug by writing "std::abs". Regards, Thomas
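P.S. For the record, a minimal illustration of this particular trap (a sketch, not the attached test program):

    #include <cstdlib>   // declares int abs(int)
    #include <cmath>     // declares the std::abs(double) and std::fabs(double) overloads
    #include <iostream>

    int main()
    {
        double t = -3.0 / 7.0;
        // Unqualified abs may resolve to the C function int abs(int),
        // silently truncating the argument to 0 before taking the absolute
        // value. Whether it does depends on which headers were (indirectly)
        // included and on the standard library implementation.
        std::cout << abs(t) << "\n";       // may print 0 instead of 0.428571
        std::cout << std::abs(t) << "\n";  // prints 0.428571
        std::cout << std::fabs(t) << "\n"; // prints 0.428571
        return 0;
    }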

Thomas Klimpel wrote:
You spotted one of the bugs, very good. This bug is actually one of my favorites. I tend to fix this bug by writing "std::abs".
I guess your case has to do with 80-bit precision in the registers vs. 64-bit precision in memory. Please try the -ffloat-store option. In addition, the t/tt test computes (-double)*integer in one case and (-integer)*double in the other; in the latter the negated integer gets converted to double, while in the former the plain integer gets converted to double, which, as you know, may give different results. If you change to

    double t = -(delta*sorted.size());
    double tt = -(sorted.size()*delta);

(and add -ffloat-store), then you may get the "naive" result.

Cheers, Rutger
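A minimal sketch of the register-vs-memory effect (whether it actually triggers depends on the compiler, the optimization level and the -mfpmath/-ffloat-store settings):

    #include <iostream>
    #include <vector>

    int main()
    {
        double delta = 1 / 7.0;
        std::vector<double> sorted;
        for (int i = 0; i < 3; ++i)
            sorted.push_back(1.0 / (2 * i + 1) * delta);

        for (int i = 0; i < 3; ++i)
        {
            // With x87 code generation the left-hand side may be kept at
            // 80-bit register precision, while the value stored into the
            // vector was rounded to 64 bits, so "the same" expression can
            // compare unequal to itself.
            bool equal = (1.0 / (2 * i + 1) * delta == sorted[i]);
            std::cout << equal << " ";
        }
        std::cout << "\n";
        return 0;
    }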

Hi, Nice examples. On Sat, Jan 23, 2010 at 04:25:56PM +0100, Thomas Klimpel wrote:
When I found out that some code with floating point driven logic (I had written years ago) to analytically integrate radial symmetric functions over rectangular domains was showing unexpected behavior, [...]
[...]
I also have to admit that I write code with floating point driven logic from time to time in case it seems unavoidable to me, and that this code often has bugs that are only discovered months after the code is already finished.
... or years in the present case you described above.
But I never found a bug in such code that would have been a fundamental problem with floating point, but always just the "usual" floating point traps I keep overlooking by mistake.
I'm curious: what distinction are you drawing between "fundamental problems" and "usual traps that are continually overlooked"? And what is the moral of your story? My reading is that you're essentially saying that floating point-driven logic is fine as long as you take care to avoid the usual traps. But: isn't the very fact that most of us stumble over them continually -- and they can lie undiscovered for years -- reason enough to look for something better? Cheers, -Steve

Steve M. Robbins wrote:
Nice examples.
Thank you. They were indeed the main point of my message; the rest was just the background story.
I'm curious: what distinction are you drawing between "fundamental problems" and "usual traps that are continually overlooked"?
What will happen when I "send the source code to somebody from another group"? Parts of the code could potentially be pasted into other source files, and these source files might include "different" headers and be compiled with "different" compiler flags and compilers than the initial code was tested with. In this context, "fundamental problems" would mean that it would be impossible to fix the discovered floating point bugs in a way that would work on any reasonably standard conforming compiler. So when I say "usual traps that are continually overlooked", I mean that my floating point code has real bugs that are easy to fix once discovered. The opposite would be unfixable interactions between the code and the compiler (Corresponding FUD: write once, debug everywhere).
And what is the moral of your story? My reading is that you're essentially saying that floating point-driven logic is fine as long as you take care to avoid the usual traps.
I really try to avoid floating point-driven logic as far as I can. So when I'm forced to use floating point-driven logic, the problem at hand is already seriously challenging me. These are typically 2D geometrical problems, like sweepline type algorithms, computing the overlapping area of three circles, or the mentioned integration problem, which require handling of many different geometric scenarios. What are your alternatives in such a situation? The authors of [Constrained Value] and [Polygon] decided to avoid floating point, and got serious flak for that decision during their reviews. The authors of [GGL] took the other decision, and got the typical floating point FUD during their review. My moral from this is that you lose either way.

But as I wrote, I was offered a third alternative: "..., or commenting it in excessive detail." But how can I comment in the source code that I managed to avoid the "usual" traps?

- A first step would definitely be to document the "usual" traps themselves. The first step for this is a test program that triggers them.
- A second step might be the complete opposite, i.e. a test program that shows which floating point properties one can rely on when writing floating point-driven logic.
- A third step might be a test program showing the potential accuracy loss when a compiler does misguided optimizations, like ruining the Kahan Summation Formula (a sketch of which follows below). But I'm not sure whether such "broken" compilers are still around; at least I guess it would be difficult for me to get my hands on such a "broken" compiler.
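The Kahan Summation Formula I have in mind looks roughly like this (a sketch; an aggressively "optimizing" compiler that applies real-number associativity, e.g. with -ffast-math, is free to cancel the correction term and turn it back into naive summation):

    #include <vector>
    #include <iostream>

    // Compensated (Kahan) summation: c carries the low-order bits that are
    // lost when sum + y is rounded, and feeds them back into the next term.
    double kahan_sum(const std::vector<double>& v)
    {
        double sum = 0.0, c = 0.0;
        for (std::vector<double>::size_type i = 0; i < v.size(); ++i)
        {
            double y = v[i] - c;
            double t = sum + y;
            c = (t - sum) - y;   // algebraically zero, numerically the lost bits
            sum = t;
        }
        return sum;
    }

    int main()
    {
        std::vector<double> v(1000000, 0.1);
        std::cout.precision(17);
        std::cout << kahan_sum(v) << "\n"; // compare against a plain "sum += v[i]" loop
        return 0;
    }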
But: isn't the very fact that most of us stumble over them continually -- and they can lie undiscovered for years -- reason enough to look for something better?
Good question. I think Java took that approach. But they also got flak from people claiming that Java's overly restrictive floating point specification would ruin performance. But on the positive side, most of the "usual" traps my test program highlighted don't exist in Java. The only one I'm not sure about is ((1-alpha)*a + alpha*b) != a for a==b. Regards, Thomas
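P.S. A quick sketch for anyone who wants to test that last property on a particular compiler or language (it simply counts how often the roundings push the result away from a when a == b):

    #include <iostream>

    int main()
    {
        double alpha = 1e-13;
        int mismatches = 0;
        for (int k = 1; k <= 1000; ++k)
        {
            double a = k / 7.0;
            double b = a;
            // Exact arithmetic gives a; in floating point, (1-alpha), the two
            // products and the sum are each rounded, so the result may be off
            // by an ulp or so.
            double c = (1 - alpha) * a + alpha * b;
            if (c != a)
                ++mismatches;
        }
        std::cout << "mismatches: " << mismatches << " out of 1000\n";
        return 0;
    }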

Thomas Klimpel wrote:
Good question. I think Java took that approach. But they also got flak from people claiming that Java's overly restrictive floating point specification would ruin performance. But on the positive side, most of the "usual" traps my test program highlighted don't exist in Java. The only one I'm not sure about is ((1-alpha)*a + alpha*b) != a for a==b.
Wouldn't requiring IEEE 754 FP arithmetic for a (known to be correct) algorithm be enough? On the other hand, in my experience, it's not just the FP logic that causes trouble, but also that FP precision usually isn't designed into the (larger) algorithm from the start. FP algorithms are all about accumulation of error, aren't they? Would be nice to have some kind of debugging facility (integrated with the code), that somehow keeps track of all kinds of error bounds/measures you're interested in.

Cheers, Rutger

FP algorithms are all about accumulation of error, aren't they? Would be nice to have some kind of debugging facility (integrated with the code), that somehow keeps track of all kinds of error bounds/measures you're interested in.
I've experimented with that - a "dual precision" type - basically a class that evaluates every expression it's subject to at two different precisions - say double and with an arbitrary precision type, and then optionally prints out the accumulated error. The idea is one can then step through code, or else set a break when the accumulated error reaches some threshold, and see where the error is coming from in the algorithm. That's the theory anyway, in practice I've found good old fashioned pen and paper analysis of the algorithm is often as good... John.
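PS: in sketch form, the idea looks something like this (using long double as a stand-in for the arbitrary precision type, just to keep the example self-contained; on some compilers long double is no wider than double):

    #include <cmath>
    #include <iostream>

    // A "dual precision" number: every operation is carried out both in
    // double and in a wider type, so the accumulated divergence between the
    // two can be inspected (or used to trigger a breakpoint) at any point.
    struct dual
    {
        double d;
        long double wide;   // stand-in for an arbitrary precision type

        dual(double v = 0.0) : d(v), wide(v) {}
        dual(double v, long double w) : d(v), wide(w) {}

        dual operator+(const dual& o) const { return dual(d + o.d, wide + o.wide); }
        dual operator-(const dual& o) const { return dual(d - o.d, wide - o.wide); }
        dual operator*(const dual& o) const { return dual(d * o.d, wide * o.wide); }
        dual operator/(const dual& o) const { return dual(d / o.d, wide / o.wide); }

        // Accumulated error of the double track relative to the wide track.
        long double error() const { return std::fabs(static_cast<long double>(d) - wide); }
    };

    int main()
    {
        dual sum;
        for (int i = 0; i < 1000000; ++i)
            sum = sum + dual(0.1);
        std::cout << static_cast<double>(sum.error()) << "\n";
        return 0;
    }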

This by the way was the very motivation behind Boost.interval library. And it's used to guarantee consistent floating-point driven logic (with overlapping intervals triggering exact computation) with the cost (in most cases) of floating point additions (well, a bit more, for the interval bounds, and a bit more yet for changing the rounding mode). -- Hervé Brönnimann hervebronnimann@mac.com On Jan 24, 2010, at 11:17 AM, John Maddock wrote:
FP algorithms are all about accumulation of error, aren't they? Would be nice to have some kind of debugging facility (integrated with the code), that somehow keeps track of all kinds of error bounds/measures you're interested in.
I've experimented with that - a "dual precision" type - basically a class that evaluates every expression it's subject to at two different precisions - say double and with an arbitrary precision type, and then optionally prints out the accumulated error. The idea is one can then step through code, or else set a break when the accumulated error reaches some threshold, and see where the error is coming from in the algorithm. That's the theory anyway, in practice I've found good old fashioned pen and paper analysis of the algorithm is often as good...
John.

This by the way was the very motivation behind Boost.interval library. And it's used to guarantee consistent floating-point driven logic (with overlapping intervals triggering exact computation) with the cost (in most cases) of floating point additions (well, a bit more, for the interval bounds, and a bit more yet for changing the rounding mode).
Absolutely, and I would love to get Boost.Interval and Boost.Math to work well together, I put quite a bit of effort into making that happen actually, but my patches (and not to mention bug reports) for Boost.Interval went unattended to, so I basically gave up... John.
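PS: for anyone curious, the kind of floating-point-driven test Hervé describes might look roughly like this with Boost.Interval (just a sketch, not taken from either library):

    #include <iostream>
    #include <boost/numeric/interval.hpp>

    typedef boost::numeric::interval<double> I;

    int main()
    {
        using boost::numeric::lower;
        using boost::numeric::upper;
        using boost::numeric::overlap;

        // Enclose 1/7 by an interval, then compute the same quantity along
        // two different expression orders.
        I delta = I(1.0) / I(7.0);
        I t  = -(delta * I(3.0));
        I tt = -(I(3.0) * delta);

        std::cout << "t  in [" << lower(t)  << ", " << upper(t)  << "]\n";
        std::cout << "tt in [" << lower(tt) << ", " << upper(tt) << "]\n";

        // If the enclosures overlap, the comparison cannot be decided from
        // doubles alone, and the logic would fall back to exact computation.
        if (overlap(t, tt))
            std::cout << "cannot separate t and tt at double precision\n";
        return 0;
    }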

This by the way was the very motivation behind Boost.interval library. And it's used to guarantee consistent floating-point driven logic (with overlapping intervals triggering exact computation) with the cost (in most cases) of floating point additions (well, a bit more, for the interval bounds, and a bit more yet for changing the rounding mode).
Absolutely, and I would love to get Boost.Interval and Boost.Math to work well together, I put quite a bit of effort into making that happen actually, but my patches (and not to mention bug reports) for Boost.Interval went unattended to, so I basically gave up...
I forgot to mention - it's not so easy to set a breakpoint in Boost.Interval so that you break (or only print debugging info to std::cerr) once the epsilon error reaches a certain threshold. I'm probably missing something though... John.

Thomas Klimpel wrote:
Find attached a small test program that tries to highlight some of the "usual" floating point traps.
Is it time to raise the idea of a Boost fixed-point library again? It's not a solution for all, or even most, floating-point problems, but there are cases where floats are used only because they are more convenient (i.e. better supported by the language) than fixed-point. Phil.

Phil Endecott wrote:
Is it time to raise the idea of a Boost fixed-point library again?
A long time ago, I read https://ccrma.stanford.edu/~bilbao/master/goodcopy.html The developed methods were also analyzed in a fixed-point context. But trying to think about fixed-point numbers, I soon realized that I don't even understand the basics, like negative numbers or multiplication. I think I did a web search at that time, but wasn't able to find much introductory material. However, I just did a web search again, and there was an extensive Wikipedia article on the subject, with many interesting references. I found the following in the first reference:
While there is nothing particularly difficult about this subject, I found little documentation either in hardcopy or on the web. What documentation I did find was disjointed, never putting together all of the aspects of fixed-point arithmetic that I think are important. I therefore decided to develop this material and to place it on the web not only for my own reference but for the benefit of others who, like myself, find themselves needing a complete understanding of the issues in implementing fixed-point algorithms on platforms utilizing integer arithmetic.
So perhaps it was really difficult to find introductory material on the web a long time ago, but the situation seems to have changed in the meantime. (Perhaps I will read some of the introductory material when I find time.) I also read the following on Wikipedia:
ISO/IEC TR 18037[5] specifies fixed-point data types for the C programming language; vendors are expected to implement the language extensions for fixed point arithmetic in coming years. Fixed-point support is implemented in GCC.[6][7]
This looks like fixed-point support will really become "mainstream" soon. How is this related to C++ and the idea of a Boost fixed-point library? Regards, Thomas
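P.S. To make the multiplication question concrete, my current (possibly naive) understanding boils down to something like the following sketch of signed Q16.16 arithmetic:

    #include <iostream>

    // Signed Q16.16 fixed point: value = raw / 2^16. Negative numbers are
    // simply the two's complement of the raw integer, so addition and
    // subtraction are ordinary integer operations.
    typedef int q16_16;                 // assumes a 32-bit int
    typedef long long wide_int;         // 64-bit intermediate for products

    q16_16 to_fixed(double x)  { return static_cast<q16_16>(x * 65536.0); }
    double to_double(q16_16 x) { return x / 65536.0; }

    // Multiplication: the raw product carries 32 fractional bits, so it is
    // computed in a 64-bit intermediate and shifted back down by 16.
    // (Right-shifting a negative value is an arithmetic shift on the
    // platforms I care about, though strictly it is implementation-defined.)
    q16_16 mul(q16_16 a, q16_16 b)
    {
        return static_cast<q16_16>((static_cast<wide_int>(a) * b) >> 16);
    }

    int main()
    {
        q16_16 a = to_fixed(-1.5);
        q16_16 b = to_fixed(2.25);
        std::cout << to_double(mul(a, b)) << "\n"; // prints -3.375
        return 0;
    }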

This looks like fixed-point support will really become "mainstream" soon. How is this related to C++ and the idea of a Boost fixed-point library?
John Maddock wrote:
FP algorithms are all about accumulation of error, aren't they? Would be nice to have some kind of debugging facility (integrated with the code), that somehow keeps track of all kinds of error bounds/measures you're interested in.
I've experimented with that - a "dual precision" type - basically a class that evaluates every expression it's subject to at two different precisions - say double and with an arbitrary precision type, and then optionally prints out the accumulated error. The idea is one can then step through code, or else set a break when the accumulated error reaches some threshold, and see where the error is coming from in the algorithm. That's the theory anyway, in practice I've found good old fashioned pen and paper analysis of the algorithm is often as good...
I encountered the TTMath library recently, http://www.ttmath.org/ttmath. Has it already been discussed within the context of Boost? It is a templated, header-only library where the mantissa and exponent of a big-number type can be set as template parameters; it is almost like a Boost library... I used TTMath to figure out a problem with segment intersection, where all numeric types gave me different results, including Excel and spatial databases, and I didn't have any clue what the real result should look like (even GMP and CLN differed...). But this library helped me. Regards, Barend

On Sat, 23 Jan 2010, Phil Endecott wrote:
Thomas Klimpel wrote:
Find attached a small test program that tries to highlight some of the "usual" floating point traps.
Is it time to raise the idea of a Boost fixed-point library again?
It's not a solution for all, or even most, floating-point problems, but there are cases where floats are used only because they are more convenient (i.e. better supported by the language) than fixed-point.
For cases where base-10 is desired, see http://speleotrove.com/decimal/. Many of the design decisions there may also be relevant for a fixed-point library. - Daniel

Thomas Klimpel wrote:
But I never found a bug in such code that would have been a fundamental problem with floating point, but always just the "usual" floating point traps I keep overlooking by mistake.
Now somebody will probably suggest that I should at least document the "usual" floating point traps then. So I created the attached small test program showing the "usual" floating point traps I could still remember having overlooked at least once.
It seems I wasn't explicit enough: the function "buggy_function" is intended to show the 'different' types of "usual" floating point traps that 'hit me' at least once. Some of them are not subtle floating point behavior, but real bugs (with simple obvious fixes) caused by the semantics of C++. However, it's true that they are often difficult to trigger and difficult to spot. So I guess I should have at least included the compiler warnings from MSVC9, as they contain helpful indications of what actually goes wrong:
I created a new project with MSVC9. The debug version compiles with warnings for the things that go badly wrong and executes with output:
.\test_fp.cpp(25) : warning C4146: unary minus operator applied to unsigned type, result still unsigned
.\test_fp.cpp(26) : warning C4244: 'argument' : conversion from 'double' to 'int', possible loss of data
.\test_fp.cpp(27) : warning C4244: 'argument' : conversion from 'double' to 'int', possible loss of data

Here are the corresponding lines of the function "buggy_function":

24: double t = -delta*sorted.size();
25: double tt = -sorted.size()*delta;
26: bool t_tt = (abs(t-tt) < eps);
27: bool abs_t = (abs(t) == -t);

Regards, Thomas
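P.S. The C4146 trap (line 25) in isolation, as a small sketch:

    #include <iostream>
    #include <vector>

    int main()
    {
        double delta = 1 / 7.0;
        std::vector<double> sorted(3);

        // -sorted.size() negates an unsigned value, producing a huge
        // positive size_t, which is then converted to double. So tt ends up
        // as a large positive number instead of roughly -0.43.
        double t  = -delta * sorted.size();   // ok: (-double) * unsigned
        double tt = -sorted.size() * delta;   // trap: (-unsigned) * double

        std::cout << "t  = " << t << "\ntt = " << tt << "\n";
        return 0;
    }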
participants (9)
- Barend Gehrels
- Bob Walters
- dherring@ll.mit.edu
- Hervé Brönnimann
- John Maddock
- Phil Endecott
- Rutger ter Borg
- Steve M. Robbins
- Thomas Klimpel