
Kim Barrett wrote:
At 2:23 PM +0100 12/6/08, Robert Kawulak wrote:
... maybe the problem could be somehow solved if we have a function float exact(float) that, given a floating point value (that may have greater precision because of caching in a register), returns a value that is truncated (has exactly the precision of float, not greater).
I think that something along the lines of the following will likely work:
inline double exact(double x) { struct { volatile double x; } xx = { x }; return xx.x; }
The idea is to force the value to make a round trip through a memory location of the "correct" size. The use of volatile should prevent the compiler from optimizing away the trip through memory.
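A minimal sketch of how such a helper might be used, assuming the volatile round trip behaves as described above. The template form and the same_quotient function are hypothetical, added only for illustration; they are not part of any library.

```cpp
#include <cassert>

// Hypothetical generalisation of exact() to any floating-point type T.
// The volatile object forces a store and a reload at T's own size, so
// any extra register precision should be dropped on the way through.
template <typename T>
inline T exact(T x)
{
    volatile T stored = x;  // volatile store: written out as a T
    return stored;          // volatile load: read back as a T
}

// Hypothetical usage: compare two quotients only after both have been
// forced to plain double precision.
inline bool same_quotient(double a, double b, double c, double d)
{
    return exact(a / b) == exact(c / d);
}
```

Without the calls to exact(), one side of such a comparison could still be sitting in a wider register while the other has already been stored to memory, which is the situation described in the quoted message.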
The following should work (it needs <cstring> for memcpy):
inline double exact(double x) { double y; memcpy(&y, &x, sizeof(double)); return y; }
But I'm not sure the library should do this at all. It seems like forcing a policy upon the user, and it may be inefficient.
The root of the problem is that Intel processors may hold double values in 80-bit floating point registers. These values may later be truncated to 64 bits when they are stored to memory. However, you can force doubles to always be computed with 64-bit precision by changing the processor's floating point control word. In Visual Studio, this can be done with the call _controlfp(_PC_53, _MCW_PC) (53 = number of significand (mantissa) bits in the 64-bit format).
--Johan Råde
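A rough sketch of the control word approach, wrapped in a scope guard so the previous precision setting is restored afterwards. The class name double_precision_scope and the function divide_in_double_precision are hypothetical. The sketch assumes 32-bit x86 builds with Visual C++; as far as I know, changing the precision control is not supported on x64, so everywhere else the guard compiles to a no-op.

```cpp
#include <cassert>
#if defined(_MSC_VER) && defined(_M_IX86)
#include <float.h>  // _controlfp, _PC_53, _MCW_PC
#endif

// Hypothetical scope guard: sets the x87 precision control to 53 bits
// on construction and restores the previous setting on destruction.
class double_precision_scope
{
public:
    double_precision_scope()
    {
#if defined(_MSC_VER) && defined(_M_IX86)
        saved_ = _controlfp(0, 0);    // mask 0 only reads the control word
        _controlfp(_PC_53, _MCW_PC);  // 53-bit significand from here on
#endif
    }

    ~double_precision_scope()
    {
#if defined(_MSC_VER) && defined(_M_IX86)
        _controlfp(saved_, _MCW_PC);  // restore only the precision bits
#endif
    }

    double_precision_scope(const double_precision_scope&) = delete;
    double_precision_scope& operator=(const double_precision_scope&) = delete;

#if defined(_MSC_VER) && defined(_M_IX86)
private:
    unsigned int saved_;
#endif
};

// Hypothetical usage: perform one division with the guard active.
inline double divide_in_double_precision(double a, double b)
{
    double_precision_scope guard;
    return a / b;
}
```

Whether a library should change the control word on the user's behalf is the same policy question as above; the guard only limits how long the change stays in effect.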