
On Tue, Feb 26, 2008 at 3:04 PM, John Maddock <john@johnmaddock.co.uk> wrote:
John Maddock wrote:
1) Floating point classification routines: these are optimised implementations of the C99 and C++ TR1 functions fpclassify, isinf, isnan, isnormal and isfinite. From Boost-1.35 onwards these are already a part of Boost.Math (see http://svn.boost.org/svn/boost/trunk/libs/math/doc/sf_and_dist/html/math_too...) so if accepted the two implementations will get merged.
The review here should focus on the implementation used, and testing on whatever platforms you have available - in particular are there any circumstances (compiler optimisation settings etc) where this implementation breaks?
These look fine to me, and as Johan mentioned, any breakages should be fixed in endian.hpp.
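For anyone not already familiar with these routines, here is a minimal usage sketch of the Boost.Math versions (this assumes the header path used on current trunk; the names under which they end up in the reviewed library could of course differ):

    #include <boost/math/special_functions/fpclassify.hpp>
    #include <iostream>
    #include <limits>

    int main()
    {
        double nan = std::numeric_limits<double>::quiet_NaN();
        double inf = std::numeric_limits<double>::infinity();

        // Classification of special and ordinary values:
        std::cout << boost::math::isnan(nan)      << '\n'; // 1
        std::cout << boost::math::isinf(inf)      << '\n'; // 1
        std::cout << boost::math::isfinite(1.0)   << '\n'; // 1
        std::cout << boost::math::isnormal(1.0)   << '\n'; // 1
        std::cout << boost::math::fpclassify(1.0) << '\n'; // FP_NORMAL
    }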
Sorry to bring this up again (!), but I've been trying to get my head around the aliasing rules and what is and isn't allowed, in particular by following the rather useful article here: http://www.cellperformance.com/mike_acton/2006/06/understanding_strict_alias...
also see: http://mail-index.netbsd.org/tech-kern/2003/08/11/0001.html
From this it appears that your current code is perfectly fine; however, I still can't help feeling that we could do better than calling an external function (memcpy) to move 4 or 8 bytes. The suggestion in the article is to use a union to move from one type to another:
union float_to_int32 { float f; boost::uint32_t i; };
boost::uint32_t get_bits(float f) { float_to_int32 u; u.f = f; return u.i; }
I can't claim to have tested exhaustively, but modifying get_bits and set_bits to use this method seems to work OK for VC8 & 9 and MinGW GCC.
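For what it's worth, the set_bits direction would presumably look like this if written with the same union (just a sketch; the actual set_bits signature in the submission may well differ):

    // Sketch only: hypothetical set_bits counterpart using the same union.
    float set_bits(boost::uint32_t i)
    {
        float_to_int32 u;
        u.i = i;
        return u.f;
    }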
GCC 4.2 under x86_64 produces better code with std::memcpy (which is treated as an intrinsic) than with the union trick (compiled with -O3):

    uint32_t get_bits(float f) { float_to_int32 u; u.f = f; return u.i; }

Generates:

    _Z8get_bitsf:
            movss   %xmm0, -4(%rsp)
            movl    -4(%rsp), %edx
            movl    %edx, %eax
            ret

Which has a useless "movl %edx, %eax". I think that using the union confuses the optimizer. This instead:

    uint32_t get_bits2(float f) { uint32_t ret; std::memcpy(&ret, &f, sizeof(f)); return ret; }

Generates:

    _Z9get_bits3f:
            movss   %xmm0, -4(%rsp)
            movl    -4(%rsp), %eax
            ret

Which should be optimal (IIRC you can't move from an xmm register to an integer register without passing through memory).

Note that the (illegal) code:

    uint32_t get_bits3(float f) { uint32_t ret = *reinterpret_cast<uint32_t*>(&f); return ret; }

generates the same code as get_bits2 if compiled with -fno-strict-aliasing. Without that flag it is (rightly) miscompiled, since it violates the aliasing rules.

I've tested the code under plain x86 and there is no difference between the three functions. So the standard-compliant code is also optimal, at least with recent GCCs.

--
gpd
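P.S. If anyone wants to reproduce the comparison, something along these lines should be enough to build with "g++ -O3 -S" and inspect; the function names here are just for illustration and are not those of the submission:

    #include <cstring>            // std::memcpy
    #include <cstdio>
    #include <boost/cstdint.hpp>

    // memcpy-based bit extraction: well-defined under the aliasing rules,
    // and GCC expands the call inline since memcpy is an intrinsic.
    boost::uint32_t get_bits_memcpy(float f)
    {
        boost::uint32_t ret;
        std::memcpy(&ret, &f, sizeof(f));
        return ret;
    }

    // The reverse direction, using the same technique.
    float set_bits_memcpy(boost::uint32_t bits)
    {
        float f;
        std::memcpy(&f, &bits, sizeof(f));
        return f;
    }

    int main()
    {
        boost::uint32_t bits = get_bits_memcpy(1.0f);
        std::printf("%08lx\n", (unsigned long)bits);  // 3f800000 on IEEE 754 targets
        std::printf("%g\n", set_bits_memcpy(bits));   // 1
    }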