
Giovanni Piero Deretta wrote:
also see: http://mail-index.netbsd.org/tech-kern/2003/08/11/0001.html
Thanks, will take a look.
GCC 4.2 under x86_64 produces better code with std::memcpy (which is treated as an intrinsic) than with the union trick (compiled with -O3):
    uint32_t get_bits(float f) {
        float_to_int32 u;
        u.f = f;
        return u.i;
    }
Generates:
    _Z8get_bitsf:
        movss %xmm0, -4(%rsp)
        movl -4(%rsp), %edx
        movl %edx, %eax
        ret

This has a useless "movl %edx, %eax". I think that using the union confuses the optimizer. This instead:
    uint32_t get_bits2(float f) {
        uint32_t ret;
        std::memcpy(&ret, &f, sizeof(f));
        return ret;
    }
Generates:
    _Z9get_bits3f:
        movss %xmm0, -4(%rsp)
        movl -4(%rsp), %eax
        ret

Which should be optimal (IIRC you can't move from an XMM register to an integer register without passing through memory).
Note that the (illegal) code:
    uint32_t get_bits3(float f) {
        uint32_t ret = *reinterpret_cast<uint32_t*>(&f);
        return ret;
    }

Generates the same code as get_bits2 if compiled with -fno-strict-aliasing. Without that flag, GCC (rightly) miscompiles it. I've also tested the code on plain x86, and there all three functions generate the same code.
So the standard-compliant code is also optimal, at least with recent GCCs.
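For reference, a minimal self-contained sketch of the memcpy approach discussed above, in both directions. The names float_to_bits, bits_to_float and self_check are hypothetical and not taken from the library, and the sketch assumes float is a 32-bit IEEE 754 type, as it is on x86 and x86_64:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: float is exactly 32 bits wide (true on x86 and x86_64).
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "float must be 32 bits wide");

// Hypothetical helper: copy the object representation of a float into
// an integer. std::memcpy avoids the aliasing violation of the
// reinterpret_cast version.
std::uint32_t float_to_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Hypothetical helper: the inverse conversion.
float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Small self-check, valid only under the IEEE 754 assumption above.
void self_check() {
    assert(float_to_bits(1.0f) == 0x3F800000u);   // sign 0, exponent 127
    assert(float_to_bits(-2.0f) == 0xC0000000u);  // sign 1, exponent 128
    assert(bits_to_float(0x3F800000u) == 1.0f);
}
```

Calling self_check() once at startup is enough to catch a platform where the layout assumption does not hold. Whether the memcpy call is folded into plain moves depends on the compiler and optimization level, as the measurements above show for GCC 4.2 at -O3.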
Very interesting! In that case I think we should document what Johan has now, to avoid this coming up again in the future :-)

Thanks!
John.