
On Tue, Feb 26, 2008 at 4:59 PM, Sebastian Redl <sebastian.redl@getdesigned.at> wrote:
Giovanni Piero Deretta wrote:
_Z9get_bits3f:
        movss   %xmm0, -4(%rsp)
        movl    -4(%rsp), %eax
        ret
Which should be optimal (IIRC you can't move from an xmm register to an integer register without passing through memory).
SSE2: movd %xmm0, %eax
Right, it was the old x87 register stack that doesn't support fp-register/general-purpose-register moves. Anyway, after reading some documentation, it seems that the generated assembly is probably still optimal for K8 and Pentium 4, which have high-latency and very-high-latency (respectively) moves between xmm and general-purpose registers. My gcc version can't optimize specifically for Core 2, which shouldn't have this limitation.

-- gpd
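
For reference, here is a minimal sketch of the kind of function that could produce the symbol quoted above. The mangled name _Z9get_bits3f demangles to get_bits3(float), but the function body was not posted in this part of the thread, so the memcpy-based implementation and the std::uint32_t return type (guessed from the result landing in %eax) are assumptions. Whether a compiler emits the store/load pair or a single movd depends on its version and tuning flags.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical reconstruction: only the name and parameter type come from
// the mangled symbol; the body and return type are assumed for illustration.
std::uint32_t get_bits3(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "float is expected to be 32 bits wide");
    std::uint32_t bits;
    // memcpy copies the object representation without violating aliasing
    // rules; optimizing compilers normally reduce it to a plain move.
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On a platform with IEEE 754 single-precision floats, get_bits3(1.0f) returns 0x3f800000. Compiling with something like g++ -O2 -S and comparing the output under different -march/-mtune settings is one way to see which of the two instruction sequences discussed here gets picked.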