
12 Jun 2011, 9:42 a.m.
Hi David,
What's the difference between ADDPD XMM0, XMM1 and XMM0 = __builtin_ia32_addpd (XMM0, XMM1)? I would contend there is none, from a programming-effort perspective.
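For concreteness, here is a minimal sketch of the two forms as they might appear inside a loop. The function names sum_asm and sum_builtin and the ADD_BUILTIN macro are made up for illustration, and the code assumes GCC-style vector extensions on an x86 target with SSE2. Where __builtin_ia32_addpd is not available (other compilers, other targets), the sketch falls back to a plain vector add so it still compiles.

```c
#include <stddef.h>

/* Two packed doubles, via the GCC vector extension. */
typedef double v2df __attribute__((vector_size(16)));

/* ADD_BUILTIN is a hypothetical wrapper: it uses the GCC builtin where it
   exists and an ordinary vector add everywhere else. */
#if defined(__SSE2__) && defined(__GNUC__) && !defined(__clang__)
#define ADD_BUILTIN(a, b) __builtin_ia32_addpd((a), (b))
#else
#define ADD_BUILTIN(a, b) ((a) + (b))
#endif

/* Inline-asm form: ADDPD with compiler-chosen registers ("x" = any SSE reg).
   AT&T operand order: source first, destination second. */
double sum_asm(const double *p, size_t n_pairs)
{
    v2df acc = {0.0, 0.0};
    for (size_t i = 0; i < n_pairs; i++) {
        v2df v = {p[2 * i], p[2 * i + 1]};
#if defined(__SSE2__)
        __asm__("addpd %1, %0" : "+x"(acc) : "x"(v));
#else
        acc = acc + v; /* fallback for targets without SSE2 */
#endif
    }
    return acc[0] + acc[1];
}

/* Builtin form: the same loop, almost 1-to-1. */
double sum_builtin(const double *p, size_t n_pairs)
{
    v2df acc = {0.0, 0.0};
    for (size_t i = 0; i < n_pairs; i++) {
        v2df v = {p[2 * i], p[2 * i + 1]};
        acc = ADD_BUILTIN(acc, v);
    }
    return acc[0] + acc[1];
}
```

Both functions compute the same sum. Compiling each with something like gcc -O3 -funroll-loops -S and reading the generated assembly is one way to see how the compiler treats the two loops.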
If you compare how GCC handles the two, you'll see that using any inline asm inside a loop disables virtually all optimizations (such as loop unrolling), even if you let the compiler allocate the registers for the asm block. If you rewrite the same code using builtins (almost 1-to-1), the optimizations are back.

Thanks,
Maxim