
7 Aug
2008
7:58 p.m.
When Intel came out with its first SIMD instruction set, I was happy to try it on my application. It was a failure, because even if one instruction executes on 3 data locations, its cost was 3 CPU cycles. Three integer instructions cost 1 cycle each. Also, with SIMD you had to load the registers first. <snip>
MMX and SSE were rather a disaster on this point. Actually, I learned to play with SIMD using AltiVec on PowerPC, and that's a completely different deal. SSSE3 and the upcoming SSE4 are rather good too.
And yes, optimizing at this level is a rare situation. I would be curious to know which kinds of IP algorithms need it, but this is maybe a topic that should go private instead of adding noise to the list.