
When you're talking "optimal," you're setting a pretty dang high bar.
I don't have to care about "optimal" if the difference between a suboptimal use of SIMD and not using it at all is an order of magnitude.
Indeed. However, in Gautam Sewani's GSOC project this year he looked quite hard at optimising Boost.Math with SSE2/3 instructions and struggled to find *any* use cases where hand-written SSE2 code was better than compiler-generated code - one exception was the classic vectorised addition, but even there you're struggling to get a 2x improvement.
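For concreteness, here is a minimal sketch of the kind of classic hand-written vectorised addition being compared, using SSE intrinsics and assuming 16-byte-aligned arrays whose length is a multiple of 4 (add4n is just an illustrative name, not the GSOC code):

#include <stddef.h>
#include <xmmintrin.h>

/* Adds two float arrays with SSE; assumes 16-byte-aligned pointers
   and n a multiple of 4. */
void add4n(float *out, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);            /* aligned 4-float load */
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(out + i, _mm_add_ps(va, vb)); /* aligned store */
    }
}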
Hm, I cannot really comment on non-vectorized functions; I am heavily using SIMD (SSE) for vectorized operations. Doing vectorized math operations, something like void tanh4(float * out, const float * in);, I measured a performance gain of a factor of 6 to 7 compared to the libm implementation. In general, when optimizing code for SIMD operations, it makes sense to focus on vector operations.

Reading all this discussion about vectorizing compilers, one should always take into account that compilers are not allowed to do some transformations because of aliasing issues. Writing a SIMD-ified vector function, one can specify that pointers are required to be aligned and that memory regions are not allowed to overlap - assumptions the compiler is not able to make on its own. (A sketch of such a function follows below.)

I would be curious to see a Boost.SIMD library and could provide some of my SSE code for vector operations.

Best,
Tim

--
tim@klingt.org
http://tim.klingt.org

Which is more musical, a truck passing by a factory or a truck passing by a music school?
  John Cage
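To illustrate the point about alignment and aliasing, here is a sketch of such a vector function under those assumptions. It is not Tim's actual tanh4; the rational formula is a deliberately crude stand-in for a real tanh approximation, and tanh_vec is just an illustrative name.

#include <stddef.h>
#include <xmmintrin.h>

/* out and in must be 16-byte aligned, must not overlap, and n must be
   a multiple of 4: a contract the caller guarantees up front, so the
   compiler does not have to prove it. */
void tanh_vec(float * __restrict out, const float * __restrict in, size_t n)
{
    const __m128 c27 = _mm_set1_ps(27.0f);
    const __m128 c9  = _mm_set1_ps(9.0f);

    for (size_t i = 0; i < n; i += 4) {
        __m128 x  = _mm_load_ps(in + i);                 /* aligned load */
        __m128 x2 = _mm_mul_ps(x, x);
        /* tanh(x) ~= x * (27 + x^2) / (27 + 9 * x^2); rough, small |x| only */
        __m128 num = _mm_mul_ps(x, _mm_add_ps(c27, x2));
        __m128 den = _mm_add_ps(c27, _mm_mul_ps(c9, x2));
        _mm_store_ps(out + i, _mm_div_ps(num, den));     /* aligned store */
    }
}

Because of the __restrict qualifiers and the alignment contract, the loop can use aligned loads/stores and needs no runtime overlap checks, which is exactly the freedom an auto-vectorizer cannot assume for a plain tanh4(float *, const float *) signature.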