
Joel Falcou wrote:
Michael Fawcett wrote:
Joel, how does the extension detection mechanism work? Is there a small runtime penalty for each function as it detects which path would be optimal, or can you define at compile time what extensions are available (e.g. if you are compiling for a fixed hardware platform, like a console)?

I have an #ifdef/#elif structure that detects which extensions have been enabled on the compiler, and I match this with platform detection to know where to jump and how to overload some functions or class definitions.
I tried the runtime way and it was fugly slow, so I'm back to compile-time detection, as performance was critical.
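A minimal sketch of that kind of compile-time selection, assuming GCC-style predefined macros such as __SSE2__ (the tag types and the add4 function below are made-up illustrations, not Joel's actual code):

    // Sketch only: pick an extension "tag" from the compiler's predefined
    // macros, then overload on that tag.  No per-call runtime test remains.
    #if defined(__SSE2__)
    #  include <emmintrin.h>
    #endif

    struct sse2_tag   {};
    struct scalar_tag {};

    #if defined(__SSE2__)
    typedef sse2_tag   current_extension;   // compiler was invoked with SSE2 enabled
    #else
    typedef scalar_tag current_extension;   // portable fallback
    #endif

    #if defined(__SSE2__)
    inline void add4(float* r, float const* a, float const* b, sse2_tag)
    {
        _mm_storeu_ps(r, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
    }
    #endif

    inline void add4(float* r, float const* a, float const* b, scalar_tag)
    {
        for (int i = 0; i != 4; ++i) r[i] = a[i] + b[i];
    }

    // Public entry point: the overload is fixed at compile time.
    inline void add4(float* r, float const* a, float const* b)
    {
        add4(r, a, b, current_extension());
    }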
Actually, I would expect this to be a mix of runtime and compile-time decisions. While there are certainly things that can be decided at compile time (architecture, available extensions, data types), there are also parameters that are only available at runtime, such as alignment, problem size, etc.

In Sourcery VSIPL++ (http://www.codesourcery.com/vsiplplusplus/) we use a dispatch mechanism that allows programmers to chain extension 'evaluators' in a type-list. This type-list is walked over once by the compiler to eliminate unavailable matches, and the resulting list is then walked at runtime to find a match based on the runtime parameters above. This is also where we parametrize for what sizes we want to dispatch to a given backend (for example, whether the performance gain outweighs the data I/O penalty, etc.).

Obviously, all this wouldn't make sense at a very fine-grained level. But for typical BLAS-level or signal-processing operations (matrix multiply, FFT, etc.) this works like a charm. (We target all sorts of hardware, from clusters through Cell processors down to GPUs.)

Regards, Stefan

-- 
...ich hab' noch einen Koffer in Berlin...
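To make the mechanism Stefan describes a little more concrete, here is a rough sketch of such a mixed compile-time/runtime dispatcher (hypothetical names, C++11 variadics for brevity; this is an illustration of the idea, not the Sourcery VSIPL++ sources): each backend exposes a compile-time flag saying whether it can be compiled in at all, plus a runtime predicate that looks at alignment and problem size, and a small type-list walker tries the backends in order.

    #include <cstddef>
    #include <cstdint>

    struct sse_backend
    {
        static bool const ct_valid =            // known at compile time
    #if defined(__SSE__)
            true;
    #else
            false;
    #endif
        static bool rt_valid(float const* data, std::size_t size)
        {
            // only worth dispatching here for aligned, reasonably large problems
            return size >= 1024
                && reinterpret_cast<std::uintptr_t>(data) % 16 == 0;
        }
        static void run(float*, std::size_t) { /* vectorised implementation */ }
    };

    struct generic_backend
    {
        static bool const ct_valid = true;       // always available
        static bool rt_valid(float const*, std::size_t) { return true; }
        static void run(float*, std::size_t) { /* portable loop */ }
    };

    // Walk the type-list: the branch on ct_valid is a compile-time constant,
    // so unavailable backends fold away; the survivors are tried at runtime.
    template <typename... Backends> struct dispatcher;

    template <typename Backend, typename... Rest>
    struct dispatcher<Backend, Rest...>
    {
        static void run(float* data, std::size_t size)
        {
            if (Backend::ct_valid && Backend::rt_valid(data, size))
                Backend::run(data, size);
            else
                dispatcher<Rest...>::run(data, size);
        }
    };

    template <> struct dispatcher<>
    {
        static void run(float*, std::size_t) {}  // no backend matched
    };

    typedef dispatcher<sse_backend, generic_backend> my_operation;
    // my_operation::run(ptr, n);  // picks a backend by size/alignment at runtime

This keeps the per-call runtime cost to a couple of cheap tests on the surviving backends, which is why it pays off for coarse-grained operations (matrix multiply, FFT) rather than element-wise ones.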