
Stefan Seefeld wrote:
Joel Falcou wrote:
Michael Fawcett wrote:
Joel, how does the extension detection mechanism work? Is there a small runtime penalty for each function as it detects which path would be optimal, or can you define at compile time what extensions are available (e.g. if you are compiling for a fixed hardware platform, like a console)?

I have an #ifdef/#elif structure that detects which extensions have been set up on the compiler, and I match this with a platform detection to know where to jump and how to overload some functions or class definitions.
I tried the runtime way and it was fugly slow. So I'm back to compile-time detection, as performance was critical.
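A minimal sketch of that kind of compile-time selection, just to make the idea concrete (the tag and function names are illustrative, not NT2's actual ones; only the __SSE2__/__AVX__ predefined macros are real):

  #include <cstdio>

  struct scalar_ {}; struct sse2_ {}; struct avx_ {};

  // Fold the compiler's predefined macros into a single extension tag.
  #if defined(__AVX__)
    typedef avx_    current_extension;
  #elif defined(__SSE2__)
    typedef sse2_   current_extension;
  #else
    typedef scalar_ current_extension;
  #endif

  // One overload per extension; resolution happens entirely at compile time,
  // so the generated code contains no dispatch branch at all.
  inline void add4(float* r, float const* a, float const* b, scalar_)
  { for (int i = 0; i != 4; ++i) r[i] = a[i] + b[i]; }

  inline void add4(float* r, float const* a, float const* b, sse2_)
  { for (int i = 0; i != 4; ++i) r[i] = a[i] + b[i]; /* _mm_add_ps in real code */ }

  inline void add4(float* r, float const* a, float const* b, avx_)
  { for (int i = 0; i != 4; ++i) r[i] = a[i] + b[i]; /* 8-wide packs in real code */ }

  int main()
  {
    float a[4] = {1,2,3,4}, b[4] = {4,3,2,1}, r[4];
    add4(r, a, b, current_extension());  // overload picked by the tag
    std::printf("%g %g %g %g\n", r[0], r[1], r[2], r[3]);
  }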
Actually, I would expect this to be a mix of runtime and compile-time decisions. While there are certainly things that can be decided at compile time (architecture, available extensions, data types), there are also parameters that are only available at runtime, such as alignment, problem size, etc.

Well, again, the grain here is the data pack, a.k.a. the generalized SIMD vector.
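For the runtime half, a sketch of what such a check might look like on a coarse-grained operation (the alignment test, the size threshold, and the function name are all illustrative):

  #include <cstddef>
  #include <stdint.h>

  // y += a * x, choosing a path from parameters only known at runtime:
  // pointer alignment and problem size.
  void axpy(float* y, float const* x, float a, std::size_t n)
  {
    bool const aligned =
      ((reinterpret_cast<uintptr_t>(x) | reinterpret_cast<uintptr_t>(y)) % 16u) == 0;

    if (aligned && n >= 64)                    // threshold is illustrative
    {
      std::size_t const rounded = (n / 4) * 4; // full packs only
      for (std::size_t i = 0; i != rounded; i += 4)
        for (std::size_t j = 0; j != 4; ++j)   // stands in for one SIMD add
          y[i + j] += a * x[i + j];
      for (std::size_t i = rounded; i != n; ++i)
        y[i] += a * x[i];                      // scalar tail
    }
    else
    {
      for (std::size_t i = 0; i != n; ++i)     // scalar fallback
        y[i] += a * x[i];
    }
  }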
In Sourcery VSIPL++ (http://www.codesourcery.com/vsiplplusplus/) we use a dispatch mechanism that allows programmers to chain extension 'evaluators' in a type-list; this type-list is walked over once by the compiler to eliminate unavailable matches, and the resulting list is walked over at runtime to find a match based on the above runtime parameters. This is also where we parametrize which sizes we want to dispatch to a given backend (for example, when the performance gain outweighs the data I/O penalty, etc.).
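A minimal sketch of that evaluator-chain idea (all names are hypothetical, not Sourcery VSIPL++'s actual API; a real implementation would prune ct_valid == 0 entries by template specialization rather than the constant-folded branch used here):

  #include <cstddef>
  #include <cstdio>

  // Hand-rolled type-list.
  struct nil {};
  template<class Head, class Tail> struct cons {};

  // Two toy backends for an elementwise add.
  struct simd_backend
  {
    enum { ct_valid = 1 };                             // compiled in at all?
    static bool rt_valid(std::size_t n) { return n >= 1024; }
    static void exec(float* r, float const* a, float const* b, std::size_t n)
    { for (std::size_t i = 0; i != n; ++i) r[i] = a[i] + b[i]; /* SIMD here */ }
  };

  struct scalar_backend
  {
    enum { ct_valid = 1 };
    static bool rt_valid(std::size_t) { return true; } // always applicable
    static void exec(float* r, float const* a, float const* b, std::size_t n)
    { for (std::size_t i = 0; i != n; ++i) r[i] = a[i] + b[i]; }
  };

  // Walk the list: the first backend whose compile-time and runtime
  // predicates both hold gets to execute the operation.
  template<class List> struct dispatch;

  template<class Head, class Tail>
  struct dispatch< cons<Head, Tail> >
  {
    static void run(float* r, float const* a, float const* b, std::size_t n)
    {
      if (Head::ct_valid && Head::rt_valid(n)) Head::exec(r, a, b, n);
      else dispatch<Tail>::run(r, a, b, n);
    }
  };

  template<> struct dispatch<nil>
  {
    static void run(float*, float const*, float const*, std::size_t) {}
  };

  typedef cons<simd_backend, cons<scalar_backend, nil> > backends;

  int main()
  {
    float a[8] = {1}, b[8] = {2}, r[8];
    dispatch<backends>::run(r, a, b, 8);   // small size -> scalar_backend
    std::puts("dispatched");
  }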
Obviously, all this wouldn't make sense at a very fine-grained level. But for typical BLAS-level or signal-processing operations (matrix multiply, FFT, etc.) this works like a charm.
(We target all sorts of hardware, from clusters through Cell processors down to GPUs.)

That's what we do in NT2: mixed CT/RT selectors.
--
___________________________________________
Joel Falcou - Assistant Professor
PARALL Team - LRI - Universite Paris Sud XI
Tel : (+33)1 69 15 66 35