
Joel Falcou wrote:
Michael Fawcett wrote:
Joel, how does the extension detection mechanism work? Is there a small runtime penalty for each function as it detects which path would be optimal, or can you define at compile time what extensions are available (e.g. if you are compiling for a fixed hardware platform, like a console)?

I have an #ifdef/#elif structure that detects which extensions have been enabled on the compiler, and I match this with platform detection to know where to jump and how to overload some functions or class definitions.
I tried the runtime way and it was fugly slow, so I'm back to compile-time detection, as performance was critical.
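A minimal sketch of that kind of compile-time selection, assuming GCC-style predefined macros such as __SSE2__ (the tag types and the add4 function below are made-up illustrations, not Joel's actual code):

    // Sketch only: pick an extension "tag" from the compiler's predefined
    // macros, then overload on that tag.  No per-call runtime test remains.
    #if defined(__SSE2__)
    #  include <emmintrin.h>
    #endif

    struct sse2_tag   {};
    struct scalar_tag {};

    #if defined(__SSE2__)
    typedef sse2_tag   current_extension;   // compiler was invoked with SSE2 enabled
    #else
    typedef scalar_tag current_extension;   // portable fallback
    #endif

    #if defined(__SSE2__)
    inline void add4(float* r, float const* a, float const* b, sse2_tag)
    {
        _mm_storeu_ps(r, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
    }
    #endif

    inline void add4(float* r, float const* a, float const* b, scalar_tag)
    {
        for (int i = 0; i != 4; ++i) r[i] = a[i] + b[i];
    }

    // Public entry point: the overload is fixed at compile time.
    inline void add4(float* r, float const* a, float const* b)
    {
        add4(r, a, b, current_extension());
    }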
Actually, I would expect this to be a mix of runtime and compile-time decisions. While there are certainly things that can be decided at compile time (architecture, available extensions, data types), there are also parameters that are only available at runtime, such as alignment, problem size, etc.

In Sourcery VSIPL++ (http://www.codesourcery.com/vsiplplusplus/) we use a dispatch mechanism that allows programmers to chain extension 'evaluators' in a type-list. This type-list is walked over once by the compiler to eliminate unavailable matches, and the resulting list is then walked at runtime to find a match based on the runtime parameters above. This is also where we parametrize for what sizes we want to dispatch to a given backend (for example, whether the performance gain outweighs the data I/O penalty, etc.).

Obviously, all this wouldn't make sense at a very fine-grained level. But for typical BLAS-level or signal-processing operations (matrix multiply, FFT, etc.) this works like a charm. (We target all sorts of hardware, from clusters through Cell processors down to GPUs.)

Regards, Stefan

-- 
...ich hab' noch einen Koffer in Berlin...
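To make the mechanism Stefan describes a little more concrete, here is a rough sketch of such a mixed compile-time/runtime dispatcher (hypothetical names, C++11 variadics for brevity; this is an illustration of the idea, not the Sourcery VSIPL++ sources): each backend exposes a compile-time flag saying whether it can be compiled in at all, plus a runtime predicate that looks at alignment and problem size, and a small type-list walker tries the backends in order.

    #include <cstddef>
    #include <cstdint>

    struct sse_backend
    {
        static bool const ct_valid =            // known at compile time
    #if defined(__SSE__)
            true;
    #else
            false;
    #endif
        static bool rt_valid(float const* data, std::size_t size)
        {
            // only worth dispatching here for aligned, reasonably large problems
            return size >= 1024
                && reinterpret_cast<std::uintptr_t>(data) % 16 == 0;
        }
        static void run(float*, std::size_t) { /* vectorised implementation */ }
    };

    struct generic_backend
    {
        static bool const ct_valid = true;       // always available
        static bool rt_valid(float const*, std::size_t) { return true; }
        static void run(float*, std::size_t) { /* portable loop */ }
    };

    // Walk the type-list: the branch on ct_valid is a compile-time constant,
    // so unavailable backends fold away; the survivors are tried at runtime.
    template <typename... Backends> struct dispatcher;

    template <typename Backend, typename... Rest>
    struct dispatcher<Backend, Rest...>
    {
        static void run(float* data, std::size_t size)
        {
            if (Backend::ct_valid && Backend::rt_valid(data, size))
                Backend::run(data, size);
            else
                dispatcher<Rest...>::run(data, size);
        }
    };

    template <> struct dispatcher<>
    {
        static void run(float*, std::size_t) {}  // no backend matched
    };

    typedef dispatcher<sse_backend, generic_backend> my_operation;
    // my_operation::run(ptr, n);  // picks a backend by size/alignment at runtime

This keeps the per-call runtime cost to a couple of cheap tests on the surviving backends, which is why it pays off for coarse-grained operations (matrix multiply, FFT) rather than element-wise ones.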