Re: [boost] [OT?] SIMD and Auto-Vectorization (was Re: How to structurate libraries ?)

21 Jan 2009

      On Wednesday 21 January 2009 01:30, Joel Falcou wrote:
...
David A. Greene wrote:
...
A library of fast routines for doing various things is quite different
from creating a whole DSEL to do SIMD code generation.
How a DSEL can be different from a library still puzzles me, as the basic
definition of a DSEL is a DSL embedded into a host language as a library.
Implementing a DSEL to do code generation is a LOT more work than
simply coding a fast library in asm.  If you want to generate SIMD code
for lots of libraries then a DSEL might be worth it, but I'm talking about
specialized applications here (matrix multiply, etc.).
...
...
A library of fast matrix multiply, etc. would indeed be useful.
You mean useful like being told weeks in advance that it's useless
because uBlas already does it, as was said earlier? And if, as you said,
compilers already do what's needed, then I call this useless too,
because we'll just wait until all compilers do the same ...
I'm talking about specific routines tuned in a way that a general-purpose 
compiler would not be able to replicate.  It's a very small set of codes.
...
...
It strikes me that writing these routines using gcc intrinsics wouldn't
result in optimal code on all architectures.  Similarly, it seems that a
DSEL to do the same would have similar deficiencies.
Except that *maybe* the DSEL takes care of using the correct set of
intrinsics depending on the platform using, I don't know, architecture
detection at compile-time? And, IIRC, the gcc intrinsics are just C-like
functions over the SIMD assembly instructions ... so I don't see how it can't ...
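
For illustration only, the compile-time architecture detection mentioned
above could be sketched roughly as follows. This is not code from any
library discussed in this thread: the name add_floats is hypothetical, and
the sketch assumes a GCC-style compiler that defines __SSE__ when SSE code
generation is enabled. On any other target it falls back to a plain scalar
loop.

```cpp
#include <cstddef>

// Assumption: GCC-compatible compilers define __SSE__ when SSE is enabled.
#if defined(__SSE__)
  #include <xmmintrin.h>   // SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
inline void add_floats(const float* a, const float* b, float* out, std::size_t n)
{
    std::size_t i = 0;

#if defined(__SSE__)
    // SSE path: process four floats per iteration with unaligned loads/stores.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif

    // Scalar path: handles the tail after the SSE loop, or the whole
    // array when no SIMD extension was detected at compile time.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Note that this only selects which instructions are used. Register
allocation and instruction scheduling are still left entirely to the
compiler.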
Then your DSEL is actually a full-blown compiler code generator.  Generating 
"optimal" code is a lot more than just picking instructions.  You have to 
allocate registers, schedule, etc. and that changes not just based on ISA but 
on the implementation of that ISA provided by a particular processor.

Writing a DSEL containing all of this knowledge is much more work than just 
coding the library in asm if the set of libraries is small.

                                         -Dave