
On Wednesday 21 January 2009 01:30, Joel Falcou wrote:
David A. Greene wrote:
A library of fast routines for doing various things is quite different from creating a whole DSEL to do SIMD code generation.
How a DSEL can be different from a library still puzzles me, as the basic definition of a DSEL is a DSL embedded into a host language as a library.
Implementing a DSEL to do code generation is a LOT more work than simply coding a fast library in asm. If you want to generate SIMD code for lots of libraries then a DSEL might be worth it, but I'm talking about specialized applications here (matrix multiply, etc.).
A library of fast matrix multiply, etc. would indeed be useful.
You mean useful like being told weeks in advance that it's useless because uBlas already does it, as was said earlier? And if, as you said, compilers already do what's needed, then I call this useless too, because we'll just wait until all compilers do the same ...
I'm talking about specific routines tuned in a way that a general-purpose compiler would not be able to replicate. It's a very small set of codes.
It strikes me that writing these routines using gcc intrinsics wouldn't result in optimal code on all architectures. Similarly, it seems that a DSEL to do the same would have similar deficiencies.
Except that *maybe* the DSEL takes care of using the correct set of intrinsics depending on the platform using, I don't know, architecture detection at compile time? And, IIRC, the gcc intrinsics are just C-like function wrappers over the SIMD assembly instructions ... so I don't see how it can't ...
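To make the "architecture detection at compile time" point concrete, here is a minimal sketch of the idea, not anything from an actual DSEL. It assumes an x86 target for the SIMD path and uses the compiler's predefined macros to decide whether SSE intrinsics are available, falling back to scalar code otherwise. The names `SKETCH_HAS_SSE` and `add_arrays` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Compile-time architecture detection: the compiler predefines these macros
// when SSE is enabled for the target (GCC/Clang: __SSE__, MSVC: _M_X64 or
// _M_IX86_FP >= 1). SKETCH_HAS_SSE is a hypothetical name for this sketch.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
  #include <xmmintrin.h>
  #define SKETCH_HAS_SSE 1
#else
  #define SKETCH_HAS_SSE 0
#endif

// Hypothetical helper: element-wise out[i] = a[i] + b[i] for n floats.
// Uses SSE intrinsics when detected at compile time, scalar code otherwise.
inline void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(a != nullptr && b != nullptr && out != nullptr);
    std::size_t i = 0;
#if SKETCH_HAS_SSE
    // Process four floats per iteration with unaligned loads and stores.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar loop: handles the tail, or everything when SSE is unavailable.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Note that this only selects which intrinsics get used; register allocation and instruction scheduling are still left entirely to the compiler.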
Then your DSEL is actually a full-blown compiler code generator. Generating "optimal" code is a lot more than just picking instructions. You have to allocate registers, schedule, etc. and that changes not just based on ISA but on the implementation of that ISA provided by a particular processor. Writing a DSEL containing all of this knowledge is much more work than just coding the library in asm if the set of libraries is small.

-Dave