
On Tuesday 20 January 2009 18:51, David Abrahams wrote:
>>> Why are these SIMD operations different in that respect from, say, large matrix multiplications?
>> A matrix multiplication is a higher-level construct. Still, most compilers will pattern-match matrix multiplication to an optimal routine.
> Not hardly. No compiler is going to introduce register and cache-level blocking.
I'm not sure what you mean here. Compilers do blocking all the time. Typically, the compiler will match a matrix multiply (and a number of other patterns) to library code that has been pre-tuned. That library code usually has a number of possible paths based on the size of the matrices, etc., and some of those paths may be blocked at several different levels. Or not, if that gives better performance. There's a rich body of research on how to auto-tune library code for just such purposes.
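For a rough idea of what one such pre-tuned path bakes in, here is a minimal sketch of a single level of cache blocking; the tile size and the use of only one blocking level are illustrative assumptions, not tuned choices:

    #include <cstddef>

    // One level of cache blocking (loop tiling) for C = A * B, with
    // square N x N row-major matrices and C zero-initialized by the
    // caller. BLOCK is an illustrative tile size, not a tuned value;
    // a real library picks it per target and may block at several levels.
    const std::size_t BLOCK = 64;

    void blocked_matmul(const double* A, const double* B, double* C,
                        std::size_t N)
    {
        for (std::size_t ii = 0; ii < N; ii += BLOCK)
        for (std::size_t kk = 0; kk < N; kk += BLOCK)
        for (std::size_t jj = 0; jj < N; jj += BLOCK)
            // Work on one tile; the tiles of A and B stay cache-resident
            // across the inner loops.
            for (std::size_t i = ii; i < ii + BLOCK && i < N; ++i)
                for (std::size_t k = kk; k < kk + BLOCK && k < N; ++k) {
                    const double a = A[i * N + k];  // held in a register
                    for (std::size_t j = jj; j < jj + BLOCK && j < N; ++j)
                        C[i * N + j] += a * B[k * N + j];
                }
    }

A tuned library would choose BLOCK per target, add register-level tiling inside the inner loops, and keep an unblocked path for small matrices; the compiler's job is just to recognize the pattern and dispatch to it.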
>> SIMD code generation is extremely low-level. Programmers want to think at a higher level.
> Naturally. But are the algorithms implemented by SIMD instructions lower-level than std::for_each or std::accumulate? If not, maybe they deserve to be in a library.
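Here is the same reduction spelled at both levels, as a rough sketch assuming x86 with SSE2; the function names are illustrative:

    #include <cstddef>
    #include <numeric>
    #include <vector>
    #include <emmintrin.h>  // SSE2 intrinsics

    // High level: the algorithm as a standard library call.
    double sum_stl(const std::vector<double>& v)
    {
        return std::accumulate(v.begin(), v.end(), 0.0);
    }

    // Low level: the same reduction spelled in SSE2 intrinsics.
    // x86-only; AltiVec, NEON, etc. would each need their own version.
    // Note the reassociation can change rounding in the last bits
    // compared with the strictly left-to-right std::accumulate.
    double sum_sse2(const double* p, std::size_t n)
    {
        __m128d acc = _mm_setzero_pd();
        std::size_t i = 0;
        for (; i + 2 <= n; i += 2)                  // two doubles per step
            acc = _mm_add_pd(acc, _mm_loadu_pd(p + i));
        double lanes[2];
        _mm_storeu_pd(lanes, acc);
        double sum = lanes[0] + lanes[1];
        for (; i < n; ++i)                          // scalar remainder
            sum += p[i];
        return sum;
    }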
A library of fast routines for doing various things is quite different from creating a whole DSEL to do SIMD code generation. A library of fast matrix multiply, etc. would indeed be useful. How much does Boost want to concern itself with providing libraries tuned with asm routines for various architectures? It strikes me that writing these routines using gcc intrinsics wouldn't result in optimal code on all architectures. Similarly, it seems that a DSEL to do the same would have similar deficiencies. When you're talking "optimal," you're setting a pretty dang high bar.
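For instance, gcc's generic vector extension compiles everywhere, but what it lowers to is entirely target-dependent; an illustrative sketch (the typedef and function name are made up):

    // gcc's generic vector extension: the source is portable, the code
    // it lowers to is not. On x86 with SSE this add becomes a single
    // addps; on a target without 128-bit float vectors gcc falls back
    // to four scalar adds.
    typedef float v4sf __attribute__((vector_size(16)));

    v4sf add4(v4sf a, v4sf b)
    {
        return a + b;  // one instruction on SSE, scalarized elsewhere
    }

-Dave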