
Mathias Gaunard <mathias.gaunard@ens-lyon.org> writes:
On 11/06/2011 02:08, David A. Greene wrote:
What's the difference between:
ADDPD XMM0, XMM1
and
XMM0 = __builtin_ia32_addpd (XMM0, XMM1)
I would contend nothing, from a programming-effort perspective.
Register allocation.
But that's not where the difficult work is.
Currently we support the whole SSEx family, all the AMD-specific extensions, and Altivec for PPC and Cell, and we have a protocol for extending that.
How many different implementations of DGEMM do you have for x86? I have seen libraries with 10-20.
That's because they don't have generic programming, which would allow them to generate all variants with a single generic core and some meta-programming.
No. No, no, no. These implementations are vastly different. It's not simply a matter of changing the vector length.
We work with the LAPACK people, and some of them have realized that the things we do with metaprogramming could be very interesting to them, but we haven't had any research opportunity to start a project on this yet.
I'm not saying boost.simd is never useful. I'm saying the claims made about it seem overblown.
- Write it using the operator overloads provided by boost.simd. Note that the programmer will have to take into account various combinations of matrix size and alignment, target microarchitecture, and ISA, and will probably have to code many different versions.
Shouldn't you just need the cache line size? This is something we provide as well.
Nope. It's a LOT more complicated than that.
Ideally you shouldn't need anything else that cannot be made architecture-agnostic.
What's the right vector length? That alone depends heavily on the microarchitecture. And as I noted above, this is one of the simpler questions.
And as I said, you should make the properties on size (and even alignment if you really care) a template parameter, so as to be able to dispatch it to relevant bits at compile-time...
Yes, I can see how that would be useful. It will cover a lot of cases. But not everything. And that's OK, as long as the library documentation spells that out.
C++ metaprogramming *is* an autotuning framework.
To a degree. How do you do different loop restructurings using the library?
Your rationale, as I understand it, is to make exploiting data parallelism simpler.
No it isn't. Its goal is to provide a SIMD abstraction layer. It's an infrastructure library to build other libraries. It is still fairly low-level.
Ok, that makes more sense.
Intel and PGI.
OK, and what do people on machines supported by neither Intel nor PGI do? Cry blood?
If boost.simd is targeted at users who have subpar compilers...
Other compilers than intel or PGI are subpar compilers? Maybe if you live in a very secluded world.
No, not every compiler is subpar. But many are.
But please don't go around telling people that compilers can't vectorize and parallelize. That's simply not true.
Run the trivial accumulate test?
Vectorized.
The littlest of things can prevent them from vectorizing. Sure, if you add a few restricts here, a few pragmas elsewhere, and some specific floating-point compilation options, you might be able to get the system to kick in.
Yep. And that's a LOT easier than hand-restructuring loops and writing vector code manually.
But my personal belief is that automatic parallelization of arbitrary code is an approach doomed to failure.
Then HPC has been failing for 30 years.
Programming is about making things explicit using the right language for the task.
Programming is about programmer productivity.
Boost.simd could be useful to vendors providing vectorized versions of their libraries.
Not all fast libraries need to be provided by hardware vendors.
No, not all. In most other cases, though, the compiler should do it.
I have seen too many cases where programmers wrote an "obviously better" vector implementation of a loop, only to have someone else rewrite it in scalar so the compiler could properly vectorize it.
Maybe if the compiler was really that good, it could still do the optimization when vectors are involved?
No, because information has been lost at that point. -Dave