
Mathias Gaunard <mathias.gaunard@ens-lyon.org> writes:
On 11/06/2011 02:08, David A. Greene wrote:
What's the difference between:
ADDPD XMM0, XMM1
and
XMM0 = __builtin_ia32_addpd (XMM0, XMM1)
I would contend nothing, from a programming-effort perspective.
Register allocation.
But that's not where the difficult work is.
Currently we support the whole SSEx family, all the AMD-specific extensions, and Altivec for PPC and Cell, and we have a protocol for extending that.
How many different implementations of DGEMM do you have for x86? I have seen libraries with 10-20.
That's because they don't have generic programming, which would allow them to generate all variants with a single generic core and some meta-programming.
No. No, no, no. These implementations are vastly different. It's not simply a matter of changing the vector length.
We work with the LAPACK people, and some of them have realized that the things we do with metaprogramming could be very interesting to them, but we haven't had any research opportunity to start a project on this yet.
I'm not saying boost.simd is never useful. I'm saying the claims made about it seem overblown.
- Write it using the operator overloads provided by boost.simd. Note that the programmer will have to take into account various combinations of matrix size and alignment, target microarchitecture, and ISA, and will probably have to code many different versions.
Shouldn't you just need the cache line size? This is something we provide as well.
Nope. It's a LOT more complicated than that.
Ideally you shouldn't need anything else that cannot be made architecture-agnostic.
What's the right vector length? That alone depends heavily on the microarchitecture. And as I noted above, this is one of the simpler questions.
And as I said, you should make the properties on size (and even alignment if you really care) a template parameter, so as to be able to dispatch it to relevant bits at compile-time...
Yes, I can see how that would be useful. It will cover a lot of cases. But not everything. And that's OK, as long as the library documentation spells that out.
C++ metaprogramming *is* an autotuning framework.
To a degree. How do you do different loop restructurings using the library?
Your rationale, as I understand it, is to make exploiting data parallelism simpler.
No it isn't. Its goal is to provide a SIMD abstraction layer. It's an infrastructure library to build other libraries. It is still fairly low-level.
Ok, that makes more sense.
Intel and PGI.
OK, and what do people on machines supported by neither Intel nor PGI do? Cry blood?
If boost.simd is targeted at users who have subpar compilers...
Other compilers than intel or PGI are subpar compilers? Maybe if you live in a very secluded world.
No, not every compiler is subpar. But many are.
But please don't go around telling people that compilers can't vectorize and parallelize. That's simply not true.
Run the trivial accumulate test?
Vectorized.
The littlest of things can prevent them from vectorizing. Sure, if you add a few restricts here, a few pragmas elsewhere, and some specific floating-point compilation options, you might be able to get the system to kick in.
Yep. And that's a LOT easier than hand-restructuring loops and writing vector code manually.
But my personal belief is that automatic parallelization of arbitrary code is an approach doomed to failure.
Then HPC has been failing for 30 years.
Programming is about making things explicit using the right language for the task.
Programming is about programmer productivity.
Boost.simd could be useful to vendors providing vectorized versions of their libraries.
Not all fast libraries need to be provided by hardware vendors.
No, not all. In most other cases, though, the compiler should do it.
I have seen too many cases where programmers wrote an "obviously better" vector implementation of a loop, only to have someone else rewrite it in scalar so the compiler could properly vectorize it.
Maybe if the compiler was really that good, it could still do the optimization when vectors are involved?
No, because information has been lost at that point. -Dave