
On 10/06/11 18:05, David A. Greene wrote:
> For writing new code, I contend that a good compiler, with a few directives here and there, can accomplish the same result as this library and with less programmer effort.
I don't really think so, especially if you factor your "portable" argument into the equation.
> A simple example:
> void foo(float *a, float *b, int n) {
>     for (int i = 0; i < n; ++i) a[i] = b[i];
> }
> This is not obviously parallel, but with some simple help the user can get the compiler to vectorize it.
Seriously, are you kidding me? This is a friggin' for_all... You cannot get more embarrassingly parallel.
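For readers following along, here is a minimal sketch of the kind of "simple help" being argued about for the copy loop `foo` quoted above, assuming GCC or Clang. The `__restrict__` qualifier (a compiler extension, not standard C++) tells the compiler that `a` and `b` do not overlap, and `#pragma omp simd` (standardized later, in OpenMP 4.0) requests vectorization explicitly. The names `foo_restrict` and `foo_omp` are hypothetical; neither poster wrote them.

```cpp
// Hypothetical variant of foo(). __restrict__ is a GCC/Clang extension, not
// standard C++: it promises the compiler that a and b never overlap, which
// removes the aliasing doubt that otherwise blocks auto-vectorization.
void foo_restrict(float* __restrict__ a, const float* __restrict__ b, int n) {
  for (int i = 0; i < n; ++i) a[i] = b[i];
}

// Hypothetical variant using an explicit directive. The 'omp simd' pragma
// takes effect with -fopenmp or -fopenmp-simd and is otherwise ignored,
// leaving an ordinary scalar loop.
void foo_omp(float* a, const float* b, int n) {
#pragma omp simd
  for (int i = 0; i < n; ++i) a[i] = b[i];
}
```

Both variants shift responsibility to the caller: if `a` and `b` do overlap, the compiler is no longer obliged to produce the sequential result.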
> Another less simple case:
And this is an accumulate; it's about the most basic EP (embarrassingly parallel) example you can get.
> This is much less obviously parallel, but good compilers can make it so if the user allows slightly different answers, which they often do.
Yeah, and any brain-dead developer can write the proper boost::accumulate(simd::range(v), 0.) to get it right. So, who's the compiler's daddy here?
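The original accumulate loop did not survive in this copy of the thread, so the sketch below is an illustration under stated assumptions rather than Greene's code. It shows why vectorizing a floating-point sum can give "slightly different answers": a vectorizer effectively keeps one partial sum per SIMD lane, which reorders the additions. The functions `sum_scalar`, `sum_4lanes` and `sum_omp` are hypothetical, and the four-lane width assumes 128-bit SSE-style float vectors.

```cpp
// Plain sequential sum. C++ fixes the order of the additions, so without
// permission (e.g. GCC/Clang -ffast-math or -fassociative-math, or an
// OpenMP reduction clause) the compiler keeps this loop sequential.
float sum_scalar(const float* b, int n) {
  float s = 0.0f;
  for (int i = 0; i < n; ++i) s += b[i];
  return s;
}

// Roughly what a vectorizer produces once reassociation is allowed:
// four independent partial sums (one per lane), combined at the end.
float sum_4lanes(const float* b, int n) {
  float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
  int i = 0;
  for (; i + 4 <= n; i += 4)
    for (int k = 0; k < 4; ++k) lane[k] += b[i + k];
  float s = (lane[0] + lane[1]) + (lane[2] + lane[3]);
  for (; i < n; ++i) s += b[i];  // scalar tail for the leftover elements
  return s;
}

// Directive form: the reduction clause grants permission to reassociate.
// Ignored (plain sequential loop) unless built with -fopenmp or -fopenmp-simd.
float sum_omp(const float* b, int n) {
  float s = 0.0f;
#pragma omp simd reduction(+ : s)
  for (int i = 0; i < n; ++i) s += b[i];
  return s;
}
```

With the input `{1e8f, 1, 1, 1, -1e8f, 1, 1, 1}`, a typical x86-64 build without `-ffast-math` gives 3 from `sum_scalar` (each `+1` is absorbed by the 1e8 partial sum, where adjacent floats are 8 apart) but the exact 6 from `sum_4lanes`. That is the kind of difference a user has to accept before a compiler may vectorize the loop.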
> Can you explain why not? Assembly code in and of itself is not bad, but it raises some maintainability questions. How many different implementations of a particular ISA will the library support?
Because they are C functions (intrinsics), maybe :E Currently we support the whole SSEx family, all the AMD-specific extensions, and AltiVec for PPC and Cell, and we have a protocol to extend that.
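To make the "C functions, not assembly" point concrete, here is a minimal sketch of a SIMD reduction written with SSE intrinsics, which compilers expose as ordinary C-callable functions declared in `<xmmintrin.h>`. The name `simd_sum`, the `__SSE__` guard (a macro GCC and Clang define when targeting SSE; MSVC does not, so it takes the scalar path there) and the scalar fallback are assumptions for illustration. This is not Boost.SIMD's actual implementation, and its per-ISA extension protocol is not shown in the thread.

```cpp
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps, _mm_loadu_ps, ...
#endif

// Hypothetical helper: sums n floats, four at a time when SSE is available.
float simd_sum(const float* b, std::size_t n) {
  std::size_t i = 0;
  float s = 0.0f;
#if defined(__SSE__)
  __m128 acc = _mm_setzero_ps();  // four running partial sums
  for (; i + 4 <= n; i += 4)
    acc = _mm_add_ps(acc, _mm_loadu_ps(b + i));  // unaligned load of 4 floats
  float lane[4];
  _mm_storeu_ps(lane, acc);  // spill the lanes and combine them
  s = (lane[0] + lane[1]) + (lane[2] + lane[3]);
#endif
  // Scalar tail, and the whole computation when no SSE target is available.
  for (; i < n; ++i) s += b[i];
  return s;
}
```

Keeping the scalar loop as both the tail handler and the fallback means the same function still compiles on targets without SSE; a library would typically plug in other back ends (AltiVec, for instance) at the `#if` point.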
> Four floats are available. That does not mean one always wants to use all of them. Heck, it's often the case that one wants to use none of them.
Not using all the elements of a SIMD vector is Doing It Wrong.
> I'm demonstrating what I mean by "performance portable." Substitute "GPU" with any CPU sufficiently different from the baseline.
I wish you would read up on what "library scope" and "rationale" mean. Are you complaining that Boost.MPI doesn't cover GPUs too?
> Intel and PGI.
OK, so what do people on machines supported by neither Intel nor PGI do? Cry blood? Someone is trolling someone hard here, I suppose...