
On 10/06/11 17:09, David A. Greene wrote:
It's not a high level of abstraction. It's a very low level one. Users are barely willing to restructure loops to enable vectorization. Many will be unwilling to rewrite them completely. On the other hand, the data show that they are quite willing to add directives here and there.
If ranges are not higher level than for loops, I think we can stop discussing right here.
On what code? It's quite easy to achieve that on something like a DGEMM. DGEMM is also an embarrassingly vectorizable code.
Give me one example of non-EP code that both needs to be and can be vectorized.
That's effectively assembly code.
No.
No. On SSEx machines, a vector of 32-bit floats can have 1, 2, 3 or 4 elements.
No, an SSE2 __m128 contains 4 floats. Period.
Consider AVX. This is _not_ an easy problem to solve. It is not always the right answer to vectorize using the fully available vector length.
AVX has 256-bit registers, each of which fits 8 floats. Again, what did I miss?
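To make the arithmetic explicit, here is a minimal sketch of the lane count computation. The name lane_count is hypothetical and made up for this mail, not something from the library, and the static_asserts assume float is 32 bits wide, which holds on the x86 targets discussed here.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper, for illustration only (not the library's API):
// number of elements of type T that fit in a register of RegisterBits bits.
template <typename T, std::size_t RegisterBits>
struct lane_count
{
    static_assert(RegisterBits % (sizeof(T) * CHAR_BIT) == 0,
                  "register width must be a multiple of the element width");
    static constexpr std::size_t value = RegisterBits / (sizeof(T) * CHAR_BIT);
};

// Assumption: float is 32 bits, as on SSE/AVX machines.
static_assert(lane_count<float, 128>::value == 4, "128-bit SSE register: 4 floats");
static_assert(lane_count<float, 256>::value == 8, "256-bit AVX register: 8 floats");
```

The register width fixes the lane count for a given element type; whether using the full width is the best choice for a given loop is a separate question.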
I know what a pack<> is. Perhaps I wasn't clear. If I have an operation (say, negation) under where() in which the even condition elements are true and the odd condition elements are false, what is the produced result for the odd elements of the result vector?
where is ?:. It requires three arguments. I am tempted to say RTFM. a = c ? b; is not valid code, so neither is where(c,a);. The more this goes on, the more it looks like you didn't read the slides... really.
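As a rough sketch of what "where is ?:" means lane by lane: the types pack4 and mask4 and the function negate below are hypothetical stand-ins written for this mail, not the library's pack<> or its actual implementation. The point is only that the third argument supplies the value for every lane where the condition is false.

```cpp
#include <cstddef>

// Hypothetical 4-lane stand-ins, for illustration only (not the library's pack<>).
struct pack4 { float lane[4]; };
struct mask4 { bool  lane[4]; };

// Lane-wise ternary: result[i] = c[i] ? a[i] : b[i].
constexpr pack4 where(mask4 const& c, pack4 const& a, pack4 const& b)
{
    pack4 r{};
    for (std::size_t i = 0; i < 4; ++i)
        r.lane[i] = c.lane[i] ? a.lane[i] : b.lane[i];
    return r;
}

// Lane-wise negation.
constexpr pack4 negate(pack4 const& a)
{
    pack4 r{};
    for (std::size_t i = 0; i < 4; ++i)
        r.lane[i] = -a.lane[i];
    return r;
}
```

With an even-lanes mask, where(even, negate(x), x) negates lanes 0 and 2 and leaves lanes 1 and 3 equal to x, because x was passed explicitly as the third argument. There is no two-argument form, so there is no unspecified result for the odd lanes.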
What happens if you move the code from Nehalem to Barcelona? How about from an NVIDIA GPU to Nehalem?
Where did I say this stuff targeted GPUs? This is a friggin strawman. We address in-CPU vectorization; this is the scope of the library. Period, again. We don't claim to solve arbitrary data parallelism problems and we never did. You are again recycling the same non-argument as in your last intervention on this very topic last year.
Compilers have been doing this since the '70's. gcc is not an adequate compiler in this respect, but it is slowly getting there.
MSVC does not, neither does xlC... neither does clang... so which compilers take random crap C code and vectorize it automagically?
It's not FUD. It's my experience.
It really is FUD and strawmen.