
Joel Falcou <joel.falcou@gmail.com> writes:
On 11/06/11 10:45, David A. Greene wrote:
Vectorizing compilers exist today. They've existed since the 1970s.
I still think you're mixing up SIMD ISAs and vector machines ...
They are the same, though existing SIMD architectures are less powerful.
As I see it, vector machines as in http://en.wikipedia.org/wiki/SIMD#Chronology have existed since 197x. BUT, what you purposely pretend not to understand is that we don't care about this type of machine and focus on this http://en.wikipedia.org/wiki/SIMD#Hardware which has existed since 1994 or so and for which the use cases, idioms and techniques are completely different.
Not completely. They are different in the sense that the SIMD hardware doesn't provide as many facilities as the old vector machines. But the principles are exactly the same. Vector codegen is harder on the SIMD machines and it's for this reason that I think boost.simd may not always generate the best code. It's really, really not always obvious what instructions should implement an arbitrary expression.
So yeah, autovectorization on huge CRAY systems is done automatically, with results I really don't know about and don't care about, as that is not our target audience, but I concede it may be good or whatever.
The very same compiler vectorizes very well for x86.
Now, strictly speaking of SIMD ISAs in the x86/PPC family, no, auto-vectorization is not that good and still requires manual input, function libraries and so forth. So in this case our claims hold, and boost.simd has to be seen as a set of enabling tools for helping people write generic code that can be vectorized.
There are very good autovectorizers for x86. I'm not very familiar with PPC so I can't comment on that.
Learn that template meta-programming as we use it doesn't prevent the compiler from doing stuff after our code generation process. Compilers usually do (inlining, loop unrolling, etc.) and we let them do as they wish, because they fill the gaps we cannot access with our library.
But the resulting vector code may run suboptimally. It will probably run fine 80% of the time, but for that other 20% it may be very bad indeed. This is where allowing the compiler to do vector codegen is critical.
Did you ever get to the last slides, where we have the Range-based function code in which no SIMD stuff leaks, yet which is able to consume a SIMDRange?
Yes, I think it's very neat!
I fail to see how this is *not* a correct way of designing code in C++: relying on a range abstraction and providing a Range with the proper properties. This kind of code is largely sufficient and generic enough to handle most classical use cases with a high level of performance. As far as manual code goes, dealing with microarchitecture never went that far, and I don't really get your obsession with that.
My concern with it comes from experience tuning a vectorizing compiler for many different microarchitectures. If the range abstraction is correct and the underlying libraries provide hints to the compiler (the latter is a big missing piece, I admit), the compiler ought to be able to generate the same or better vector code from the high-level algorithm call. If it doesn't, it's because the compiler is not up to the task or the generic library somehow hides information from the compiler.

I contend that it is often easier to add directives to the library than it is to use boost.simd to generate vector code, and the former will often perform better. If one doesn't have a good vectorizing compiler, the point is moot because the directives probably don't even exist. In that case boost.simd is the way to go.

My pithy quip to colleagues is that std::vector<> ought to vectorize. :) It likely doesn't today, at least not with the default allocator. But that is largely a problem of the library implementation not telling the compiler that there are no aliasing issues, and/or of the higher-level standard algorithms not including directives that tell the compiler to ignore possible dependencies.

Does that mean that boost.simd is a waste of time? Not at all! There will always be cases the compiler, for whatever reason, cannot get. In those cases, boost.simd is a far superior way to approach the problem than hand-written vector code.

-Dave
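
To make the aliasing point concrete, here is a minimal sketch of what "telling the compiler there is no aliasing" could look like at the library level. It assumes a compiler that accepts the non-standard `__restrict` qualifier (GCC, Clang and MSVC do), and the names `saxpy_raw` and `saxpy` are hypothetical, not part of boost.simd or the standard library. Whether a given compiler actually vectorizes the loop still depends on the compiler and its flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only.
// __restrict is a non-standard extension: it promises the compiler that
// the memory reached through x and y does not overlap, which removes one
// common obstacle to auto-vectorizing the loop.
inline void saxpy_raw(std::size_t n, float a,
                      const float* __restrict x, float* __restrict y)
{
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// Hypothetical wrapper over std::vector. Two distinct vector objects own
// disjoint storage, so the no-overlap promise holds as long as the caller
// passes two different vectors.
inline void saxpy(float a, const std::vector<float>& x, std::vector<float>& y)
{
    assert(x.size() == y.size());
    assert(&x != &y);  // same vector for both arguments would break the promise
    saxpy_raw(x.size(), a, x.data(), y.data());
}
```

The design point is that the annotation lives once, inside the library routine, so calling code stays free of any vector-specific detail.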