
On 10/06/11 15:16, David A. Greene wrote:
I don't think this presentation makes the case for this library. That said, I am very glad you and others are thinking about these problems.
Sorry, then.
Almost everything the compiler needs to vectorize well that it does not get from most language syntax can be summed up by two concepts: aliasing and alignment.
No. How can a compiler vectorize a function that lives in another binary .o? Who is going to vectorize cos and its ilk?
I don't see how pack<> addresses the aliasing problem in any way that is not similar to simply grabbing local copies of global data or parameters. Various C++ "restrict" extensions already address the latter. We desperately need something much better than "restrict" in standard C++. Manycore is the future and parallel processing is the new normal.
If you read the slides, you would have seen that pack is the messenger of the whole SIMD range system, which fits right into a *higher level of abstraction* rather than piggybacking on the compiler.
pack<> does address alignment, but it's overkill. It's also pessimistic. One does not always need aligned data to vectorize, so the conditions placed on pack<> are too restrictive. Furthermore, the alignment information pack<> does convey will likely get lost in the depths of the compiler, leading to suboptimal code generation unless that alignment information is available elsewhere (and it often is).
Well, my benchmarks disagree with this. See this old post of mine from one year ago on the same subject. If getting 95% of peak performance is pessimistic, then sorry.
I think a far more useful design of this library would be providing standard ways to assert certain conditions. For example:
No. Ranges that accept SIMD operations are a perfect high-level feature. We are writing a library, not an extension for compilers.
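To make the "range that accepts SIMD operations" idea concrete, here is a minimal sketch (my own illustration, not the actual Boost.SIMD API) of the shape such a range algorithm takes: the bulk of the data is processed one pack-width at a time, with a scalar epilogue for the tail that does not fill a full pack.

```cpp
#include <cstddef>

// Hypothetical sketch: a pack-width-aware transform over a contiguous
// range. W is the pack cardinal. A real implementation would load a
// SIMD register per iteration; here the inner loop just marks where
// that vector step would go.
template <std::size_t W, typename T, typename F>
void simd_transform(const T* in, T* out, std::size_t n, F f)
{
    std::size_t i = 0;
    // Main loop: one "pack" of W elements per iteration.
    for (; i + W <= n; i += W)
        for (std::size_t j = 0; j < W; ++j)
            out[i + j] = f(in[i + j]);
    // Scalar epilogue for the remaining n % W elements.
    for (; i < n; ++i)
        out[i] = f(in[i]);
}
```

The point of the design is that user code only ever sees the range-level algorithm; the pack width and the tail handling stay inside the library.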
What's under the operators on pack<>? Is it assembly code?
No, as naked assembly prevents proper inlining and other register-based compiler optimizations. We use whatever intrinsics are available for the current compiler/architecture at hand.
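As a rough illustration of that layering (a sketch of mine, not the real pack<> implementation), an operator on a small vector type can forward to an SSE intrinsic when one is available and fall back to a scalar loop otherwise, leaving the compiler free to inline and allocate registers:

```cpp
#if defined(__SSE__) || defined(_M_X64)
# include <xmmintrin.h>
# define PACK4F_HAVE_SSE 1
#endif

// Hypothetical 4-float pack; pack4f and its operator+ are illustrative
// names, not Boost.SIMD's.
struct pack4f
{
    float v[4];
};

inline pack4f operator+(const pack4f& a, const pack4f& b)
{
    pack4f r;
#ifdef PACK4F_HAVE_SSE
    // Intrinsic path: the compiler still sees ordinary C++, so this
    // inlines and participates in register allocation.
    _mm_storeu_ps(r.v, _mm_add_ps(_mm_loadu_ps(a.v), _mm_loadu_ps(b.v)));
#else
    // Portable fallback when no SIMD intrinsics are available.
    for (int i = 0; i < 4; ++i)
        r.v[i] = a.v[i] + b.v[i];
#endif
    return r;
}
```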
I wonder how pack<T> can know the best vector length. That is highly, highly code- and implementation-dependent.
No. On SSEx machines, SIMD vectors are 128 bits (16 bytes) wide, which means pack<T, 16/sizeof(T)> is optimal, so a simple meta-function finds it.
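Such a meta-function is a one-liner; this sketch assumes the 128-bit SSE case (the name optimal_cardinal is mine, not the library's):

```cpp
#include <cstddef>

// Optimal pack cardinal for a 128-bit (16-byte) SIMD register:
// register width in bytes divided by the element size.
template <typename T, std::size_t RegisterBytes = 16>
struct optimal_cardinal
{
    static const std::size_t value = RegisterBytes / sizeof(T);
};
```

So optimal_cardinal<float>::value is 4 and optimal_cardinal<double>::value is 2; a wider register is handled by changing the one default parameter.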
How does simd::where define pack<> elements of the result where the condition is false? Often the best solution is to leave them undefined but your example seems to require maintaining current values.
This makes no sense. False is [0 ... 0], true is [~0 ... ~0]. Period. SIMD is all about being branchless, so everything is computed over the whole vector. It seems to me you didn't get that pack is NOT a data container but a layer above SIMD registers, which then gets hidden under the concept of a ContiguousRange.
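A scalar sketch of the branchless selection described above (my illustration of the technique, not the simd::where implementation): both branches are computed for every lane, then the all-zeros/all-ones mask blends them bitwise as (mask & t) | (~mask & f).

```cpp
#include <cstdint>
#include <cstring>
#include <cstddef>

// Branchless per-lane select over 4 lanes. mask[i] must be 0x00000000
// (false) or 0xFFFFFFFF (true), as in the SIMD comparison convention.
inline void select4(const std::uint32_t mask[4],
                    const float t[4], const float f[4], float out[4])
{
    for (std::size_t i = 0; i < 4; ++i)
    {
        std::uint32_t ti, fi;
        std::memcpy(&ti, &t[i], sizeof(float)); // reinterpret lane bits
        std::memcpy(&fi, &f[i], sizeof(float));
        std::uint32_t ri = (mask[i] & ti) | (~mask[i] & fi);
        std::memcpy(&out[i], &ri, sizeof(float));
    }
}
```

On real hardware the same blend is a couple of bitwise vector instructions, with no branch anywhere, which is why the "false" lanes carry computed values rather than being left undefined.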
How portable is Boost.simd? By portable I mean, how easy is it to move the code from one machine to another get the same level of performance?
It works on gcc and msvc, on SSE and AltiVec, and we have started looking at ARM NEON. Most of these deliver the same level of performance.
I don't mean to be too discouraging. But a library to do this kind of stuff seems archaic to me. It was archaic when Intel introduced MMX. If possible, I would like to see this evolve into a library to convey information to the compiler.
I'll keep my archaic stuff that gives me a 4x-8x speed-up rather than wait for the compiler-based solution nobody has been able to give me since 1999. We already had this discussion two years ago, so I am not keen to go over it all again, as it clearly seems you are just retelling the same FUD as last time.