[boost] Going forward with Boost.SIMD

18 Apr 2013

      ...
Unfortunately the concurrency/parallelism group has decided that they do
not want C++ to provide types representing SIMD registers.
I'm afraid I don't quite understand the rationale for such a refusal;
proposing more high-level constructs similar to valarray (or to our own
library NT2) was suggested instead, but that's obviously a more complex and
limited API, not a basic building block for portably programming a specific
processor unit.
However heartbreaking a refusal can be, trust me when I say that
indecision or indifference is worse. The fact that they gave you a definite
refusal is actually one of the least bad outcomes of an ISO standards
proposal, because you can now move on without uncertainty.
...
Development of Boost.SIMD will still proceed, aiming for integration in
Boost, but standardization appears to be definitely out of the question.
Any feedback on the API presented in the proposal is welcome.
<http://open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3571.pdf>
If I were still involved in ISO standardization, I'd observe the following:

1. GPU and CPU stream computation technologies are still merging. In other
words, it's too soon to standardize this technology lest we accidentally
get in the way of some novel form of convergence. Happy to reconsider
post-C++14.

2. It's hard to standardize current CPU SIMD implementations due to
extremely irritating inconsistencies between vendors. For example, a generic
straight port of SSE2 to NEON will have awful performance: SSE2 code does a
lot of flipping between SIMD and non-SIMD, and on ARM, where NEON is a
coprocessor, every such round trip between NEON and the core registers is
slow. Another bugbear of mine on NEON is the lack of an equivalent to
_mm_movemask_epi8(), which can be emulated in about eight NEON instructions,
but in so doing you'll make code which was very speedy on SSE2 pretty slow
in most cases on NEON.
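To make the movemask point concrete, here is a minimal C++17 sketch (not from the proposal and not Boost.SIMD API) of a scalar reference model of what _mm_movemask_epi8() computes, used the way SSE2 code typically uses it: compare 16 bytes, collapse the per-byte result into a bitmask, then drop back to scalar code to find the first match. The names Bytes16, movemask_epi8_reference and find_first_byte are hypothetical helpers for illustration only; real SSE2 code would call the intrinsics _mm_cmpeq_epi8 and _mm_movemask_epi8 directly.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper type: one 128-bit register viewed as 16 bytes.
using Bytes16 = std::array<std::uint8_t, 16>;

// Scalar reference model of SSE2's _mm_movemask_epi8:
// bit i of the result is the most significant bit of byte i.
constexpr int movemask_epi8_reference(const Bytes16& v) {
    int mask = 0;
    for (int i = 0; i < 16; ++i) {
        mask |= ((v[i] >> 7) & 1) << i;
    }
    return mask;
}

// Hypothetical helper showing the SIMD-to-scalar "flip":
// 1. per-byte compare yielding 0xFF or 0x00 (what _mm_cmpeq_epi8 produces),
// 2. collapse to a 16-bit mask (what _mm_movemask_epi8 produces),
// 3. scalar bit scan to locate the first match. Returns -1 if none.
constexpr int find_first_byte(const Bytes16& haystack, std::uint8_t needle) {
    Bytes16 eq{};
    for (int i = 0; i < 16; ++i) {
        eq[i] = static_cast<std::uint8_t>(haystack[i] == needle ? 0xFF : 0x00);
    }
    const int mask = movemask_epi8_reference(eq);
    if (mask == 0) {
        return -1;
    }
    int index = 0;
    while (((mask >> index) & 1) == 0) {  // count trailing zeros
        ++index;
    }
    return index;
}
```

On SSE2, steps 1 and 2 are a single instruction each (pcmpeqb and pmovmskb); on NEON, step 2 has no single-instruction equivalent and needs the multi-instruction emulation mentioned above before the result can even be handed to scalar code.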

My point here is you cannot standardize such non-uniform behavior in a
universally performant way, because you'll just get lowest common
denominator performance across all SIMD implementations, which rather
defeats the purpose. Even if all vendors were like NEON, or all like SSE2,
we would still have to see how CUDA and OpenCL pan out in the long run.

3. I am unsure if C++ is the appropriate language for SIMD standardization
when perhaps a meta-form of JIT-compiled C++ would be much superior (i.e.
you supply LLVM bytecode, and it gets delivered to a GPU/CPU/whatever).
Such options will soon be on the table with LLVM-type compilers. In other
words, I would vote to wait and see what the market throws up.

I appreciate that none of these three reasons are what you want to hear.
Still, I hope my observations are useful to you. None of them suggests you
shouldn't proceed with Boost.SIMD. Boost has a much wider remit than just
being a testing ground for future C++ standard library features. But I suspect
that if SIMD ever does get standardized, it won't look like your library or
proposal because it will be based on technologies which don't exist yet.

Hope that helps,

Niall

Niall Douglas