Re: [boost] Accelerating algorithms with SIMD - Segmented iterators and alternatives

11 Oct 2010

      On 11/10/2010 17:54, DE wrote:
...
> hi there
> if you focus on simd-aware - and more specifically x86 simd -
> implementation _DON'T_ read further
> consider OpenCL as a general way to speed up computations (which uses
> simd as one of backends as well as gpu shader units etc.)
> i think it will be much more general and useful

OpenCL is a possible implementation, but we choose to call the
various SIMD instructions ourselves to have more control over the
toolchain and the end result.
OpenCL will, however, be our main backend for the GPU targets that we
will be supporting in the future. Or maybe we will just target it
through Clang and LLVM.

NT2 (also known as the crazy frenchman library), upon which this effort
is based, only supports x86 (SSE, ..., SSE4, AVX) and PowerPC (AltiVec).
Support for ARM (NEON, VFP) is being added.
It is designed so that instructions can register themselves for a
particular (type, cardinal) pair; the candidates are then ranked by
category to select the best one. It relies heavily on meta-programming,
including rewritten bits of MPL to improve its compile-time
performance.
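
To illustrate the registration idea, here is a minimal sketch. It is not NT2's actual code: the tag names, `pack`, `add_backend` and `select_add` are all hypothetical, and plain overload resolution on an inheritance chain stands in for whatever ranking machinery the library really uses. Each candidate is declared for one (type, cardinal) pair and one category, and the most specific viable candidate wins.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical category tags. Inheritance encodes the ranking:
// a more derived tag is a more specific (preferred) category.
struct scalar_tag {};
struct sse_tag : scalar_tag {};
struct avx_tag : sse_tag {};

// Hypothetical fixed-size pack of N values of type T (N is the cardinal).
template <class T, std::size_t N>
struct pack { T lane[N]; };

// Lowest-ranked candidate: matches every (type, cardinal) pair.
template <class T, std::size_t N>
std::string add_backend(pack<T, N> const&, scalar_tag) { return "scalar loop"; }

// Candidate registered for (float, 4) in the SSE category.
inline std::string add_backend(pack<float, 4> const&, sse_tag) { return "sse (float, 4)"; }

// Candidate registered for (float, 8) in the AVX category.
inline std::string add_backend(pack<float, 8> const&, avx_tag) { return "avx (float, 8)"; }

// Best is the highest category the target supports. Overload resolution
// picks the closest base of Best for which a matching candidate exists.
template <class Best, class T, std::size_t N>
std::string select_add(pack<T, N> const& p) { return add_backend(p, Best()); }

// Usage sketch: the strings only name the candidate that was selected.
inline void dispatch_demo() {
    assert(select_add<avx_tag>(pack<float, 8>()) == "avx (float, 8)");
    assert(select_add<avx_tag>(pack<float, 4>()) == "sse (float, 4)");
    assert(select_add<avx_tag>(pack<double, 2>()) == "scalar loop");
}
```

A real implementation would return the result of the corresponding intrinsic rather than a string; the strings are only there to make the selection visible.
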
It also uses expression templates with Proto to detect certain patterns, 
such as fused multiply-add, for which x86 is introducing new 
instructions soon.
Therefore it is very tunable and extensible.
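
The pattern detection can be sketched without Proto. The following is a hand-rolled, deliberately tiny expression template, not the library's code; `value`, `mul_expr`, `add_expr`, `result` and `evaluate` are hypothetical names, and `std::fma` from `<cmath>` (C++11) stands in for a hardware fused multiply-add. The expression `a * b + c` builds a type that encodes its own shape, so an overload can recognise the multiply-add pattern at compile time.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical terminal: a single scalar stands in for a SIMD register.
struct value { double v; };

// Expression nodes: the type records the shape of the expression.
template <class L, class R> struct mul_expr { L lhs; R rhs; };
template <class L, class R> struct add_expr { L lhs; R rhs; };

inline mul_expr<value, value> operator*(value a, value b) {
    mul_expr<value, value> e = {a, b};
    return e;
}

inline add_expr<value, value> operator+(value a, value b) {
    add_expr<value, value> e = {a, b};
    return e;
}

// a * b + c: the left operand is still an unevaluated product.
inline add_expr<mul_expr<value, value>, value>
operator+(mul_expr<value, value> m, value c) {
    add_expr<mul_expr<value, value>, value> e = {m, c};
    return e;
}

// The flag records which evaluation path was taken.
struct result { double v; bool fused; };

// Plain addition.
inline result evaluate(add_expr<value, value> const& e) {
    result r = { e.lhs.v + e.rhs.v, false };
    return r;
}

// Pattern (a * b) + c: evaluated as one fused multiply-add.
inline result evaluate(add_expr<mul_expr<value, value>, value> const& e) {
    result r = { std::fma(e.lhs.lhs.v, e.lhs.rhs.v, e.rhs.v), true };
    return r;
}

// Usage sketch.
inline void fma_demo() {
    value a = {2.0}, b = {3.0}, c = {4.0};
    assert(evaluate(a * b + c).fused);   // pattern recognised
    assert(!evaluate(a + c).fused);      // ordinary addition
}
```

With Proto the matching would be expressed as a grammar and a transform rather than as hand-written overloads, but the principle of dispatching on the type of the expression tree is the same.
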

The bit I wanted to discuss here, however, is not the implementation, 
but rather the interface that the library provides to the programmer.

We aim to provide an interface in modern C++ that integrates well with 
the standard library and Boost in order to allow developers to make use 
of SIMD in an easy and fairly high-level fashion, potentially using 
meta-programming to write an algorithm with parametric register and 
cache sizes.
OpenCL is not an interface that satisfies those criteria.
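
As a rough idea of what "parametric register size" could mean at the interface level, here is a sketch of a reduction whose lane count is a template parameter. This is only an assumption about the style of interface, not the proposed API: `chunked_sum` and `Width` are hypothetical names, the inner loop over lanes stands in for a single SIMD addition, and cache size is not modelled at all.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums data using Width independent partial sums, one per (hypothetical)
// register lane, then combines the lanes and handles the leftover tail.
template <std::size_t Width, class T>
T chunked_sum(std::vector<T> const& data) {
    static_assert(Width > 0, "Width must be at least 1");

    T partial[Width] = {};                    // one accumulator per lane
    std::size_t const n = data.size();
    std::size_t const body = n - n % Width;   // largest multiple of Width <= n

    // Main loop: each step consumes one full "register" of Width elements.
    for (std::size_t i = 0; i < body; i += Width)
        for (std::size_t lane = 0; lane < Width; ++lane)
            partial[lane] += data[i + lane];

    // Horizontal reduction of the lanes.
    T total = T();
    for (std::size_t lane = 0; lane < Width; ++lane)
        total += partial[lane];

    // Scalar tail for the elements that do not fill a full register.
    for (std::size_t i = body; i < n; ++i)
        total += data[i];

    return total;
}

// Usage sketch: the same algorithm instantiated for two register widths.
inline void chunked_sum_demo() {
    std::vector<int> v;
    for (int i = 1; i <= 10; ++i) v.push_back(i);
    assert(chunked_sum<4>(v) == 55);
    assert(chunked_sum<8>(v) == 55);
}
```

The caller picks `Width` (or a trait computes it from the target), and the algorithm itself stays unchanged across architectures.
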

Mathias Gaunard