Re: [boost] [GSOC]SIMD Library

1 Apr 2011

On Fri, Apr 1, 2011 at 1:01 AM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
...
On 01/04/2011 09:20, Joel Falcou wrote:
...
...
Is there a way to detect those compile options?
Yes, but the equivalent options don't exist on MSVC.
The problem is that the two compilers have radically different ways to deal
with this.
SIMD in general is a tough problem, especially if you're aiming for
the user to write generic SIMD code without having to worry about
whether it compiles down to SSE, AVX, NEON, etc.  I think that will be
impractical if you want good performance, but here's how I imagine an
attempt would look:

template<typename InstructionSet>
struct kernel
{
    void operator()(float const *f)
    {
        typedef simd::vec<4, InstructionSet> vec4;
        ...
    }
};

std::function<void(float const*)> func(simd::generate<kernel>());

Here generate() instantiates kernel for every available instruction
set on VC++ and uses cpuid at runtime to pick which instantiation to
return.  On GCC it would probably instantiate just a single
instruction set, the one enabled by the compile options.
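
To make the dispatch idea concrete, here is a rough sketch of what
generate() could do.  Every name in it (isa_level, the *_tag types,
detect_isa, demo_kernel) is made up for illustration and is not an
existing Boost API.  Detection is only filled in for GCC/Clang on x86
via __builtin_cpu_supports; a VC++ build would query cpuid through its
own intrinsic at that point.

```cpp
#include <cassert>
#include <functional>

// Hypothetical names throughout; this is a sketch, not a Boost interface.
enum class isa_level { scalar, sse2, ssse3 };

// Tag types standing in for instruction sets.
struct scalar_tag { static constexpr isa_level level = isa_level::scalar; };
struct sse2_tag   { static constexpr isa_level level = isa_level::sse2; };
struct ssse3_tag  { static constexpr isa_level level = isa_level::ssse3; };

// Runtime detection. Only the GCC/Clang x86 builtin is shown; other
// compilers fall through to the scalar path in this sketch.
inline isa_level detect_isa()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("ssse3")) return isa_level::ssse3;
    if (__builtin_cpu_supports("sse2"))  return isa_level::sse2;
#endif
    return isa_level::scalar;
}

// Instantiate Kernel once per instruction set and return the instantiation
// that matches the requested level, type-erased behind std::function.
template<template<typename> class Kernel>
std::function<void(float const*)> generate(isa_level level)
{
    switch (level) {
    case isa_level::ssse3: return Kernel<ssse3_tag>();
    case isa_level::sse2:  return Kernel<sse2_tag>();
    default:               return Kernel<scalar_tag>();
    }
}

// Convenience overload: pick the level from the running CPU.
template<template<typename> class Kernel>
std::function<void(float const*)> generate()
{
    return generate<Kernel>(detect_isa());
}

// Demo kernel that only records which instantiation was called.
static isa_level last_level_used = isa_level::scalar;

template<typename InstructionSet>
struct demo_kernel
{
    void operator()(float const* f)
    {
        assert(f != nullptr);
        last_level_used = InstructionSet::level;
    }
};
```

The overload taking an explicit isa_level is only there so the
selection can be exercised without depending on the host CPU.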

VC++ almost never targets the compile machine, and it is very
common for SIMD-optimized apps on Windows to use cpuid to select code
paths at runtime.  I don't see any other way short of separate
binaries, which would be distribution (and probably compile) hell.

Note this would also let the user specialize the kernel for different
instruction sets if they wanted to.  Like I said before, using one
generic algorithm for multiple instruction sets is probably not going
to give anywhere near the performance of hand-written intrinsics,
though it might still be faster than plain C++.  The ability to
specialize would provide an optimization point.
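
As an illustration of that specialization point, a user could keep a
generic kernel and override it for one instruction set.  The sketch
below assumes a hypothetical sum4_kernel that returns a value (unlike
the void kernel above) so the two paths can be compared, and made-up
tag types.  The SSE2 specialization is only compiled when the compiler
reports SSE2 support; otherwise the generic version is used for every
tag.

```cpp
#include <cassert>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  include <emmintrin.h>
#  define SKETCH_HAS_SSE2 1
#endif

// Hypothetical tag types standing in for instruction sets.
struct scalar_tag {};
struct sse2_tag {};

// Generic kernel: sums four floats with plain C++.
template<typename InstructionSet>
struct sum4_kernel
{
    float operator()(float const* f) const
    {
        assert(f != nullptr);
        return f[0] + f[1] + f[2] + f[3];
    }
};

#ifdef SKETCH_HAS_SSE2
// User-provided specialization: the same sum written with intrinsics.
template<>
struct sum4_kernel<sse2_tag>
{
    float operator()(float const* f) const
    {
        assert(f != nullptr);
        __m128 v     = _mm_loadu_ps(f);                    // [f0, f1, f2, f3]
        __m128 pairs = _mm_add_ps(v, _mm_movehl_ps(v, v)); // lane 0: f0+f2, lane 1: f1+f3
        __m128 odd   = _mm_shuffle_ps(pairs, pairs, 1);    // move lane 1 into lane 0
        return _mm_cvtss_f32(_mm_add_ss(pairs, odd));      // (f0+f2) + (f1+f3)
    }
};
#endif
```

Note the two versions add in a different order, so results can differ
in the last bits for inputs that are not exactly representable sums.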

A problem with this design is that an algorithm will often only need
SSE2 and not use any newer instructions when you instantiate it with
SSSE3, so the SSSE3 instantiation is just a duplicate of the SSE2 one.
I'm not certain how one would cleanly solve this.  Maybe an optional
mpl::map for generate() that maps things like ssse3->sse2?
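
A rough sketch of that mapping idea follows.  To keep it
self-contained it uses a plain trait template in place of mpl::map,
and every name (identity_map, sse2_only_map, resolve_kernel, the tag
types) is hypothetical.  generate() would look up the kernel through
the map instead of instantiating it directly, so an SSSE3 request
reuses the SSE2 instantiation.

```cpp
#include <type_traits>

// Hypothetical tag types standing in for instruction sets.
struct sse2_tag {};
struct ssse3_tag {};

// Default map: every instruction set maps to itself.
template<typename InstructionSet>
struct identity_map { typedef InstructionSet type; };

// Map for a kernel that uses nothing newer than SSE2: ssse3 -> sse2.
template<typename InstructionSet>
struct sse2_only_map { typedef InstructionSet type; };

template<>
struct sse2_only_map<ssse3_tag> { typedef sse2_tag type; };

// Stand-in kernel; a real one would have an operator().
template<typename InstructionSet>
struct kernel { typedef InstructionSet instruction_set; };

// What generate() would instantiate for a given instruction set and map.
template<template<typename> class Kernel, typename InstructionSet,
         template<typename> class Map = identity_map>
struct resolve_kernel
{
    typedef Kernel<typename Map<InstructionSet>::type> type;
};

// Without a map, an SSSE3 request gets its own instantiation.
static_assert(std::is_same<resolve_kernel<kernel, ssse3_tag>::type,
                           kernel<ssse3_tag> >::value,
              "identity map keeps ssse3");

// With the map, the SSSE3 request shares the SSE2 instantiation.
static_assert(std::is_same<resolve_kernel<kernel, ssse3_tag, sse2_only_map>::type,
                           kernel<sse2_tag> >::value,
              "ssse3 is remapped to sse2");
```

An mpl::map keyed on the tag types would play the same role as
sse2_only_map here; the trait is just the shortest way to show it.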

-- 
Cory Nelson
http://int64.org