
On Fri, Apr 1, 2011 at 1:01 AM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
> On 01/04/2011 09:20, Joel Falcou wrote:
>> Is there a way to detect those compile options?
>
> Yes, but the equivalent options don't exist on MSVC.
> The problem is that the two compilers have radically different ways to deal with this.
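To make the GCC/MSVC difference concrete, here is a minimal sketch that reports which SIMD level the compiler was allowed to target. The function name compiled_simd_level is hypothetical. GCC and Clang define __SSE2__, __SSSE3__, __SSE4_1__ and __AVX__ when the corresponding -m or -march options enable them. MSVC only has the coarse /arch switch, which historically had no SSSE3 or SSE4.1 level. On 32-bit x86 it exposes the /arch setting through _M_IX86_FP, and on x64 SSE2 is always available, so there is simply no macro to test for the finer levels.

```cpp
// Hypothetical helper: reports the highest x86 SIMD level the compiler was
// allowed to target, based only on predefined macros.
inline const char* compiled_simd_level()
{
#if defined(__AVX__)
    return "avx";      // GCC/Clang: -mavx, or an -march that implies it
#elif defined(__SSE4_1__)
    return "sse4.1";   // GCC/Clang: -msse4.1
#elif defined(__SSSE3__)
    return "ssse3";    // GCC/Clang: -mssse3 (no matching MSVC /arch level)
#elif defined(__SSE2__) || defined(_M_X64) || \
      (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "sse2";     // GCC/Clang: -msse2; MSVC: x64, or /arch:SSE2+ on x86
#else
    return "scalar";   // none of the above enabled, or not an x86 target
#endif
}
```

This only reflects compile-time flags. It says nothing about what the machine that eventually runs the binary supports, which is why it is of limited use on MSVC.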
SIMD in general is a tough problem, especially if you're aiming to let the user write generic SIMD code without having to worry about whether it compiles down to SSE, AVX, NEON, etc.! I think that will be impractical if you want good performance, but here's how I imagine an attempt would look:

    template<typename InstructionSet>
    struct kernel
    {
        void operator()(float const *f)
        {
            typedef simd::vec<4, InstructionSet> vec4;
            ...
        }
    };

    std::function<void(float const*)> func(simd::generate<kernel>());

Here generate() instantiates kernel for every available instruction set on VC++ and uses cpuid at runtime to pick which one to return. On GCC it would probably instantiate a single instruction set. VC++ almost never targets the compile machine, and it is very common for SIMD-optimized apps on Windows to use cpuid to select code paths at runtime. I don't see any other way short of separate binaries, which would be distribution (and probably compile) hell.

Note this would also let the user specialize kernel for different instruction sets if they wanted to. As I said before, using one generic algorithm for multiple instruction sets is probably not going to get anywhere near the performance of hand-written intrinsics, though it might still be faster than plain C++. It would at least provide an optimization point.

A problem with this design is that an algorithm often needs only SSE2 and uses no new instructions when you instantiate it with SSSE3. I'm not certain how one would cleanly solve this. Maybe an optional mpl::map passed to generate() that maps things like ssse3->sse2?

--
Cory Nelson
http://int64.org
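The runtime-dispatch idea, together with the ssse3->sse2 collapsing, might be sketched as follows. Everything here is hypothetical and was invented for illustration: the tag types, cpu_features, detect_cpu, isa_alias, generate and sum4. The kernel returns a float instead of void so the result is observable. The suggested mpl::map is replaced by a plain trait specialization. On GCC/Clang x86 the detection uses __builtin_cpu_supports. On MSVC it calls __cpuid from <intrin.h> and checks the SSE2 (EDX bit 26) and SSSE3 (ECX bit 9) flags of leaf 1.

```cpp
#include <functional>
#if defined(_MSC_VER)
#include <intrin.h>   // __cpuid
#endif

// Hypothetical instruction-set tags standing in for simd::vec's parameter.
struct scalar_tag {};
struct sse2_tag {};
struct ssse3_tag {};

// Capabilities of the CPU the program is actually running on.
struct cpu_features { bool sse2; bool ssse3; };

inline cpu_features detect_cpu()
{
    cpu_features f = { false, false };
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    f.sse2  = __builtin_cpu_supports("sse2") != 0;
    f.ssse3 = __builtin_cpu_supports("ssse3") != 0;
#elif defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int info[4];
    __cpuid(info, 1);                      // leaf 1: feature flags
    f.sse2  = (info[3] & (1 << 26)) != 0;  // EDX bit 26
    f.ssse3 = (info[2] & (1 << 9)) != 0;   // ECX bit 9
#endif
    return f;                              // other targets: no x86 extensions
}

// Per-kernel mapping from an instruction set to the one the kernel really
// needs; the default is the identity. Used here instead of an mpl::map.
template<template<typename> class Kernel, typename ISA>
struct isa_alias { typedef ISA type; };

// Pick the best instantiation for the given features, best first.
template<template<typename> class Kernel>
std::function<float(float const*)> generate(cpu_features f)
{
    if (f.ssse3) return Kernel<typename isa_alias<Kernel, ssse3_tag>::type>();
    if (f.sse2)  return Kernel<typename isa_alias<Kernel, sse2_tag>::type>();
    return Kernel<scalar_tag>();
}

// Same, but query the running CPU first (the cpuid step).
template<template<typename> class Kernel>
std::function<float(float const*)> generate()
{
    return generate<Kernel>(detect_cpu());
}

// Example kernel. A real one would use ISA-specific intrinsics; this
// portable expression is only a placeholder so the sketch compiles anywhere.
template<typename ISA>
struct sum4
{
    float operator()(float const* f) const
    {
        return f[0] + f[1] + f[2] + f[3];
    }
};

// sum4 gains nothing from SSSE3, so an SSSE3 CPU reuses the SSE2 version.
template<> struct isa_alias<sum4, ssse3_tag> { typedef sse2_tag type; };
```

Because sum4 specializes isa_alias for ssse3_tag, generate<sum4>() on an SSSE3 machine hands back sum4<sse2_tag>, and no separate SSSE3 copy of the kernel is ever instantiated. Kernels that do benefit from SSSE3 simply leave the default mapping alone.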