
I'm still working on a potential Boost.SIMD proposal despite the apparent lack of interest from the list. The last discussion suggested that actual performance figures might be of interest, so here are some (see end of mail).

The table shows, for a subset of non-trivial functions, the cycles needed to compute one value in scalar code and with SSE2, some precision figures (ulp, rms and peak error) and the actual speed-up. Most speed-ups are super-linear, i.e. above the 4x one would expect from four floats per SSE2 register, because either:

1/ the libc algorithm is badly implemented, or
2/ non-SIMD architectural differences between the SSE2 FPU and the scalar FPU lead to an additional speed-up.

Most transcendental functions use a SIMD version of the old yet useful Cephes C library, which is based on various polynomial approximations. Results on AltiVec processors are roughly the same, except for transcendental functions, where the use of a proper FMA instead of a mul-add sequence increases performance.

For trivial functions like +, -, *, /, I was happily surprised to see that gcc is indeed able to generate SIMD code. Alas, the gcc auto-SIMD speed-up never exceeds 2.54, while our code can go up to 3.5.

Concerning the problem of the interface and the support of odd-ball vector sizes in a platform-independent fashion, we followed Matthias' remark and provide a vec<T,C> class in which the vector cardinal can be specified (it defaults to the native cardinal of the type). Things like vec<double,5> are handled as a boost::array and provide the same interface and set of functions. Any given function can be applied either to any vec<T,C> type or to any native SIMD type (__m128 in SSEx or vector xxx in AltiVec). Syntactic sugar like v = v+4 is provided and performs constant splatting before SIMD evaluation.

This still has to be boostified and made independent of the whole project it depends on. Once done, a preliminary version will be uploaded to the Vault.

Current target architectures are:
- SSE2, SSE3, SSSE3
- AltiVec for PPC and Cell processor (a patched version of boost is needed)

Comments and questions welcome.
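To make the vec<T,C> interface described above a bit more concrete, here is a minimal, portable sketch. It is not the actual library code: native_cardinal, splat, demo_sum and the std::array storage are names and choices made up for illustration (the real code uses native registers such as __m128 for native cardinals and boost::array otherwise), and the 16-byte register width is an assumption matching SSE2.

```cpp
#include <array>
#include <cstddef>

// Hypothetical trait: number of T lanes in a 128-bit (SSE2-like) register.
template <class T>
struct native_cardinal {
    static constexpr std::size_t value = 16 / sizeof(T);
};

// Sketch of vec<T,C>: C defaults to the native cardinal of T.
// Storage is a plain std::array here; this is a stand-in, not the
// register-backed storage a real SIMD implementation would use.
template <class T, std::size_t C = native_cardinal<T>::value>
struct vec {
    std::array<T, C> data;

    static constexpr std::size_t size() { return C; }

    // Replicate ("splat") one scalar into every lane.
    static vec splat(T s) {
        vec r{};
        r.data.fill(s);
        return r;
    }

    T operator[](std::size_t i) const { return data[i]; }

    // Lane-wise addition of two vectors.
    friend vec operator+(vec const& a, vec const& b) {
        vec r{};
        for (std::size_t i = 0; i < C; ++i) r.data[i] = a.data[i] + b.data[i];
        return r;
    }

    // Sugar for v + 4: the scalar is converted to T, splatted,
    // then added lane-wise.
    friend vec operator+(vec const& a, T s) { return a + splat(s); }
};

// Small usage example covering a native-size and an odd-ball-size vector.
inline float demo_sum() {
    vec<float> v = vec<float>::splat(1.5f);               // 4 lanes of 1.5f
    v = v + 4;                                            // every lane is now 5.5f
    vec<double, 5> w = vec<double, 5>::splat(1.0);        // odd-ball cardinal
    w = w + 2;                                            // every lane is now 3.0
    return v[0] + static_cast<float>(w[4]);
}
```

The scalar overload is written as a non-template friend so that an int literal such as 4 converts to T instead of breaking template argument deduction. In this sketch vec<float> and vec<double,5> share one code path; a real implementation would dispatch the native-cardinal case to SIMD instructions.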
|| function     || type  || scalar || ------------ vector -------------- || s-up ||
||              ||       || cycles || cycles | ulp | rms      | peak     ||      ||
|| ----------------------------------------------------------------------------- ||
|| abs_         || float ||    2.0 ||    0.8 |   0 | 0.00e+00 | 0.00e+00 ||  2.4 ||
|| acosh_       || float ||  148.2 ||   30.2 |   1 | 9.15e-09 | 1.19e-07 ||  4.9 ||
|| acos_        || float ||  261.8 ||   14.7 |   3 | 7.01e-08 | 2.38e-07 || 17.8 ||
|| arg_         || float ||    5.0 ||    1.2 |   0 | 0.00e+00 | 0.00e+00 ||  4.2 ||
|| asinh_       || float ||  152.8 ||   32.4 |   1 | 1.22e-08 | 1.19e-07 ||  4.7 ||
|| asin_        || float ||  256.5 ||   11.6 |   2 | 5.32e-08 | 2.28e-07 || 22.1 ||
|| atanh_       || float ||  123.9 ||   20.4 |   2 | 2.27e-08 | 4.55e-07 ||  6.1 ||
|| atan_        || float ||  160.7 ||   12.7 |   1 | 3.55e-08 | 6.74e-08 || 12.7 ||
|| bitofsign_   || float ||    5.1 ||    0.8 |   0 | 0.00e+00 | 0.00e+00 ||  6.1 ||
|| boolean_     || float ||    5.4 ||    1.0 |   0 | 0.00e+00 | 0.00e+00 ||  5.4 ||
|| cbrt_        || float ||  152.5 ||   39.7 |   1 | 2.76e-08 | 7.77e-08 ||  3.8 ||
|| ceil_        || float ||   16.6 ||    2.8 |   0 | 0.00e+00 | 0.00e+00 ||  5.9 ||
|| cosh_        || float ||  211.5 ||   19.1 |   2 | 4.00e-08 | 1.83e-07 || 11.1 ||
|| cos_         || float ||  112.2 ||   14.6 |   1 | 2.98e-08 | 1.11e-07 ||  7.7 ||
|| cospi_       || float ||  103.6 ||   12.1 |   1 | 3.43e-08 | 1.19e-07 ||  8.6 ||
|| cot_         || float ||  142.8 ||   17.8 |   3 | 5.54e-08 | 2.38e-07 ||  8.0 ||
|| cotpi_       || float ||  142.1 ||   17.1 |   6 | 9.62e-08 | 4.08e-07 ||  8.3 ||
|| exp10_       || float ||  169.3 ||   32.1 |   1 | 2.88e-08 | 1.19e-07 ||  5.3 ||
|| exp_         || float ||  171.3 ||   19.3 |   1 | 2.60e-08 | 1.19e-07 ||  8.9 ||
|| expm1_       || float ||  294.1 ||   42.6 |   3 | 2.89e-08 | 1.94e-07 ||  6.9 ||
|| floor_       || float ||   16.9 ||    2.7 |   0 | 0.00e+00 | 0.00e+00 ||  6.4 ||
|| gd_          || float ||  602.4 ||   35.8 |   3 | 3.93e-08 | 2.46e-07 || 16.8 ||
|| indeg_       || float ||    2.2 ||    0.8 |   0 | 2.59e-08 | 5.94e-08 ||  2.7 ||
|| inrad_       || float ||    2.2 ||    0.8 |   0 | 2.53e-08 | 5.94e-08 ||  2.6 ||
|| iseqz_       || float ||    5.5 ||    0.9 |   0 | 0.00e+00 | 0.00e+00 ||  6.3 ||
|| iseven_      || float ||   45.0 ||    2.3 |   0 | 0.00e+00 | 0.00e+00 || 19.9 ||
|| isfin_       || float ||    5.9 ||    0.9 |   0 | 0.00e+00 | 0.00e+00 ||  6.6 ||
|| isflint_     || float ||   38.3 ||    1.8 |   0 | 0.00e+00 | 0.00e+00 || 21.4 ||
|| isgez_       || float ||    6.0 ||    0.8 |   0 | 0.00e+00 | 0.00e+00 ||  7.4 ||
|| isgtz_       || float ||    6.0 ||    0.9 |   0 | 0.00e+00 | 0.00e+00 ||  6.5 ||
|| isinf_       || float ||    6.0 ||    0.9 |   0 | 0.00e+00 | 0.00e+00 ||  6.5 ||
|| islez_       || float ||    5.0 ||    0.9 |   0 | 0.00e+00 | 0.00e+00 ||  5.8 ||
|| isltz_       || float ||    5.0 ||    0.9 |   0 | 0.00e+00 | 0.00e+00 ||  5.8 ||
|| isnan_       || float ||    3.0 ||    0.8 |   0 | 0.00e+00 | 0.00e+00 ||  3.6 ||
|| isnegative_  || float ||    5.6 ||    1.1 |   0 | 0.00e+00 | 0.00e+00 ||  5.0 ||
|| isnez_       || float ||    5.4 ||    0.9 |   0 | 0.00e+00 | 0.00e+00 ||  6.2 ||
|| isnotfinite_ || float ||    3.0 ||    0.8 |   0 | 0.00e+00 | 0.00e+00 ||  3.6 ||
|| isodd_       || float ||   47.8 ||    2.5 |   0 | 0.00e+00 | 0.00e+00 || 18.8 ||
|| ispositive_  || float ||    5.6 ||    1.3 |   0 | 0.00e+00 | 0.00e+00 ||  4.2 ||
|| log10abs_    || float ||  107.5 ||   17.3 |   2 | 6.59e-08 | 2.12e-07 ||  6.2 ||
|| log10_       || float ||  105.4 ||   16.9 |   2 | 6.58e-08 | 2.12e-07 ||  6.2 ||
|| log1p_       || float ||  149.7 ||   18.8 |   1 | 9.16e-09 | 1.19e-07 ||  8.0 ||
|| log2abs_     || float ||  107.5 ||   17.2 |   1 | 1.89e-08 | 1.19e-07 ||  6.3 ||
|| log2_        || float ||  105.5 ||   17.1 |   4 | 1.90e-08 | 2.51e-07 ||  6.2 ||
|| logabs_      || float ||  107.5 ||   23.6 |   1 | 4.36e-08 | 1.19e-07 ||  4.6 ||
|| log_         || float ||  108.3 ||   15.4 |   1 | 9.12e-09 | 1.19e-07 ||  7.0 ||
|| mantissa_    || float ||   22.1 ||    5.0 |   0 | 0.00e+00 | 0.00e+00 ||  4.4 ||
|| oneminus_    || float ||    3.3 ||    0.8 |   0 | 5.16e-10 | 5.96e-08 ||  4.0 ||
|| oneplus_     || float ||    3.4 ||    0.8 |   0 | 2.87e-10 | 5.74e-08 ||  4.1 ||
|| rec_         || float ||   37.4 ||    4.3 |   0 | 2.48e-08 | 5.92e-08 ||  8.8 ||
|| round_       || float ||   43.7 ||    5.5 |   0 | 0.00e+00 | 0.00e+00 ||  8.0 ||
|| rsqrt_       || float ||  105.4 ||   11.3 |   1 | 3.62e-08 | 8.81e-08 ||  9.3 ||
|| signedbool_  || float ||    9.3 ||    0.9 |   0 | nan      | 0.00e+00 || 10.6 ||
|| sign_        || float ||   12.1 ||    2.5 |   0 | 0.00e+00 | 0.00e+00 ||  4.9 ||
|| signnz_      || float ||   12.1 ||    1.4 |   0 | 0.00e+00 | 0.00e+00 ||  8.5 ||
|| sinh_        || float ||  267.2 ||   19.1 |   3 | 2.38e-07 | 3.84e-07 || 14.0 ||
|| sin_         || float ||  110.5 ||   17.0 |   1 | 2.98e-08 | 1.12e-07 ||  6.5 ||
|| sinpi_       || float ||  115.8 ||   14.4 |   1 | 2.98e-08 | 1.10e-07 ||  8.0 ||
|| sqr_         || float ||    2.0 ||    0.7 |   0 | 2.53e-08 | 5.94e-08 ||  2.9 ||
|| sqrt_        || float ||   68.3 ||    7.1 |   0 | 2.62e-08 | 5.96e-08 ||  9.6 ||
|| sqrtabs_     || float ||   68.3 ||    7.0 |   0 | 2.61e-08 | 5.96e-08 ||  9.7 ||
|| tanh_        || float ||  206.4 ||   20.8 | 197 | 1.63e-07 | 1.77e-05 ||  9.9 ||
|| tan_         || float ||  153.0 ||   17.8 |   2 | 4.18e-08 | 1.45e-07 ||  8.6 ||
|| tanpi_       || float ||  156.2 ||   18.0 |   2 | 4.16e-08 | 1.48e-07 ||  8.7 ||
|| ----------------------------------------------------------------------------- ||

--
___________________________________________
Joel Falcou - Assistant Professor
PARALL Team - LRI - Universite Paris Sud XI
Tel : (+33)1 69 15 66 35