[boost] Back to Boost.SIMD - Some performance figures ...

26 Mar 2009

      I'm still working on a potential Boost.SIMD proposal despite the 
apparent lack of interest from the list. The last discussion suggested 
that actual performance figures might be interesting, so here are some 
(see the end of this mail). For a subset of non-trivial functions, the 
table shows the number of cycles needed to compute one value in scalar 
code and with SSE2, some precision figures (ulp, rms and peak error), 
and the resulting speed-up.

Most of these speed-ups are super-linear (above 4, the number of floats 
in an SSE2 register) because either:
1/ the libc algorithm is poorly implemented, or
2/ non-SIMD architectural differences between the SSE2 FPU and the 
scalar FPU lead to additional speed-up.
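The s-up column appears to be the ratio of scalar cycles to vector cycles, and a speed-up is super-linear once it exceeds 4, the number of float lanes in an SSE2 register. A minimal sketch of that arithmetic; `speedup` and `is_superlinear` are hypothetical helpers, and the inputs are rows copied from the table at the end of this mail.

```cpp
#include <cassert>

// Hypothetical helper: speed-up as the ratio of scalar to vector cycles.
inline double speedup(double scalar_cycles, double vector_cycles) {
  return scalar_cycles / vector_cycles;
}

// An SSE2 register (__m128) holds 4 floats, so a speed-up above 4 on
// float data cannot come from lane parallelism alone.
constexpr int sse2_float_lanes = 4;

inline bool is_superlinear(double s) { return s > sse2_float_lanes; }
```

For acos_, speedup(261.8, 14.7) is about 17.8 and therefore super-linear, while cbrt_ gives 152.5 / 39.7, about 3.8, which is not. Recomputing from the rounded cycle counts can be off in the last digit: abs_ gives 2.0 / 0.8 = 2.5 where the table shows 2.4, presumably because the table was computed from unrounded counts.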

Most transcendental functions use a SIMD version of the old yet useful 
Cephes C library, which is based on various polynomial approximations.
Results on AltiVec processors are roughly the same, except for 
transcendental functions, where using a proper FMA instead of a sequence 
of separate multiply and add instructions increases performance. For 
trivial functions like +, -, *, /, I was happily surprised to see that 
gcc is indeed able to generate SIMD code. Alas, the speed-up from gcc 
auto-vectorization never exceeds 2.54, while our code goes up to 3.5.
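Cephes-style kernels mostly reduce to evaluating a polynomial with Horner's rule, one multiply and one add per coefficient. A minimal SSE sketch of that pattern, assuming an x86 target with SSE enabled (the default on x86-64); `horner4` is a hypothetical helper, not the proposal's API. On AltiVec, each `_mm_mul_ps`/`_mm_add_ps` pair would map to a single fused multiply-add (`vec_madd`), which is the FMA advantage mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE: __m128, _mm_set1_ps, _mm_mul_ps, _mm_add_ps

// Hypothetical helper: evaluates c[0]*x^(n-1) + c[1]*x^(n-2) + ... + c[n-1]
// on 4 floats at once with Horner's rule. Requires n >= 1; coefficients are
// ordered from the highest degree down, as in Cephes' polevl.
inline __m128 horner4(__m128 x, const float* c, std::size_t n) {
  __m128 r = _mm_set1_ps(c[0]);  // splat the leading coefficient
  for (std::size_t i = 1; i < n; ++i) {
    // r = r * x + c[i], as a separate multiply then add (SSE has no FMA)
    r = _mm_add_ps(_mm_mul_ps(r, x), _mm_set1_ps(c[i]));
  }
  return r;
}
```

For example, with coefficients {2, -3, 1} (that is, 2x^2 - 3x + 1) and x = (0, 1, 2, 3), the four lanes come out as 1, 0, 3 and 10.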

Concerning the interface and the support of odd-ball vector sizes in a 
platform-independent fashion, we followed Matthias's remark and provide 
a vec<T,C> class in which the vector cardinal C can be specified (by 
default, it is equal to the native cardinal of type T). Things like
vec<double,5> are handled as a boost::array and provide the same 
interface and set of functions. Any given function can be applied either 
to any vec<T,C> type or to any native SIMD type (__m128 in SSEx or 
vector xxx in AltiVec). Syntactic sugar like v = v+4 is provided and 
performs constant splatting (broadcasting the scalar into every lane) 
before SIMD evaluation.
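A minimal sketch of how such a vec<T,C> might look, assuming 16-byte native registers (true for both SSE2 and AltiVec) and using std::array in place of boost::array so the sketch has no dependencies. All names here (`native_cardinal`, `non_deduced`, `vec`, the operators) are hypothetical, and the lane loop is plain scalar code; a real implementation would map the native cardinal onto __m128 or an AltiVec vector. It shows the default cardinal and how v + 4 splats the constant before the lane-wise operation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical trait: number of T lanes in a 16-byte SIMD register,
// the register width of both SSE2 and AltiVec.
template <typename T>
struct native_cardinal {
  static constexpr std::size_t value = 16 / sizeof(T);
};

// Hypothetical vec<T,C>: array storage, so odd sizes like vec<double,5>
// behave like a std::array (standing in for boost::array). A real version
// would specialize the native size to wrap __m128 or an AltiVec vector.
template <typename T, std::size_t C = native_cardinal<T>::value>
struct vec {
  std::array<T, C> data;

  T& operator[](std::size_t i) { return data[i]; }
  const T& operator[](std::size_t i) const { return data[i]; }
  static constexpr std::size_t size() { return C; }
};

// Makes the scalar parameter below non-deduced, so v + 4 takes T from v
// and converts the int literal to T.
template <typename T>
struct non_deduced { using type = T; };

// Lane-wise vector + vector.
template <typename T, std::size_t C>
vec<T, C> operator+(const vec<T, C>& a, const vec<T, C>& b) {
  vec<T, C> r{};
  for (std::size_t i = 0; i < C; ++i) r[i] = a[i] + b[i];
  return r;
}

// Syntactic sugar for v + 4: splat the constant into every lane,
// then reuse the lane-wise vector + vector operator.
template <typename T, std::size_t C>
vec<T, C> operator+(const vec<T, C>& a, typename non_deduced<T>::type s) {
  vec<T, C> splat{};
  splat.data.fill(s);
  return a + splat;
}
```

With this sketch, vec<float> defaults to 4 lanes and vec<double> to 2, vec<double,5> still works lane by lane, and v = v + 4 adds 4 to every lane of v.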

This still has to be boostified and made independent of the larger 
project it currently depends on.
Once that is done, a preliminary version will be uploaded to the Vault. 
Current target architectures are:
- SSE2, SSE3, SSSE3
- AltiVec for PPC and the Cell processor (a patched version of Boost is needed)
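One way to pick among these targets at compile time is to test the macros gcc predefines when the matching option is enabled (`__SSE2__`, `__SSE3__` and `__SSSE3__` for -msse2, -msse3 and -mssse3, and `__ALTIVEC__` for -maltivec). A minimal sketch, assuming gcc or a compiler that defines the same macros; `simd_target` is a hypothetical helper, and checking the newest extension first is just one reasonable priority order.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical compile-time selection of the best available target.
// SSSE3 implies SSE3, which implies SSE2, so test from newest to oldest.
inline const char* simd_target() {
#if defined(__ALTIVEC__)
  return "AltiVec";
#elif defined(__SSSE3__)
  return "SSSE3";
#elif defined(__SSE3__)
  return "SSE3";
#elif defined(__SSE2__)
  return "SSE2";
#else
  return "scalar";  // no supported SIMD extension enabled
#endif
}
```

On a default x86-64 gcc build this would typically report "SSE2", since SSE2 is part of the x86-64 baseline.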

Comments and questions are welcome.

||       function ||  type || scalar || -------------- vector ------------- ||      ||
||                ||       || cycles || cycles |  ulp |      rms |     peak || s-up ||
|| -------------------------------------------------------------------------------- ||
||           abs_ || float ||    2.0 ||    0.8 |    0 | 0.00e+00 | 0.00e+00 ||  2.4 ||
||         acosh_ || float ||  148.2 ||   30.2 |    1 | 9.15e-09 | 1.19e-07 ||  4.9 ||
||          acos_ || float ||  261.8 ||   14.7 |    3 | 7.01e-08 | 2.38e-07 || 17.8 ||
||           arg_ || float ||    5.0 ||    1.2 |    0 | 0.00e+00 | 0.00e+00 ||  4.2 ||
||         asinh_ || float ||  152.8 ||   32.4 |    1 | 1.22e-08 | 1.19e-07 ||  4.7 ||
||          asin_ || float ||  256.5 ||   11.6 |    2 | 5.32e-08 | 2.28e-07 || 22.1 ||
||         atanh_ || float ||  123.9 ||   20.4 |    2 | 2.27e-08 | 4.55e-07 ||  6.1 ||
||          atan_ || float ||  160.7 ||   12.7 |    1 | 3.55e-08 | 6.74e-08 || 12.7 ||
||     bitofsign_ || float ||    5.1 ||    0.8 |    0 | 0.00e+00 | 0.00e+00 ||  6.1 ||
||       boolean_ || float ||    5.4 ||    1.0 |    0 | 0.00e+00 | 0.00e+00 ||  5.4 ||
||          cbrt_ || float ||  152.5 ||   39.7 |    1 | 2.76e-08 | 7.77e-08 ||  3.8 ||
||          ceil_ || float ||   16.6 ||    2.8 |    0 | 0.00e+00 | 0.00e+00 ||  5.9 ||
||          cosh_ || float ||  211.5 ||   19.1 |    2 | 4.00e-08 | 1.83e-07 || 11.1 ||
||           cos_ || float ||  112.2 ||   14.6 |    1 | 2.98e-08 | 1.11e-07 ||  7.7 ||
||         cospi_ || float ||  103.6 ||   12.1 |    1 | 3.43e-08 | 1.19e-07 ||  8.6 ||
||           cot_ || float ||  142.8 ||   17.8 |    3 | 5.54e-08 | 2.38e-07 ||  8.0 ||
||         cotpi_ || float ||  142.1 ||   17.1 |    6 | 9.62e-08 | 4.08e-07 ||  8.3 ||
||         exp10_ || float ||  169.3 ||   32.1 |    1 | 2.88e-08 | 1.19e-07 ||  5.3 ||
||           exp_ || float ||  171.3 ||   19.3 |    1 | 2.60e-08 | 1.19e-07 ||  8.9 ||
||         expm1_ || float ||  294.1 ||   42.6 |    3 | 2.89e-08 | 1.94e-07 ||  6.9 ||
||         floor_ || float ||   16.9 ||    2.7 |    0 | 0.00e+00 | 0.00e+00 ||  6.4 ||
||            gd_ || float ||  602.4 ||   35.8 |    3 | 3.93e-08 | 2.46e-07 || 16.8 ||
||         indeg_ || float ||    2.2 ||    0.8 |    0 | 2.59e-08 | 5.94e-08 ||  2.7 ||
||         inrad_ || float ||    2.2 ||    0.8 |    0 | 2.53e-08 | 5.94e-08 ||  2.6 ||
||         iseqz_ || float ||    5.5 ||    0.9 |    0 | 0.00e+00 | 0.00e+00 ||  6.3 ||
||        iseven_ || float ||   45.0 ||    2.3 |    0 | 0.00e+00 | 0.00e+00 || 19.9 ||
||         isfin_ || float ||    5.9 ||    0.9 |    0 | 0.00e+00 | 0.00e+00 ||  6.6 ||
||       isflint_ || float ||   38.3 ||    1.8 |    0 | 0.00e+00 | 0.00e+00 || 21.4 ||
||         isgez_ || float ||    6.0 ||    0.8 |    0 | 0.00e+00 | 0.00e+00 ||  7.4 ||
||         isgtz_ || float ||    6.0 ||    0.9 |    0 | 0.00e+00 | 0.00e+00 ||  6.5 ||
||         isinf_ || float ||    6.0 ||    0.9 |    0 | 0.00e+00 | 0.00e+00 ||  6.5 ||
||         islez_ || float ||    5.0 ||    0.9 |    0 | 0.00e+00 | 0.00e+00 ||  5.8 ||
||         isltz_ || float ||    5.0 ||    0.9 |    0 | 0.00e+00 | 0.00e+00 ||  5.8 ||
||         isnan_ || float ||    3.0 ||    0.8 |    0 | 0.00e+00 | 0.00e+00 ||  3.6 ||
||    isnegative_ || float ||    5.6 ||    1.1 |    0 | 0.00e+00 | 0.00e+00 ||  5.0 ||
||         isnez_ || float ||    5.4 ||    0.9 |    0 | 0.00e+00 | 0.00e+00 ||  6.2 ||
||   isnotfinite_ || float ||    3.0 ||    0.8 |    0 | 0.00e+00 | 0.00e+00 ||  3.6 ||
||         isodd_ || float ||   47.8 ||    2.5 |    0 | 0.00e+00 | 0.00e+00 || 18.8 ||
||    ispositive_ || float ||    5.6 ||    1.3 |    0 | 0.00e+00 | 0.00e+00 ||  4.2 ||
||      log10abs_ || float ||  107.5 ||   17.3 |    2 | 6.59e-08 | 2.12e-07 ||  6.2 ||
||         log10_ || float ||  105.4 ||   16.9 |    2 | 6.58e-08 | 2.12e-07 ||  6.2 ||
||         log1p_ || float ||  149.7 ||   18.8 |    1 | 9.16e-09 | 1.19e-07 ||  8.0 ||
||       log2abs_ || float ||  107.5 ||   17.2 |    1 | 1.89e-08 | 1.19e-07 ||  6.3 ||
||          log2_ || float ||  105.5 ||   17.1 |    4 | 1.90e-08 | 2.51e-07 ||  6.2 ||
||        logabs_ || float ||  107.5 ||   23.6 |    1 | 4.36e-08 | 1.19e-07 ||  4.6 ||
||           log_ || float ||  108.3 ||   15.4 |    1 | 9.12e-09 | 1.19e-07 ||  7.0 ||
||      mantissa_ || float ||   22.1 ||    5.0 |    0 | 0.00e+00 | 0.00e+00 ||  4.4 ||
||      oneminus_ || float ||    3.3 ||    0.8 |    0 | 5.16e-10 | 5.96e-08 ||  4.0 ||
||       oneplus_ || float ||    3.4 ||    0.8 |    0 | 2.87e-10 | 5.74e-08 ||  4.1 ||
||           rec_ || float ||   37.4 ||    4.3 |    0 | 2.48e-08 | 5.92e-08 ||  8.8 ||
||         round_ || float ||   43.7 ||    5.5 |    0 | 0.00e+00 | 0.00e+00 ||  8.0 ||
||         rsqrt_ || float ||  105.4 ||   11.3 |    1 | 3.62e-08 | 8.81e-08 ||  9.3 ||
||    signedbool_ || float ||    9.3 ||    0.9 |    0 |      nan | 0.00e+00 || 10.6 ||
||          sign_ || float ||   12.1 ||    2.5 |    0 | 0.00e+00 | 0.00e+00 ||  4.9 ||
||        signnz_ || float ||   12.1 ||    1.4 |    0 | 0.00e+00 | 0.00e+00 ||  8.5 ||
||          sinh_ || float ||  267.2 ||   19.1 |    3 | 2.38e-07 | 3.84e-07 || 14.0 ||
||           sin_ || float ||  110.5 ||   17.0 |    1 | 2.98e-08 | 1.12e-07 ||  6.5 ||
||         sinpi_ || float ||  115.8 ||   14.4 |    1 | 2.98e-08 | 1.10e-07 ||  8.0 ||
||           sqr_ || float ||    2.0 ||    0.7 |    0 | 2.53e-08 | 5.94e-08 ||  2.9 ||
||          sqrt_ || float ||   68.3 ||    7.1 |    0 | 2.62e-08 | 5.96e-08 ||  9.6 ||
||       sqrtabs_ || float ||   68.3 ||    7.0 |    0 | 2.61e-08 | 5.96e-08 ||  9.7 ||
||          tanh_ || float ||  206.4 ||   20.8 |  197 | 1.63e-07 | 1.77e-05 ||  9.9 ||
||           tan_ || float ||  153.0 ||   17.8 |    2 | 4.18e-08 | 1.45e-07 ||  8.6 ||
||         tanpi_ || float ||  156.2 ||   18.0 |    2 | 4.16e-08 | 1.48e-07 ||  8.7 ||
|| -------------------------------------------------------------------------------- ||
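For reference, a common way to measure an error in ulps (units in the last place) is to map each float's bit pattern to an integer that is monotonic in the float's value and take the difference. A minimal sketch, assuming IEEE-754 binary32 floats and finite, non-NaN inputs; `ordered_bits` and `ulp_distance` are hypothetical helpers and not necessarily how the figures above were produced.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Maps a float's bit pattern to an integer that is monotonic in the float's
// value, so adjacent floats map to adjacent integers and -0 maps like +0.
inline std::int64_t ordered_bits(float f) {
  std::int32_t i;
  std::memcpy(&i, &f, sizeof i);  // type-pun without undefined behaviour
  // Negative floats: negate the magnitude bits so more negative values
  // get smaller integers.
  return i < 0 ? -static_cast<std::int64_t>(i & 0x7fffffff) : i;
}

// Number of representable floats between a and b (finite, non-NaN inputs).
inline std::int64_t ulp_distance(float a, float b) {
  std::int64_t d = ordered_bits(a) - ordered_bits(b);
  return d < 0 ? -d : d;
}
```

For example, ulp_distance(1.0f, std::nextafter(1.0f, 2.0f)) is 1, and -0.0f and 0.0f are 0 ulp apart. The ulp column presumably reports the maximum such distance between the SIMD result and a reference over the tested inputs.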

-- 
___________________________________________
Joel Falcou - Assistant Professor
PARALL Team - LRI - Universite Paris Sud XI
Tel : (+33)1 69 15 66 35