Re: [boost] performance of a linear algebra/matrix library

11 May 2010

      for those who still care

i investigated an "issue" which i claimed was due to abstraction
penalty
it seemed to me the error was in the source code...
indeed it was in the code... of my DNA

it turned out the performance difference of 33% is due to loop
unrolling in the C code while the loop in the C++ code was not
unrolled
in other aspects the two loops were identical (abstraction in the C++
code completely optimized away)
so it's not that the C++ code runs slow but that the optimized C code
runs faster
i unrolled the loop in the C++ code manually and the two started to
run in the same time (it seems now it was a cpu pipeline issue)
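
to illustrate what manual unrolling means here, a minimal sketch follows.
the benchmark loop itself is not shown in this message, so the dot product
and the names dot_plain and dot_unrolled are hypothetical stand-ins, and the
unroll factor of 4 is an assumption. whether unrolling helps at all depends
on the compiler and the cpu, so measure before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// hypothetical example: straightforward loop, one multiply-add per iteration
double dot_plain(const std::vector<double>& a, const std::vector<double>& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        sum += a[i] * b[i];
    return sum;
}

// same computation unrolled by hand: four elements per iteration,
// each with its own accumulator so the additions do not depend on each other
double dot_unrolled(const std::vector<double>& a, const std::vector<double>& b)
{
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    double sum = (s0 + s1) + (s2 + s3);
    // remainder loop for sizes that are not a multiple of 4
    for (; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

note that the unrolled version adds the products in a different order, so
for general floating-point input the two results may differ in the last bits.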

furthermore i looked at the icc11-generated assembly code and was shocked:
icc not only optimized the abstraction away but also unrolled both
loops (the C and C++ ones) AND vectorized them
that is both plain C and C++ pieces of code were transformed into
instruction sequences like

  movsd
  movhpd
  mulpd
  movapd
  //etc.

as a result the icc-generated code ran 15% faster than both the msvc80
and msvc10 versions
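
for reference, a sketch of the kind of C / C++ pair being compared. the
vec class and the functions mul_c and mul_cpp are hypothetical and not taken
from the library under discussion. with optimizations enabled a compiler may
inline operator[] so that both loops compile to the same instructions, and
may also unroll and vectorize them; the only way to know is to inspect the
generated assembly for your own compiler and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C style: raw pointers and an explicit length
void mul_c(const double* a, const double* b, double* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] * b[i];
}

// hypothetical thin C++ abstraction over a contiguous array of doubles
class vec
{
public:
    explicit vec(std::size_t n) : data_(n) {}
    std::size_t size() const { return data_.size(); }
    double& operator[](std::size_t i) { return data_[i]; }
    const double& operator[](std::size_t i) const { return data_[i]; }
private:
    std::vector<double> data_;
};

// C++ style: the same element-wise multiply written through the abstraction
void mul_cpp(const vec& a, const vec& b, vec& out)
{
    assert(a.size() == b.size() && a.size() == out.size());
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] * b[i];
}
```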

here a question arises:
since a compiler is able to generate very fast code involving simd
instructions, is one supposed to provide a simd-enabled implementation of
a generic library?
personally i now think it is not worth the effort

-- 
Pavel
