
For those who still care: I investigated the "issue" that I had claimed was due to abstraction penalty. It seemed to me the error was in the source code... indeed it was in the code... of my DNA.

It turned out that the 33% performance difference was due to loop unrolling in the C code, while the loop in the C++ code was not unrolled. In all other respects the two loops were identical (the abstraction in the C++ code was completely optimized away). So it's not that the C++ code ran slow, but that the optimized C code ran faster. I unrolled the loop in the C++ code manually and the two started to run in the same time (it now seems it was a CPU pipeline issue).

Furthermore, I looked at the assembly code generated by icc11 and was shocked: icc not only optimized the abstraction away but also unrolled both loops (the C and the C++ ones) AND vectorized them. That is, both the plain C and the C++ pieces of code were transformed into instruction sequences like

    movsd
    movhpd
    mulpd
    movapd
    //etc.

As a result, the icc-generated code ran 15% faster than both the msvc80 and msvc10 versions.

Here a question arises: since a compiler is able to generate very fast code involving SIMD instructions, is one supposed to provide a SIMD-enabled implementation of a generic library? Personally, I now think it is not worth the effort.

-- Pavel
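
P.S. For illustration, manual unrolling of the kind described above looks roughly like the sketch below. This is not the code from the benchmark: the function names, the element type (double) and the unroll factor of 4 are all assumptions made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel (not the original benchmark): multiply n doubles by k.
// Straightforward loop, one element per iteration.
void scale_rolled(double* data, std::size_t n, double k) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= k;
    }
}

// Same computation, manually unrolled by an assumed factor of 4.
void scale_unrolled4(double* data, std::size_t n, double k) {
    assert(data != nullptr || n == 0);
    std::size_t i = 0;
    // Main body: four independent multiplies per iteration, so there are
    // fewer loop-counter updates and branches per element processed.
    for (; i + 4 <= n; i += 4) {
        data[i]     *= k;
        data[i + 1] *= k;
        data[i + 2] *= k;
        data[i + 3] *= k;
    }
    // Tail: handle the remaining 0 to 3 elements.
    for (; i < n; ++i) {
        data[i] *= k;
    }
}
```

Both functions perform the same multiplications on the same elements, so they should produce identical results. Whether the unrolled version is actually faster depends on the compiler and its flags, since an optimizer may already unroll (or vectorize) the first loop on its own.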