
hi all i work on a linear algebra library with an intent to propose it for inclusion into boost collection of libs so my question is: how fast such a lib should be to make everybody happy? i found that to increase the performance of a generic linear algebra library one must complicate the implementation to a very high degree (though this statement should be obvious to everyone) i struggled with poor performance of matrix multiplication and finally identified the bottleneck i improved the implementation (seriosly complicating it, i'm afraid i will forget how it works in a month) and now it performs some 33% slower than C code for virtually any size of matrix the C code is as follows: typedef double type; const size_t n = 42; type *a = new type[n*n], *a1 = new type[n*n], *a2 = new type[n*n]; //filling 'a1' and 'a2' for (size_t i = 0; i<n; ++i) { type * const a_i = a + i*n; for (size_t j = 0; j<n; ++j) a_i[j] = 0.; for (size_t k = 0; k<n; ++k) { type *a = a_i; const type a1_ik = a1[i*n + k]; const type *a2_k = a2 + k*n; for (size_t j = 0; j<n; ++j, ++a, ++a2_k) *a += a1_ik*(*a2_k); } } the C++ code is just matrix<type> m(n, n), m1(n, n), m2(n, n); //filling 'm1' and 'm2' m.assign(m1, m2); more detailed results: 'n' is the dimensions of matrices, 'ratio' is runtime of C++ code divided by runtime of C code (time_cpp/time_c) n | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | 2048 ratio | 1,77 | 1,57 | 1,34 | 1,12 | 1,05 | 1,12 | 1,32 | 1,33 | 1,23 | 1,23 | 1,23 as you can see starting with 8 by 8 matices C++ code is roughly 33% slower than C code is that enough to make you (the reader) happy? -- Pavel