From: Walter Bowen
Observation 1) For matrix products of two small (~3-by-3) matrices ( i.e., prod(smallMat1, smallMat2) ) uBLAS executes MUCH FASTER than Math.h++.
Observation 2) For matrix products of two large matrices (100 by 100), uBLAS and Math.h++ give similar execution speed.
Observation 3) (This is the kicker...) For strung-together matrix products like "Mat1 * Mat2 * Mat3 * Mat4 * Mat5", which I am implementing as "prod(Mat1, prod(Mat2, prod(Mat3, prod(Mat4, Mat5))))", Math.h++ is MUCH MUCH FASTER than uBLAS -- by ~2 orders OF MAGNITUDE! I believe that this is a characteristic of the scalability of template expressions.
My question is this: Is there a way I can instruct uBLAS not to use template expressions for strung-together matrix products?
The problem is that expression templates avoid temporary objects by
evaluating expressions as they are needed.
By nesting too many matrix products, the inner products have to be computed
many times.
----- Original Message -----
From: "Terje Slettebø"
I'm not familiar with uBLAS, but I thought this was one of the _strengths_ of expression templates (if that's what you mean): they can be used to unroll loops (especially useful for smaller matrices) and to fuse loops together, avoiding temporary matrices in an expression such as yours above, where several matrices are multiplied together (especially useful for larger matrices).
At least, that's the approach taken in libraries such as Blitz++, where they have achieved performance comparable to Fortran, both for small and large matrices. It performs especially well on expressions with several large matrices, as loop unrolling is not enough for that. You also need to eliminate temporary matrices and fuse the loops into one set of loops.
Regards,
Terje
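To make the recomputation point in this thread concrete, here is a minimal sketch in plain C++. It is not uBLAS code and does not reflect how uBLAS is implemented internally; Mat, Prod, lazy_prod, eval and the g_mults counter are hypothetical names made up for the illustration. It only assumes that a lazy product node computes each of its elements on demand, and it counts scalar multiplications for a right-nested chain A*(B*(C*D)) evaluated lazily versus with an explicit temporary for each inner product.

```cpp
#include <cstddef>
#include <vector>

// Counter of scalar multiplications, used only to compare the two strategies.
static long g_mults = 0;

// Dense square matrix, row-major (hypothetical helper, not a uBLAS type).
struct Mat {
    std::size_t n;
    std::vector<double> a;
    explicit Mat(std::size_t n_, double v = 0.0) : n(n_), a(n_ * n_, v) {}
    double operator()(std::size_t i, std::size_t j) const { return a[i * n + j]; }
    double& operator()(std::size_t i, std::size_t j) { return a[i * n + j]; }
    std::size_t size() const { return n; }
};

// Lazy product node: element (i, j) is recomputed every time it is requested.
// It stores references, so it must be evaluated within the same full expression.
template <class L, class R>
struct Prod {
    const L& l;
    const R& r;
    double operator()(std::size_t i, std::size_t j) const {
        double s = 0.0;
        for (std::size_t k = 0; k < l.size(); ++k) {
            s += l(i, k) * r(k, j);  // r(k, j) may itself be a lazy product
            ++g_mults;
        }
        return s;
    }
    std::size_t size() const { return l.size(); }
};

template <class L, class R>
Prod<L, R> lazy_prod(const L& l, const R& r) { return Prod<L, R>{l, r}; }

// Evaluate an expression into a concrete matrix (this is the "temporary").
template <class E>
Mat eval(const E& e) {
    Mat out(e.size());
    for (std::size_t i = 0; i < out.size(); ++i)
        for (std::size_t j = 0; j < out.size(); ++j)
            out(i, j) = e(i, j);
    return out;
}

// A*(B*(C*D)) with no temporaries: inner products are recomputed on demand.
long count_nested(std::size_t n) {
    Mat A(n, 1.0), B(n, 1.0), C(n, 1.0), D(n, 1.0);
    g_mults = 0;
    eval(lazy_prod(A, lazy_prod(B, lazy_prod(C, D))));
    return g_mults;
}

// Same chain, but each inner product is stored in a temporary first.
long count_with_temporaries(std::size_t n) {
    Mat A(n, 1.0), B(n, 1.0), C(n, 1.0), D(n, 1.0);
    g_mults = 0;
    Mat cd = eval(lazy_prod(C, D));
    Mat bcd = eval(lazy_prod(B, cd));
    eval(lazy_prod(A, bcd));
    return g_mults;
}
```

In this toy model the fully lazy chain of four n-by-n matrices costs n^5 + n^4 + n^3 multiplications, while the version with temporaries costs 3*n^3. For n = 3 that is 351 versus 81, and the gap widens quickly with n and with each additional nested product. Whether the same ratio applies to a real library depends on how it evaluates nested products.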