uBLAS Observation: terrible execution speed for prod(matrix,prod(matrix,prod(matrix,matrix)))
Hi,

I am new to uBLAS and am evaluating whether it will be useful for my application (I develop 6-degrees-of-freedom flight simulations used to analyze avionics systems). I have benchmarked a few lines of code against RogueWave's Math.h++ for execution speed and made the following observations.

Observation 1) For matrix products of two small (~3-by-3) matrices, i.e., prod(smallMat1, smallMat2), uBLAS executes MUCH FASTER than Math.h++.

Observation 2) For matrix products of two large matrices (100 by 100), uBLAS and Math.h++ give similar execution speed.

Observation 3) (This is the kicker...) For strung-together matrix products like "Mat1 * Mat2 * Mat3 * Mat4 * Mat5", which I am implementing as "prod(Mat1, prod(Mat2, prod(Mat3, prod(Mat4, Mat5))))", Math.h++ is MUCH MUCH FASTER than uBLAS, by ~2 orders OF MAGNITUDE! I believe that this is a characteristic of the scalability of template expressions.

My question is this: Is there a way I can instruct uBLAS not to use template expressions for strung-together matrix products?

Regards,
Walt
I'm not familiar with uBLAS, but I thought this was one of the _strengths_ of expression templates (if that's what you mean): they can be used to, e.g., unroll loops (especially useful for smaller matrices) and to fuse loops together, avoiding temporary matrices in expressions such as yours above, where several matrices are multiplied together (especially useful for larger matrices).

At least, that's the approach taken in libraries such as Blitz++, where they have achieved performance comparable to Fortran for both small and large matrices. It performs especially well on expressions with several large matrices, since loop unrolling alone is not enough there: you also need to eliminate temporary matrices and fuse the loops into one set of loops.

Regards,
Terje
The problem is that expression templates avoid temporary objects by evaluating expressions only as they are needed. When too many matrix products are nested, the inner products have to be computed many times.
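To make the recomputation concrete, here is a toy sketch that counts scalar multiplications. It is not uBLAS code: `Dense`, `LazyProd`, `evaluate`, and the two `mults_*` functions are hypothetical names invented for this illustration, and the chain uses three nested products of square matrices rather than the four in the original question. It only assumes the behavior described above, namely that a lazy product recomputes an element every time it is asked for it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy types for illustration only; none of this is uBLAS code.

// Square dense matrix stored row-major.
struct Dense {
    std::size_t n;
    std::vector<double> data;
    explicit Dense(std::size_t size, double fill = 0.0)
        : n(size), data(size * size, fill) {}
    std::size_t size() const { return n; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * n + j]; }
    double& operator()(std::size_t i, std::size_t j) { return data[i * n + j]; }
};

// Lazy product: holds references to its operands and computes one
// element per call, without storing any result.
template <class L, class R>
struct LazyProd {
    const L& lhs;
    const R& rhs;
    std::size_t& mults;  // shared counter of scalar multiplications
    std::size_t size() const { return lhs.size(); }
    double operator()(std::size_t i, std::size_t j) const {
        assert(lhs.size() == rhs.size());
        double sum = 0.0;
        for (std::size_t k = 0; k < lhs.size(); ++k) {
            sum += lhs(i, k) * rhs(k, j);  // rhs(k, j) may itself be lazy
            ++mults;
        }
        return sum;
    }
};

// Materialize any expression into a Dense matrix.
template <class E>
Dense evaluate(const E& e) {
    Dense out(e.size());
    for (std::size_t i = 0; i < e.size(); ++i)
        for (std::size_t j = 0; j < e.size(); ++j)
            out(i, j) = e(i, j);
    return out;
}

// a * (b * (c * d)) evaluated lazily in one pass: inner elements are
// recomputed on every access.
std::size_t mults_nested_lazy(std::size_t n) {
    std::size_t count = 0;
    Dense a(n, 1.0), b(n, 1.0), c(n, 1.0), d(n, 1.0);
    LazyProd<Dense, Dense> p3{c, d, count};
    LazyProd<Dense, decltype(p3)> p2{b, p3, count};
    LazyProd<Dense, decltype(p2)> p1{a, p2, count};
    evaluate(p1);
    return count;
}

// Same chain, but each intermediate product is stored before reuse.
std::size_t mults_with_temporaries(std::size_t n) {
    std::size_t count = 0;
    Dense a(n, 1.0), b(n, 1.0), c(n, 1.0), d(n, 1.0);
    Dense t3 = evaluate(LazyProd<Dense, Dense>{c, d, count});
    Dense t2 = evaluate(LazyProd<Dense, Dense>{b, t3, count});
    evaluate(LazyProd<Dense, Dense>{a, t2, count});
    return count;
}
```

In this toy model the nested lazy form costs n^3 + n^4 + n^5 multiplications for n-by-n matrices, against 3 * n^3 with temporaries, so the gap widens with both matrix size and nesting depth. Real libraries differ in detail, so treat the counts as an illustration of the trend rather than a measurement of uBLAS.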
--- In Boost-Users@y..., Walter Bowen
> Observation 1) For matrix products of two small (~3-by-3) matrices, i.e., prod(smallMat1, smallMat2), uBLAS executes MUCH FASTER than Math.h++.

Fine.

> Observation 2) For matrix products of two large matrices (100 by 100), uBLAS and Math.h++ give similar execution speed.

As Toon already pointed out, you could try 1000 by 1000 matrices to check whether Math.h++ uses blocked operations internally (which uBLAS doesn't).

> Observation 3) (This is the kicker...) For strung-together matrix products like "Mat1 * Mat2 * Mat3 * Mat4 * Mat5", which I am implementing as "prod(Mat1, prod(Mat2, prod(Mat3, prod(Mat4, Mat5))))", Math.h++ is MUCH MUCH FASTER than uBLAS, by ~2 orders OF MAGNITUDE! I believe that this is a characteristic of the scalability of template expressions.

Correct. This observation was first stated by Benedikt Weber during the uBLAS review: http://lists.boost.org/MailArchives/boost/msg31283.php

> My question is this: Is there a way I can instruct uBLAS not to use template expressions for strung-together matrix products?
You could reintroduce temporaries directly
prod(Mat1,
matrix
participants (5)
- jhrwalter
- Matthias Kronenberger
- Terje Slettebø
- tknapen
- Walter Bowen