Re: [boost] [units] runtime performance again

I added numeric solution of a simple differential equation to the benchmark.
Output from 3 runs:

multiplying ublas::matrix<double>(1000, 1000) : 16.016 seconds
multiplying ublas::matrix<quantity>(1000, 1000) : 16.453 seconds
tiled_matrix_multiply<double>(1000, 1000) : 1.859 seconds
tiled_matrix_multiply<quantity>(1000, 1000) : 1.843 seconds
solving y' = 1 - x + 4 * y with double: 3.219 seconds
solving y' = 1 - x + 4 * y with quantity: 2.656 seconds

multiplying ublas::matrix<double>(1000, 1000) : 16.281 seconds
multiplying ublas::matrix<quantity>(1000, 1000) : 16.406 seconds
tiled_matrix_multiply<double>(1000, 1000) : 1.906 seconds
tiled_matrix_multiply<quantity>(1000, 1000) : 1.859 seconds
solving y' = 1 - x + 4 * y with double: 3.281 seconds
solving y' = 1 - x + 4 * y with quantity: 2.61 seconds

multiplying ublas::matrix<double>(1000, 1000) : 16.094 seconds
multiplying ublas::matrix<quantity>(1000, 1000) : 16.516 seconds
tiled_matrix_multiply<double>(1000, 1000) : 1.875 seconds
tiled_matrix_multiply<quantity>(1000, 1000) : 1.859 seconds
solving y' = 1 - x + 4 * y with double: 3.203 seconds
solving y' = 1 - x + 4 * y with quantity: 2.672 seconds
Interesting - this definitely shows some of the pitfalls of simple performance testing. Here are my results:

multiplying ublas::matrix<double>(1000, 1000) : 42.78 seconds
multiplying ublas::matrix<quantity>(1000, 1000) : 42.31 seconds
tiled_matrix_multiply<double>(1000, 1000) : 1.73 seconds
tiled_matrix_multiply<quantity>(1000, 1000) : 2.05 seconds
solving y' = 1 - x + 4 * y with double: 2.59 seconds
solving y' = 1 - x + 4 * y with quantity: 2.57 seconds

multiplying ublas::matrix<double>(1000, 1000) : 42.74 seconds
multiplying ublas::matrix<quantity>(1000, 1000) : 42.43 seconds
tiled_matrix_multiply<double>(1000, 1000) : 1.77 seconds
tiled_matrix_multiply<quantity>(1000, 1000) : 2.08 seconds
solving y' = 1 - x + 4 * y with double: 2.59 seconds
solving y' = 1 - x + 4 * y with quantity: 2.59 seconds

multiplying ublas::matrix<double>(1000, 1000) : 42.78 seconds
multiplying ublas::matrix<quantity>(1000, 1000) : 42.31 seconds
tiled_matrix_multiply<double>(1000, 1000) : 1.74 seconds
tiled_matrix_multiply<quantity>(1000, 1000) : 2.06 seconds
solving y' = 1 - x + 4 * y with double: 2.6 seconds
solving y' = 1 - x + 4 * y with quantity: 2.57 seconds

I'm not sure why my simple matrix multiplication results are so much slower...the others are comparable. In any case, the relative performance is obviously close enough to identical for non-HPC applications... I think the performance arguments look like red herrings... I'll put this in the sandbox example code.

Matthias

Interesting - this definitely shows some of the pitfalls of simple performance testing. Here are my results :
...
I'm not sure why my simple matrix multiplication results are so much slower...the others are comparable.
This could very well be a cache size issue. Which processors exactly are you comparing here? You might want to get out your favourite performance analyzer and have a look at the L2 cache misses. Or play around with the matrix sizes, observe the FLOP rates and try to correlate them with the amount of memory worked on.
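The experiment suggested above could be set up roughly as follows. This is only a sketch: it uses a plain triple-loop multiply on std::vector<double> rather than ublas, and the names naive_multiply, working_set_bytes, run_sample and mflops are made up for illustration, not taken from the benchmark in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Naive multiply of two n x n row-major matrices: c = a * b.
// Each inner step does one multiply and one add, so 2 * n^3 FLOPs in total.
void naive_multiply(const std::vector<double>& a,
                    const std::vector<double>& b,
                    std::vector<double>& c,
                    std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];  // b is read with stride n
            c[i * n + j] = sum;
        }
}

// Memory touched by the three n x n matrices of double, in bytes.
std::size_t working_set_bytes(std::size_t n)
{
    return 3 * n * n * sizeof(double);
}

// One timing sample (hypothetical helper type).
struct sample {
    std::size_t n;        // matrix dimension
    std::size_t bytes;    // working set size
    double seconds;       // wall-clock time of one multiply
    double checksum;      // c[0], kept so the multiply cannot be optimized away
};

// Time a single multiply of size n with constant-filled inputs.
sample run_sample(std::size_t n)
{
    assert(n > 0);
    std::vector<double> a(n * n, 1.0), b(n * n, 2.0), c(n * n, 0.0);
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    naive_multiply(a, b, c, n);
    std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    double secs = std::chrono::duration<double>(stop - start).count();
    sample s = { n, working_set_bytes(n), secs, c[0] };
    return s;
}

// Achieved rate in MFLOP/s; returns 0 if the timer was too coarse.
double mflops(const sample& s)
{
    if (s.seconds <= 0.0) return 0.0;
    return 2.0 * s.n * s.n * s.n / s.seconds / 1e6;
}
```

Calling run_sample for a range of sizes (say 64, 128, 256, 512, 1024) and printing mflops next to bytes should make it visible where the rate falls off as the working set outgrows each cache level. The exact sizes at which that happens depend on the processor.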
In any case, the relative performance is obviously close enough to identical
Well, the relative performance doesn't say much unless the reference is already close enough to the theoretical peak, considering the underlying algorithm, processor, cache sizes, memory bandwidth etc.

Yours,
Martin.

AMDG

Martin Schulz <Martin.Schulz <at> synopsys.com> writes:
I'm not sure why my simple matrix multiplication results are so much slower...the others are comparable.
This could very well be a cache size issue.
That's exactly what's happening. The basic matrix multiplication algorithm used by ublas is very cache-unfriendly for large matrices. The ublas version is there primarily to make sure that my implementation is correct.

In Christ,
Steven Watanabe
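For readers wondering what a cache-friendlier multiply looks like, here is a minimal sketch of loop tiling (blocking). It is not the tiled_matrix_multiply from the benchmark, whose source is not shown in this thread; the function name, the row-major std::vector storage and the default tile size of 32 are all assumptions. It also uses a single element type T, so it fits plain arithmetic types only. With quantities the result element type would be the product type, which this sketch does not model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Blocked multiply of two n x n row-major matrices: c = a * b.
// The three outer loops walk over tile x tile blocks so that the data
// touched by the three inner loops is small enough to stay in cache.
template <class T>
void tiled_multiply(const std::vector<T>& a,
                    const std::vector<T>& b,
                    std::vector<T>& c,
                    std::size_t n,
                    std::size_t tile = 32)  // assumed tile size, tune per machine
{
    assert(tile > 0);
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    std::fill(c.begin(), c.end(), T());

    for (std::size_t ii = 0; ii < n; ii += tile)
        for (std::size_t kk = 0; kk < n; kk += tile)
            for (std::size_t jj = 0; jj < n; jj += tile) {
                // Clamp block bounds so n need not be a multiple of tile.
                const std::size_t i_end = std::min(ii + tile, n);
                const std::size_t k_end = std::min(kk + tile, n);
                const std::size_t j_end = std::min(jj + tile, n);
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const T aik = a[i * n + k];
                        // Innermost loop runs along rows of b and c (unit stride).
                        for (std::size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

The arithmetic is the same 2 * n^3 operations as the naive version; only the order of memory accesses changes, which is why the tiled timings in this thread can be so much lower than the ublas ones for 1000 x 1000 matrices.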

As has been mentioned earlier in this thread, compile-time dimensional analysis is extremely useful. However, many common compilers have issues fully optimizing these wrappers away. The correctness verification also happens on every compile and takes a measurable amount of time.

Is there a configuration of this library one can use that could ensure drop-in compatibility with raw floating-point types? I'm thinking you could control dimensional analysis with a preprocessor switch (much like many people do for concept checking now). What would this take? Would disallowing all implicit and explicit conversions be sufficient?

Thanks,
Michael Marcin

AMDG

Michael Marcin <mmarcin <at> method-solutions.com> writes:
Is there a configuration of this library one can use that could ensure drop-in compatibility with raw floating-point types?
I'm thinking you could control dimensional analysis with a preprocessor switch (much like many people do for concept checking now).
What would this take? Would disallowing all implicit and explicit conversions be sufficient?
Not quite. You would also have to make the static units such as meters have type boost::units::detail::one or some equivalent type so that the construction syntax remains the same. I haven't actually tried it, so that may still not be enough.

In Christ,
Steven Watanabe
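To make the idea concrete, here is a toy sketch of such a switch. It does not use Boost.Units at all: the namespace sketch, the macro SKETCH_DISABLE_UNITS, the stand-in type one and every other name below are hypothetical, and like the suggestion above it is untested against the real library. It only shows how a tag type with a pass-through operator* keeps the "value * meters" construction syntax compiling when the quantity types collapse to double.

```cpp
namespace sketch {

#ifdef SKETCH_DISABLE_UNITS  // hypothetical switch, not a Boost.Units macro

// Unchecked mode: quantities are plain doubles and every static unit is an
// empty tag whose only job is to make "2.0 * meters" evaluate to 2.0.
struct one {};
inline double operator*(double v, one) { return v; }

typedef double length;
typedef double duration;
typedef double velocity;

const one meters = one();
const one seconds = one();

inline double value_of(double v) { return v; }

#else

// Checked mode: a toy quantity carrying length and time exponents in its type.
template <int L, int T> struct unit {};

template <int L, int T>
class quantity {
public:
    explicit quantity(double v) : value_(v) {}
    double value() const { return value_; }
private:
    double value_;
};

// Attaching a unit to a raw value is the only way to build a quantity.
template <int L, int T>
quantity<L, T> operator*(double v, unit<L, T>) { return quantity<L, T>(v); }

// Division subtracts exponents, so length / duration yields a velocity.
template <int L1, int T1, int L2, int T2>
quantity<L1 - L2, T1 - T2> operator/(quantity<L1, T1> a, quantity<L2, T2> b)
{
    return quantity<L1 - L2, T1 - T2>(a.value() / b.value());
}

typedef quantity<1, 0> length;
typedef quantity<0, 1> duration;
typedef quantity<1, -1> velocity;

const unit<1, 0> meters = unit<1, 0>();
const unit<0, 1> seconds = unit<0, 1>();

template <int L, int T>
double value_of(quantity<L, T> q) { return q.value(); }

#endif

} // namespace sketch

// User code written once, compiled unchanged in either mode.
inline sketch::velocity average_speed(sketch::length d, sketch::duration t)
{
    return d / t;
}
```

In checked mode, swapping the arguments of average_speed or returning d * t fails to compile; with SKETCH_DISABLE_UNITS defined the same source compiles down to plain double arithmetic, at the price of losing those checks. A real library would need far more than this (conversions, mixed unit systems, output), which is presumably why the answer above is hedged.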
participants (4): Martin Schulz, Matthias Schabel, Michael Marcin, Steven Watanabe