Re: [boost] [Multiprecision] Benchmarking

15 Jun 2012

      ...
...
For sure - expression templates are about eliminating temporaries - if
the temporaries are cheap (cpp_int containing small values and no
dynamic memory allocation) then there's no benefit and probably a hit
from bringing in all that machinary.  That's why expression templates
are disabled for the fixed size cpp_int typedefs.
I'd like to propose an alternate viewpoint: in the context of numerics,
expression templates are about delaying evaluation to gather more
context, so you can perform the computation more efficiently.  From that
point-of-view there's absolutely no reason that they should ever make
code slower.  Just generate the appropriate code for the types and
expressions involved.
Sigh... I hear what you're saying but it's not that simple.

Consider cpp_int: it uses something akin to the "small string optimization" 
for small numbers, and then only switches to dynamic allocation for larger 
values.

Now consider a simple expression:

a = b * c * d;

The optimal evaluation strategy depends upon the size of the values in b, c 
and d: for small values that don't require memory allocation, temporaries, 
and copying is cheap, so it might be best to do:

t = b * c;
a = t * d;

But for larger values, the cost of memory allocation and copying starts to 
dominate so:

a = b * c;
a *= d;

Might be best  (though it depends a bit on how multiplication is 
implemented).

For larger values still, the cost of the multiplication is so expensive, 
that all methods are the same.

So you can't choose between these at compile time.  You could clearly add 
some backend-specific runtime logic to choose between them, but that hits 
another issue:

In the benchmark in question, we're looking at arithmetic on small values 
(mostly integer sized) that just happen to be stored in a cpp_int.  It's an 
important use case, and cpp_int is many many times more efficient than gmp 
thanks to not having to allocate memory for small values, but it's slower 
than "naive" simple code.  The problem is that the cost of performing the 
arithmetic is as cheap as it could possibly be algorithmically speaking, so 
for cpp_int, the cost is completely dominated by other runtime logic used to 
select the best method.  Let me give you two concrete examples so you can 
tell me how to avoid the them :-)

1) When multiplying in the cpp_int backend, if one value is a single limb 
long then we can optimize the multiplication routine by dispatching to the 
special "multiply by an int" code.  When the other value is medium sized, 
it's a big win (2 or 3 times improvement), but for the trivial case that 
often occurs in computational geometry, the cost of the extra if statement 
and function call can easily add a 20% hit or more.  One thing I could 
experiment with is overloads for the multiplication/addition of small fixed 
sized integer types - that might help mp_int128_t, but not variable sized 
types like cpp_int.
2) In the expression template unpacking and evaluation, one crucial piece of 
information is "does the LHS also appear on the RHS".  Sometime if it does 
we can optimize, for example "a = a + b" gets transformed into "a += b", 
other times if the LHS does not appear in the RHS we can trash the LHS value 
and use it as working space, saving a temporary.  But... it's a runtime 
calculation to figure that out, it messes up inlining as well as taking a 
small amount of time.  For non-trivial values which require memory 
allocation etc, it makes not one jot of difference, but in the case in 
question, it completely dominates the runtime compared to what is in effect, 
just an int*int or whatever.  Just curious, but can constexp partially solve 
this at all?  We would need a way to get the address of the variables in the 
expression into the template arg list so that questions like this could be 
asked at compile time.  Even then I believe that getting the cost of 
expression template unpacking down to zero is likely to be a fools errand.

John.

Re: [boost] [Multiprecision] Benchmarking

John Maddock