
For sure - expression templates are about eliminating temporaries - if the temporaries are cheap (cpp_int containing small values and no dynamic memory allocation) then there's no benefit and probably a hit from bringing in all that machinary. That's why expression templates are disabled for the fixed size cpp_int typedefs.
I'd like to propose an alternate viewpoint: in the context of numerics, expression templates are about delaying evaluation to gather more context, so you can perform the computation more efficiently. From that point-of-view there's absolutely no reason that they should ever make code slower. Just generate the appropriate code for the types and expressions involved.
Sigh... I hear what you're saying but it's not that simple. Consider cpp_int: it uses something akin to the "small string optimization" for small numbers, and then only switches to dynamic allocation for larger values. Now consider a simple expression: a = b * c * d; The optimal evaluation strategy depends upon the size of the values in b, c and d: for small values that don't require memory allocation, temporaries, and copying is cheap, so it might be best to do: t = b * c; a = t * d; But for larger values, the cost of memory allocation and copying starts to dominate so: a = b * c; a *= d; Might be best (though it depends a bit on how multiplication is implemented). For larger values still, the cost of the multiplication is so expensive, that all methods are the same. So you can't choose between these at compile time. You could clearly add some backend-specific runtime logic to choose between them, but that hits another issue: In the benchmark in question, we're looking at arithmetic on small values (mostly integer sized) that just happen to be stored in a cpp_int. It's an important use case, and cpp_int is many many times more efficient than gmp thanks to not having to allocate memory for small values, but it's slower than "naive" simple code. The problem is that the cost of performing the arithmetic is as cheap as it could possibly be algorithmically speaking, so for cpp_int, the cost is completely dominated by other runtime logic used to select the best method. Let me give you two concrete examples so you can tell me how to avoid the them :-) 1) When multiplying in the cpp_int backend, if one value is a single limb long then we can optimize the multiplication routine by dispatching to the special "multiply by an int" code. When the other value is medium sized, it's a big win (2 or 3 times improvement), but for the trivial case that often occurs in computational geometry, the cost of the extra if statement and function call can easily add a 20% hit or more. One thing I could experiment with is overloads for the multiplication/addition of small fixed sized integer types - that might help mp_int128_t, but not variable sized types like cpp_int. 2) In the expression template unpacking and evaluation, one crucial piece of information is "does the LHS also appear on the RHS". Sometime if it does we can optimize, for example "a = a + b" gets transformed into "a += b", other times if the LHS does not appear in the RHS we can trash the LHS value and use it as working space, saving a temporary. But... it's a runtime calculation to figure that out, it messes up inlining as well as taking a small amount of time. For non-trivial values which require memory allocation etc, it makes not one jot of difference, but in the case in question, it completely dominates the runtime compared to what is in effect, just an int*int or whatever. Just curious, but can constexp partially solve this at all? We would need a way to get the address of the variables in the expression into the template arg list so that questions like this could be asked at compile time. Even then I believe that getting the cost of expression template unpacking down to zero is likely to be a fools errand. John.