
From your benchmark I see that you didn't provide results for boost::multiprecision::fixed_int, but rather for boost::multiprecision::backends::cpp_int_backend. I tried to update the boost multiprecision branch, but the boost/multiprecision/backends directory appeared to be empty.
Many thanks for the use case - in point of fact it's proved pretty hard to get a general-purpose solution anywhere near your special-purpose code - simply because your code is so simple!
I wouldn't say so. The only simple thing about my extended_int implementation is that it implements just three basic operations: addition, subtraction and multiplication. But I don't see any problem in extending it to support division and the other arithmetic operations. My point is that for a fixed integer class it's not a good idea to represent negative numbers in two's complement. The main point of my benchmark was to compare fixed int implementations, and those should not show much performance difference in the construction of temporary variables.
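To illustrate the representation argument, here is a minimal sketch (all type and member names invented, not taken from either implementation) of the two layouts. The practical difference shows up in operations like negation: with sign-magnitude it is a one-bit flip, while with two's complement every limb must be inverted and the carry propagated.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical sign-magnitude layout: a sign flag plus unsigned limbs.
// Negation just toggles the flag - O(1), no limb traversal.
struct sign_magnitude_int {
    bool negative = false;
    std::array<std::uint32_t, 4> limbs{};  // fixed capacity, stack allocated

    void negate() { negative = !negative; }
};

// Hypothetical two's-complement layout: negation must invert every limb
// and propagate a carry - O(number of limbs).
struct twos_complement_int {
    std::array<std::uint32_t, 4> limbs{};

    void negate() {
        std::uint64_t carry = 1;
        for (auto& limb : limbs) {
            // Mask keeps the inverted value in unsigned 32-bit range
            // before widening (avoids sign extension of ~limb).
            std::uint64_t v = static_cast<std::uint64_t>(~limb & 0xFFFFFFFFu) + carry;
            limb = static_cast<std::uint32_t>(v);
            carry = v >> 32;
        }
    }
};
```

Under sign-magnitude, the magnitude limbs are unchanged by negation, which is one reason mixed-sign addition/subtraction can be routed through purely unsigned limb arithmetic.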
Ah, you probably misunderstood what I meant. Let me give you a concrete example: your multiplication code is fast, probably as fast as it can be in your particular use case; the limb counts never grow too large, so there are actually very few integer ops executed for a multiply. Now it so happens that multiplying by an integer (or a big number with a single limb) can be optimised with its own simpler routine that's *much* quicker (as in an order of magnitude quicker) for larger limb counts. But.... hooking that code into cpp_int made your benchmark run slightly slower... the cause is the extra check in there to see if the optimisation can be taken! Basically, in most cases, your routines are called with small numbers that execute so few integer ops that seemingly trivial changes can have quite detrimental effects on runtime. There are other similar examples, but basically my point is that to make the integer type general purpose there are actually lots of these special-case checks, each of which is there for a very good reason, but which collectively easily account for that 10% slowdown in my code compared to yours. In short there's no free lunch: what's an optimisation for X is often a dis-optimisation for Y.
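The kind of dispatch being described might look like the sketch below (function names and limb layout invented for illustration, not cpp_int's actual code). The `b.size() == 1` test is nearly free when the fast path saves thousands of integer ops, but it is pure overhead on the hot path when every call already involves only a limb or two:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using limbs = std::vector<std::uint32_t>;  // little-endian limb vector

// Fast path: multiply by a single limb - one linear pass with carry.
limbs multiply_by_limb(const limbs& a, std::uint32_t b) {
    limbs r(a.size() + 1, 0);
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::uint64_t v = static_cast<std::uint64_t>(a[i]) * b + carry;
        r[i] = static_cast<std::uint32_t>(v);
        carry = v >> 32;
    }
    r[a.size()] = static_cast<std::uint32_t>(carry);
    if (r.size() > 1 && r.back() == 0) r.pop_back();  // trim leading zero limb
    return r;
}

// General path: schoolbook multiply, quadratic in limb count.
limbs multiply_general(const limbs& a, const limbs& b) {
    limbs r(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::uint64_t carry = 0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            std::uint64_t v = static_cast<std::uint64_t>(a[i]) * b[j]
                            + r[i + j] + carry;
            r[i + j] = static_cast<std::uint32_t>(v);
            carry = v >> 32;
        }
        r[i + b.size()] += static_cast<std::uint32_t>(carry);
    }
    while (r.size() > 1 && r.back() == 0) r.pop_back();
    return r;
}

limbs multiply(const limbs& a, const limbs& b) {
    if (b.size() == 1)                   // the "extra check" in question:
        return multiply_by_limb(a, b[0]);  // huge win for large a...
    return multiply_general(a, b);         // ...pure overhead for tiny a
}
```

For one- or two-limb operands the general path already executes only a handful of integer ops, so the branch itself becomes a measurable fraction of the work.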
I think what we are seeing in Andrii's use case may generalize across quite a few different big number use cases.
Exactly. The main advantage of a fixed big integer class is that it doesn't require you to think about the creation of temporaries (except for overflowing the stack, but that's another topic). I would say that for math computations it is the best option in 90% of cases.
It would be good for Boost to have both big int implementations: one with a stack-allocated backend and one with a heap-allocated backend. I was able to contact Arseniy Kapoulkine, whose big int implementation is under /sandbox/SOC/2007/bigint. Arseniy agreed to work with the Boost group to solve any problems that prevent standardization. I will try to come up with benchmarks of the current implementation based on the Voronoi code. It would be very nice to combine that implementation with Boost.Multiprecision.
Ah, I should have been clearer about what I've done here: the old 2's complement fixed_int class has gone. There is now "one true" C++ big-int implementation that handles both the arbitrary-precision and fixed-precision cases using signed-magnitude representation. Again, another compromise: sign-magnitude is actually very slightly slower than the old 2's complement code for some operations, but has enough other advantages to make it worthwhile (so you sold me on that).

It also has a template parameter that lets you tune the size of the internal cache (the part that's used before any memory allocation takes place). Again this is a compromise: testing with my Miller-Rabin code (another very important use case), increasing the internal cache size could actually make things worse if you happened to choose an "unfortunate" value compared to the size of the ints being tested. However, increasing it some more, so that no memory allocation ever happens, then drops runtime down dramatically.

Basically what I'm saying here is that there is no one true configuration that offers the very best performance in all cases - if there were, we'd probably all be using GMP ;-)

John.

PS: the next step is to start stealing some more special-case code from the SOC project.... I fully expect that to make your benchmarks run very slightly more slowly, but it is necessary for other use cases....
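The internal-cache idea is essentially a small-buffer optimisation. A minimal sketch (class and member names invented; cpp_int's real storage is more involved) with an allocation counter to make the behaviour visible:

```cpp
#include <cassert>
#include <cstdint>

// Demo instrumentation only: counts trips to the heap.
static int heap_allocations = 0;

// Hypothetical limb storage with a tunable internal cache: limbs live in
// a fixed internal array until the value outgrows InternalLimbs, and only
// then does the storage fall back to the heap.
template <std::size_t InternalLimbs>
class cached_limb_storage {
    std::uint32_t internal_[InternalLimbs];  // the "internal cache"
    std::uint32_t* data_ = internal_;
    std::size_t size_ = 0;
    std::size_t capacity_ = InternalLimbs;

public:
    void resize(std::size_t n) {
        if (n > capacity_) {
            // Value outgrew the cache: allocate and copy existing limbs.
            std::uint32_t* p = new std::uint32_t[n]();
            for (std::size_t i = 0; i < size_; ++i) p[i] = data_[i];
            if (data_ != internal_) delete[] data_;
            data_ = p;
            capacity_ = n;
            ++heap_allocations;
        }
        size_ = n;
    }
    std::size_t size() const { return size_; }
    bool on_heap() const { return data_ != internal_; }
    ~cached_limb_storage() { if (data_ != internal_) delete[] data_; }
};
```

This also shows the "unfortunate value" effect: if InternalLimbs lands just below the sizes a workload actually produces, you pay for the unused internal array on every object *and* still allocate, whereas a cache large enough for the whole working set removes allocation entirely.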