
While I think there is a place for a fixed-size integer library in Boost that's "as simple as possible" (on the grounds that for smaller fixed sizes, simple is often faster than more complex schemes, even if the complex schemes have better big-O performance), this method of implementation does appear to be quite slow, though I confess I've only done a cursory comparison.
I agree. We are in the process of reviewing the proposed Boost.Multiprecision right now. The review has been overwhelmingly non-controversial. One of the main points that did arise, however, was the performance and seamless interoperability of near-metal signed and unsigned integers of, say, 128, 256 or 512 bits. There is widespread community interest in integer types of these sizes. That being said, performance is crucial for users in this digit range. See below.
So for comparison, I measured how long it took to compute 100 dot products of 1000-element vectors. The vectors were filled with values that "used" a specified number of bits (to simulate the typical use case where most values are quite small, and just a few actually require an extended-precision type). Here are the results, compiled on Win32 with VC10 and /Ox:
Multiprecision mp_int128:
   32 bit time = 0.0100556 seconds
   64 bit time = 0.02594 seconds
   96 bit time = 0.0229292 seconds
   128 bit time = 0.0142151 seconds

Multiprecision mp_int256:
   32 bit time = 0.00376926 seconds
   64 bit time = 0.0102474 seconds
   96 bit time = 0.01383 seconds
   128 bit time = 0.0192363 seconds
large_int<unsigned long long, long long>:
   32 bit time = 0.0498975 seconds
   64 bit time = 0.111413 seconds
   96 bit time = 0.183727 seconds
   128 bit time = 0.251469 seconds

large_int<large_int<unsigned long long, unsigned long long>, large_int<unsigned long long, long long> >:
   32 bit time = 1.08451 seconds
   64 bit time = 2.43866 seconds
   96 bit time = 3.79658 seconds
   128 bit time = 5.20211 seconds
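For reference, a benchmark along these lines can be sketched as follows. This is a reconstruction rather than the actual test harness: the xorshift generator, the way values of a given bit-width are built, and the timing loop are all my assumptions, and it is written in present-day C++ for brevity. Any sufficiently wide integer type with the usual arithmetic and shift operators can be plugged in for Int:

#include <chrono>
#include <cstdint>
#include <iostream>
#include <vector>

// xorshift64: a cheap, deterministic pseudo-random 64-bit generator.
inline std::uint64_t next_rand(std::uint64_t& s)
{
   s ^= s << 13; s ^= s >> 7; s ^= s << 17;
   return s;
}

// Build a value with exactly 'bits' significant bits (top bit set),
// assembled 64 bits at a time so it also works for wide integer types.
// For a signed N-bit type, pass bits < N so the sign bit stays clear.
template <class Int>
Int random_bits(std::uint64_t& s, unsigned bits)
{
   Int x = 1; // this 1 ends up as the most significant bit
   for (unsigned b = 1; b < bits; )
   {
      const unsigned      chunk = (bits - b < 64) ? (bits - b) : 64;
      const std::uint64_t mask  = (chunk < 64) ? ((1ull << chunk) - 1) : ~0ull;
      x = (x << chunk) | Int(next_rand(s) & mask);
      b += chunk;
   }
   return x;
}

// 100 dot products of 1000-element vectors, as in the test above.
template <class Int>
double time_dot(unsigned bits)
{
   std::uint64_t s = 0x9E3779B97F4A7C15ull;
   std::vector<Int> a(1000), b(1000);
   for (auto& v : a) v = random_bits<Int>(s, bits);
   for (auto& v : b) v = random_bits<Int>(s, bits);

   const auto t0 = std::chrono::steady_clock::now();
   Int sink = 0;
   for (int rep = 0; rep < 100; ++rep)
      for (std::size_t i = 0; i < a.size(); ++i)
         sink += a[i] * b[i];
   const auto t1 = std::chrono::steady_clock::now();

   std::cout << "checksum: " << sink << '\n'; // keep the optimizer honest
   return std::chrono::duration<double>(t1 - t0).count();
}

Calling time_dot with a 128-bit type at bits = 32, 64, 96 and 128 would reproduce the shape of the runs quoted above; the 128-bit case wraps modulo 2^128 in a 128-bit accumulator, which is harmless for a throughput measurement.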
Looking at your code, it is exceptionally terse and clean. The implementation is clear, intuitive and easy to use. In my opinion, the code is excellent!

As John has indicated, though, your code really slows down in the bit-by-bit multiplication and division algorithms. As a rule of thumb, big-number libraries spend an estimated 80% of their time in the inner loops of the multiplication algorithm, and that's for highly optimized ones. The problem with big-number libraries is that you can please maybe half the people, but merely half of the time. (Is that pleasing only a quarter of us?)

You have a clean and terse implementation that could potentially fill a niche between the potential Boost.Multiprecision and bare-metal built-in types. That's two times *potential* in one sentence. In my opinion, however, to really fill this niche you need high performance for 128, 256, 512 and 1024 bits.

What we really need to do in the community is work out how we will address the needs for big integers, rationals and floats. How will we transition from bare-metal built-in types through medium sizes like 128, 256 and 512 bits, and on to thousands of bits? To what extent do we want to emphasize clarity of implementation? When should we compromise on brevity of code and get down and dirty with limbs, add-and-carry, even (oh no!) assembly hooks, etc. to squeeze out performance?

I believe that the proposed Boost.Multiprecision has made a good compromise with big integers, rationals and floats by providing a working and relatively efficient solution. John did a lot more work than I did. But the work on big numbers in Boost and C++ is far from done! Perhaps we need to work on your code to get the performance that users will demand for near-metal types. Perhaps this could potentially plug into the potentially proposed Boost.Multiprecision.

Thanks for your work!

Best regards, Chris.
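P.S. To make the "limbs and add-and-carry" point concrete, here is a minimal sketch of the schoolbook multiplication inner loop in which limb-based big-number libraries spend most of their time. This is my own illustration, not code from either library; it uses 32-bit limbs so that a 64-bit intermediate can hold a limb product plus two carries without overflow:

#include <cstdint>
#include <vector>

// Schoolbook multiplication on base-2^32 limbs, least significant limb first.
// The result r gets a.size() + b.size() limbs.
void mul_limbs(const std::vector<std::uint32_t>& a,
               const std::vector<std::uint32_t>& b,
               std::vector<std::uint32_t>&       r)
{
   r.assign(a.size() + b.size(), 0u);
   for (std::size_t i = 0; i < a.size(); ++i)
   {
      std::uint64_t carry = 0;
      for (std::size_t j = 0; j < b.size(); ++j)
      {
         // 32x32 -> 64-bit product, plus the limb already in place,
         // plus the running carry: the classic add-and-carry step.
         const std::uint64_t t =
            std::uint64_t(a[i]) * b[j] + r[i + j] + carry;
         r[i + j] = static_cast<std::uint32_t>(t); // low 32 bits stay put
         carry    = t >> 32;                       // high 32 bits carry on
      }
      r[i + b.size()] = static_cast<std::uint32_t>(carry);
   }
}

The worst case in the inner loop, (2^32 - 1)^2 + 2*(2^32 - 1) = 2^64 - 1, fits exactly in the 64-bit intermediate, which is what makes the 32-bit limb size convenient. Multiplying two 128-bit numbers this way costs 4 x 4 = 16 of these limb steps, whereas a bit-by-bit shift-and-add loop performs 128 iterations with a full-width shift in each one; that difference goes a long way toward explaining the timings above.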