
Michael, While I think there is a place for a fixed size integer library in Boost that's "as simple as possible" (on the grounds that for smaller fixed sizes, simple is often faster than more complex schemes, even if the complex schemes have better big-O performance), this method of implementation does appear to be quite slow, though I confess I've only done a cursory comparison. So for comparison, I measured how long it took to compute 100 dot products of 1000-element vectors. The vectors were filled with values that "used" a specified number of bits (to simulate the typical use case were most values are quite small, and just a few actually require an extended precision type), here's the results compiled on Win32 with VC10 and /Ox: Multiprecision mp_int128: 32 bit time = 0.0100556 seconds 64 bit time = 0.02594 seconds 96 bit time = 0.0229292 seconds 128 bit time = 0.0142151 seconds Multiprecision mp_int256: 32 bit time = 0.00376926 seconds 64 bit time = 0.0102474 seconds 96 bit time = 0.01383 seconds 128 bit time = 0.0192363 seconds large_int<unsigned long long, long long>: 32 bit time = 0.0498975 seconds 64 bit time = 0.111413 seconds 96 bit time = 0.183727 seconds 128 bit time = 0.251469 seconds large_int<large_int<unsigned long long, unsigned long long>, large_int<unsigned long long, long long> >: 32 bit time = 1.08451 seconds 64 bit time = 2.43866 seconds 96 bit time = 3.79658 seconds 128 bit time = 5.20211 seconds Note how the times taken for mp_intXXX grow largely based on bits actually used, not bits specified in the type. But also, even the case where I expected large_int to do well - in fact where it really should win in this kind of comparison - for part filled 128-bit int's, it's still over 4 times slower than the multiprecision code :-( My guess is this is a consequence of the implementation method (as a doubled-type rather than an array of limbs)? If so that's actually quite useful information to know. Regards, John PS I see you're using bit by bit multiplication - easy to understand and code, but it's glacially slow in general, I tried this for integer division early on and it ran like treacle (on a very cold day)! PPS code follows: #include <boost/chrono.hpp> #include <boost/multiprecision/cpp_int.hpp> #include <boost/large_int.hpp> template <class Clock> struct stopwatch { typedef typename Clock::duration duration; stopwatch() { m_start = Clock::now(); } duration elapsed() { return Clock::now() - m_start; } void reset() { m_start = Clock::now(); } private: typename Clock::time_point m_start; }; template <class T> void time() { std::vector<T> v1(1000, 0x12345FF), v2(1000, 0x12345F9), v3(1000, 0); boost::chrono::duration<double> time; stopwatch<boost::chrono::high_resolution_clock> c; for(unsigned i = 0; i < 100; ++i) { for(unsigned i = 0; i < v1.size(); ++i) { v3[i] = v1[i] * v2[i]; } } time = c.elapsed(); std::cout << "32 bit time = " << time << std::endl; for(unsigned i = 0; i < v1.size(); ++i) { v1[i] <<= 32; v1[i] += 0x12345FF; v2[i] <<= 32; v2[i] += 0x12345F9; } c.reset(); for(unsigned i = 0; i < 100; ++i) { for(unsigned i = 0; i < v1.size(); ++i) { v3[i] = v1[i] * v2[i]; } } time = c.elapsed(); std::cout << "64 bit time = " << time << std::endl; for(unsigned i = 0; i < v1.size(); ++i) { v1[i] <<= 32; v1[i] += 0x12345FF; v2[i] <<= 32; v2[i] += 0x12345F9; } c.reset(); for(unsigned i = 0; i < 100; ++i) { for(unsigned i = 0; i < v1.size(); ++i) { v3[i] = v1[i] * v2[i]; } } time = c.elapsed(); std::cout << "96 bit time = " << time << std::endl; for(unsigned i = 0; i < v1.size(); ++i) { v1[i] <<= 32; v1[i] += 0x12345FF; v2[i] <<= 32; v2[i] += 0x12345F9; } c.reset(); for(unsigned i = 0; i < 100; ++i) { for(unsigned i = 0; i < v1.size(); ++i) { v3[i] = v1[i] * v2[i]; } } time = c.elapsed(); std::cout << "128 bit time = " << time << std::endl; } int _tmain(int argc, _TCHAR* argv[]) { using namespace boost::multiprecision; time<mp_int128_t>(); time<mp_int256_t>(); typedef boost::large_int::large_int<unsigned long long, long long> t128; typedef boost::large_int::large_int<unsigned long long, unsigned long long> ut128; typedef boost::large_int::large_int<ut128, t128> t256; time<t128>(); time<t256>(); return 0; }