Re: [boost] Large Integer Library

30 Jun 2012

      Michael,

While I think there is a place for a fixed size integer library in Boost 
that's "as simple as possible" (on the grounds that for smaller fixed sizes, 
simple is often faster than more complex schemes, even if the complex 
schemes have better big-O performance), this method of implementation does 
appear to be quite slow, though I confess I've only done a cursory 
comparison.

So for comparison, I measured how long it took to compute 100 dot products 
of 1000-element vectors.  The vectors were filled with values that "used" a 
specified number of bits (to simulate the typical use case were most values 
are quite small, and just a few actually require an extended precision 
type), here's the results compiled on Win32 with VC10 and /Ox:

Multiprecision mp_int128:
32 bit time = 0.0100556 seconds
64 bit time = 0.02594 seconds
96 bit time = 0.0229292 seconds
128 bit time = 0.0142151 seconds
Multiprecision mp_int256:
32 bit time = 0.00376926 seconds
64 bit time = 0.0102474 seconds
96 bit time = 0.01383 seconds
128 bit time = 0.0192363 seconds

large_int<unsigned long long, long long>:
32 bit time = 0.0498975 seconds
64 bit time = 0.111413 seconds
96 bit time = 0.183727 seconds
128 bit time = 0.251469 seconds
large_int<large_int<unsigned long long, unsigned long long>, 
large_int<unsigned long long, long long> >:
32 bit time = 1.08451 seconds
64 bit time = 2.43866 seconds
96 bit time = 3.79658 seconds
128 bit time = 5.20211 seconds

Note how the times taken for mp_intXXX grow largely based on bits actually 
used, not bits specified in the type.  But also, even the case where I 
expected large_int to do well - in fact where it really should win in this 
kind of comparison - for part filled 128-bit int's, it's still over 4 times 
slower than the multiprecision code :-(

My guess is this is a consequence of the implementation method (as a 
doubled-type rather than an array of limbs)?   If so that's actually quite 
useful information to know.

Regards, John

PS I see you're using bit by bit multiplication - easy to understand and 
code, but it's glacially slow in general, I tried this for integer division 
early on and it ran like treacle (on a very cold day)!

PPS code follows:

#include <boost/chrono.hpp>
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/large_int.hpp>

template <class Clock>
struct stopwatch
{
   typedef typename Clock::duration duration;
   stopwatch()
   {
      m_start = Clock::now();
   }
   duration elapsed()
   {
      return Clock::now() - m_start;
   }
   void reset()
   {
      m_start = Clock::now();
   }

private:
   typename Clock::time_point m_start;
};

template <class T>
void time()
{
   std::vector<T> v1(1000, 0x12345FF), v2(1000, 0x12345F9), v3(1000, 0);
   boost::chrono::duration<double> time;
   stopwatch<boost::chrono::high_resolution_clock> c;
   for(unsigned i = 0; i < 100; ++i)
   {
      for(unsigned i = 0; i < v1.size(); ++i)
      {
         v3[i] = v1[i] * v2[i];
      }
   }
   time = c.elapsed();
   std::cout << "32 bit time = " << time << std::endl;

   for(unsigned i = 0; i < v1.size(); ++i)
   {
      v1[i] <<= 32;
      v1[i] += 0x12345FF;
      v2[i] <<= 32;
      v2[i] += 0x12345F9;
   }
   c.reset();
   for(unsigned i = 0; i < 100; ++i)
   {
      for(unsigned i = 0; i < v1.size(); ++i)
      {
         v3[i] = v1[i] * v2[i];
      }
   }
   time = c.elapsed();
   std::cout << "64 bit time = " << time << std::endl;

   for(unsigned i = 0; i < v1.size(); ++i)
   {
      v1[i] <<= 32;
      v1[i] += 0x12345FF;
      v2[i] <<= 32;
      v2[i] += 0x12345F9;
   }
   c.reset();
   for(unsigned i = 0; i < 100; ++i)
   {
      for(unsigned i = 0; i < v1.size(); ++i)
      {
         v3[i] = v1[i] * v2[i];
      }
   }
   time = c.elapsed();
   std::cout << "96 bit time = " << time << std::endl;

   for(unsigned i = 0; i < v1.size(); ++i)
   {
      v1[i] <<= 32;
      v1[i] += 0x12345FF;
      v2[i] <<= 32;
      v2[i] += 0x12345F9;
   }
   c.reset();
   for(unsigned i = 0; i < 100; ++i)
   {
      for(unsigned i = 0; i < v1.size(); ++i)
      {
         v3[i] = v1[i] * v2[i];
      }
   }
   time = c.elapsed();
   std::cout << "128 bit time = " << time << std::endl;

}

int _tmain(int argc, _TCHAR* argv[])
{
   using namespace boost::multiprecision;

   time<mp_int128_t>();
   time<mp_int256_t>();

   typedef boost::large_int::large_int<unsigned long long, long long> t128;
   typedef boost::large_int::large_int<unsigned long long, unsigned long 
long> ut128;
   typedef boost::large_int::large_int<ut128, t128> t256;

   time<t128>();
   time<t256>();

   return 0;
}

Re: [boost] Large Integer Library

John Maddock