
Dear All,

I've never needed to use arbitrary-precision integers, but I have sometimes wanted large fixed-size integers like uint128_t, so I've had a quick look at that aspect of the proposed Xint. The docs' front page has a link under "Detailed Usage Information" to "Fixed-Length Integers" - which is not really very detailed, and has examples in the notes like "integer_t<8>" that don't work. I established quickly enough that the correct syntax is integer_t<fixedlength<8>>.

I also noted that the conventions here don't match the built-in (u)intN_t types, i.e. int64_t vs. int63_t, and the "wrap-around" behaviour. Perhaps there is some rationale for this, but I would be happier to see it consistent with the built-in types. (It might also be appropriate to think about how this could work with the existing Boost.Integer.)

I have done a quick test of the performance of a 96-bit type as follows:

typedef boost::xint::integer_t<
    boost::xint::options::fixedlength<96> > uint96_t;
uint96_t a = 0;
uint96_t b = 0;
for (int i = 1; i < loops; ++i) {
    a = a + a + (a & b) + i;
    b = b + i;
}

This is an entirely contrived example, but the point is to minimise the work that I have to do to create something to compare it with; by using only operator+ and operator&, I can quickly hack together this:

struct uint96_t {
    uint32_t top;
    uint32_t mid;
    uint32_t bottom;
    uint96_t() {}
    uint96_t(uint32_t val): top(0), mid(0), bottom(val) {}
};

uint96_t operator&(uint96_t a, uint96_t b) {
    uint96_t r;
    r.top    = a.top    & b.top;
    r.mid    = a.mid    & b.mid;
    r.bottom = a.bottom & b.bottom;
    return r;
}

uint96_t operator+(uint96_t a, uint96_t b) {
    uint96_t r;
    __asm__("adds %0, %3, %6\n"  // Add and set carry flag
            "adcs %1, %4, %7\n"  // Add with carry-in and set carry flag
            "adc  %2, %5, %8\n"  // Add with carry-in
            : "=r" (r.bottom),   // %0
              "=r" (r.mid),      // %1
              "=r" (r.top)       // %2
            : "r" (a.bottom),    // %3
              "r" (a.mid),       // %4
              "r" (a.top),       // %5
              "r" (b.bottom),    // %6
              "r" (b.mid),       // %7
              "r" (b.top)        // %8
            : "cc");
    return r;
}

It seems to me that that is several hundred times faster than the Xint code (on my 1.2 GHz ARM test system). Yes, it requires assembler in order to do multi-word addition with carry, but we can write a less efficient portable version of that:

uint96_t operator+(uint96_t a, uint96_t b) {
    uint96_t r;
    r.bottom = a.bottom + b.bottom;
    int carry0 = r.bottom < a.bottom ? 1 : 0;
    r.mid = a.mid + b.mid + carry0;
    // When carry0 == 1 and b.mid == 0xFFFFFFFF, r.mid wraps back to
    // exactly a.mid and a carry has still occurred, so test that case too:
    int carry1 = (r.mid < a.mid || (carry0 && r.mid == a.mid)) ? 1 : 0;
    r.top = a.top + b.top + carry1;
    return r;
}

This is about half the speed of the assembler, but still hundreds of times faster than the Xint code. Based on that, I think the front page's claim that the library is "fast" needs substantiating. I encourage anyone with experience of arbitrary-precision variable-length numbers to run some quick benchmarks. (Or maybe I'm doing something horribly wrong here - I haven't even checked that I get the right answer.)
Other reviews have reported that, in this fixed-length case, the library still stores some overhead in addition to the actual data. That's unfortunate, as I would like to be able to, for example, store values in a memory-mapped file, or to use them in e.g. Beman's proposed b-tree library.

I don't want to write a review that says "this library doesn't do what I want, therefore it should be rejected"; no doubt others will find applications where it will be useful. But I think the performance when used for fixed-size integers should be investigated.

Regards, Phil.