Re: [boost] Some follow-up on my fixed point code, and safe integers

27 Apr 2007

      Maurizio Vitale wrote:
...
"Phil Endecott" <spam_from_boost_dev@chezphil.org> writes:
...
- My concerns about how to do, for example, a 32x32->64-bit multiply 
efficiently without actually doing a 64x64->64-bit multiply.
I don't see too many ways out. One thing people do is to shift right by 16 both operands, do a 
16x16->32 and then shift the result left by 32. (e.g. compute (x/2^16 * y/2^16)*2^32). This 
maintains at least the magnitude of the result, but whether it is acceptable or not depends on 
the application domain.
No, I'm not interested in anything with loss of precision.
...
Btw, X86_64 gives you 64x64->128, I think.
That's interesting.  Is there any way to use that from C++?  Of course, 
it also has a relatively efficient 32x32->64 because it has a single 
instruction for 64x64->64.  So any code for this problem would have to 
be architecture-specific.  But here is the sort of thing that I am 
thinking about for machines that only have 32x32->32:

In these figures, each letter represents 16 bits.

If I want to do 32x32->64, I need to break up the two operands into two 
16-bit portions and do this calculation:

     A B
  x  C D
---------
     B*D
   A*D
   C*B
A*C
---------

So that's 4 multiplies.  This compares to the naive method where, in 
C++, I first cast to 64-bits and the compiler generates a 64x64->64 
multiply like this:

           A B C D
       x   E F G H
-----------------
               D*H
             C*H
             G*D
           B*H
           C*G
           F*D
         A*H
         B*G
         F*C
         E*D
------------------

That uses 10 multiplies.  You don't need to do e.g. A*E as it falls off 
the most significant end of the result.  (Have I got that right?)  If 
you are lucky there might be some run-time zero detection that 
simplifies it a bit (e.g. in gcc this is done in a library function).  
But even then detecting it at compile time by explicitly asking fora 
32x32->64 multiplication would still be better.

(Is this sort of integer arithmetic of interest to people, other than 
as the implementation of fixed point?  I believe there was a big-int 
SoC proposal; does that overlap with this?)
...
I've seen in your code that you allocate one more bit than requested by the user in order to detect 
overflow. As you note this is problematic when you hit the machine operand length. In those cases, 
besides checking the msbs before and after you can also use the carry flag. It can be accessed through 
inline asm instruction, or used implicitely via conditional branches and conditional moves.
Yes.  Storing a 32-bit safe_int in 32 bits is significantly harder than 
storing a 31-bit one, however you do it.  I believe that most of the 
time when we ask the compiler for an 'int', we don't really care 
exactly how many bits it has and making a 31-bit value your default 
safe_int so that it can use the relatively easy guard bit 
implementation would be a good enough solution.

Regards,

Phil.