
Maurizio Vitale wrote:
"Phil Endecott" <spam_from_boost_dev@chezphil.org> writes:
- My concerns about how to do, for example, a 32x32->64-bit multiply efficiently without actually doing a 64x64->64-bit multiply.
I don't see too many ways out. One thing people do is to shift right by 16 both operands, do a 16x16->32 and then shift the result left by 32. (e.g. compute (x/2^16 * y/2^16)*2^32). This maintains at least the magnitude of the result, but whether it is acceptable or not depends on the application domain.
No, I'm not interested in anything with loss of precision.
Btw, X86_64 gives you 64x64->128, I think.
That's interesting. Is there any way to use that from C++? Of course, it also has a relatively efficient 32x32->64 because it has a single instruction for 64x64->64. So any code for this problem would have to be architecture-specific. But here is the sort of thing that I am thinking about for machines that only have 32x32->32: In these figures, each letter represents 16 bits. If I want to do 32x32->64, I need to break up the two operands into two 16-bit portions and do this calculation: A B x C D --------- B*D A*D C*B A*C --------- So that's 4 multiplies. This compares to the naive method where, in C++, I first cast to 64-bits and the compiler generates a 64x64->64 multiply like this: A B C D x E F G H ----------------- D*H C*H G*D B*H C*G F*D A*H B*G F*C E*D ------------------ That uses 10 multiplies. You don't need to do e.g. A*E as it falls off the most significant end of the result. (Have I got that right?) If you are lucky there might be some run-time zero detection that simplifies it a bit (e.g. in gcc this is done in a library function). But even then detecting it at compile time by explicitly asking fora 32x32->64 multiplication would still be better. (Is this sort of integer arithmetic of interest to people, other than as the implementation of fixed point? I believe there was a big-int SoC proposal; does that overlap with this?)
I've seen in your code that you allocate one more bit than requested by the user in order to detect overflow. As you note this is problematic when you hit the machine operand length. In those cases, besides checking the msbs before and after you can also use the carry flag. It can be accessed through inline asm instruction, or used implicitely via conditional branches and conditional moves.
Yes. Storing a 32-bit safe_int in 32 bits is significantly harder than storing a 31-bit one, however you do it. I believe that most of the time when we ask the compiler for an 'int', we don't really care exactly how many bits it has and making a 31-bit value your default safe_int so that it can use the relatively easy guard bit implementation would be a good enough solution. Regards, Phil.