Suggest: detail functions for full built-in arithmetic

Maybe we should add a header to our detail (and/or pending) folder encapsulating the full versions of built-in integer arithmetic operations, that most(?) processors have, but neither C nor C++ let us access. I'm talking about functions like: std::pair<unsigned, bool> full_add( unsigned a1, unsigned a2 ); The processor has a "carry" flag whenever an integer addition overflows, but we can't access it (without assembly-language hacks). A conforming C++ workaround is to calculate the carry flag in advance before adding. But we could provide an implementation of this function that uses the assembly-language hacks to compute the sum and carry at once, encapsulating it for any other library (present or future) that needs full addition arithmetic. Other examples would be: std::pair<unsigned, bool> full_subtract( unsigned m, unsigned s ); // for quick access to the "borrow" state boost::array<unsigned, 2> full_multiply(unsigned f1, unsigned f2); // no need to do piecemeal multiply with 16-bit halves // result[1]: high part, result[0]: low part std::pair<boost::array<unsigned, 2>, unsigned> full_divide( unsigned (÷nd)[2], unsigned divisor ); // no need to do piecemeal division one-bit at a time // result.first[1]: quotient, high part // result.first[0]: quotient, low part // result.second: remainder I list only "unsigned int" here, but we could do versions for all the built-in integer types. The assembly-language implementations would require #segregating by processor/compiler type. -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com

Le samedi 08 septembre 2007 à 23:10 -0400, Daryle Walker a écrit :
Maybe we should add a header to our detail (and/or pending) folder encapsulating the full versions of built-in integer arithmetic operations, that most(?) processors have, but neither C nor C++ let us access. I'm talking about functions like:
std::pair<unsigned, bool> full_add( unsigned a1, unsigned a2 );
The processor has a "carry" flag whenever an integer addition overflows, but we can't access it (without assembly-language hacks).
Is it really worth it? Nowadays, compilers are smart enough. They generate better code when details are not obfuscated by hand-crafted assembly code. I tested the following code with GCC: #include <algorithm> std::pair<unsigned, bool> full_add(unsigned a, unsigned b) { return std::make_pair(a + b, a + b < a); } bool no_overflow(unsigned a, unsigned b) { return !full_add(a, b).second; } The no_overflow function is a bit dumb, but it is here to check that the compiler was able to perform optimizations. This is the generated code on x86-64: _Z11no_overflowjj: addl %edi, %esi setae %al ret This is the optimal assembly code. And there is no way to obtain this code if you use an assembly hack in the full_add function. Best regards, Guillaume

Guillaume Melquiond wrote:
Le samedi 08 septembre 2007 ? 23:10 -0400, Daryle Walker a ?crit :
Maybe we should add a header to our detail (and/or pending) folder encapsulating the full versions of built-in integer arithmetic operations, that most(?) processors have, but neither C nor C++ let us access. I'm talking about functions like:
std::pair<unsigned, bool> full_add( unsigned a1, unsigned a2 );
The processor has a "carry" flag whenever an integer addition overflows, but we can't access it (without assembly-language hacks).
Is it really worth it? Nowadays, compilers are smart enough. They generate better code when details are not obfuscated by hand-crafted assembly code. I tested the following code with GCC:
#include <algorithm>
std::pair<unsigned, bool> full_add(unsigned a, unsigned b) { return std::make_pair(a + b, a + b < a); }
bool no_overflow(unsigned a, unsigned b) { return !full_add(a, b).second; }
Here's what I get on ARM: _Z8full_addjj: adds r2, r1, r2 movcc r1, #0 movcs r1, #1 str r2, [r0, #0] strb r1, [r0, #4] mov pc, lr _Z11no_overflowjj: cmn r0, r1 movcs r0, #0 movcc r0, #1 mov pc, lr which looks pretty good, and probably optimal. But I have encountered cases where compilers don't (can't?) do a very good job; for example, I don't think I've seen a way to get optimal code for 128-bit addition. Even 64-bit arithmetic is not so great on 32-bit processors, with library functions used rather than inline adds/adc sequences. But maybe this is better fixed by improving the compiler. Daryle, what application did you have in mind? Phil.

Phil Endecott wrote:
Guillaume Melquiond wrote:
Le samedi 08 septembre 2007 ? 23:10 -0400, Daryle Walker a ?crit :
Maybe we should add a header to our detail (and/or pending) folder encapsulating the full versions of built-in integer arithmetic operations, that most(?) processors have, but neither C nor C++ let us access. I'm talking about functions like:
std::pair<unsigned, bool> full_add( unsigned a1, unsigned a2 );
The processor has a "carry" flag whenever an integer addition overflows, but we can't access it (without assembly-language hacks). Is it really worth it? Nowadays, compilers are smart enough. They generate better code when details are not obfuscated by hand-crafted assembly code. I tested the following code with GCC:
#include <algorithm>
std::pair<unsigned, bool> full_add(unsigned a, unsigned b) { return std::make_pair(a + b, a + b < a); }
bool no_overflow(unsigned a, unsigned b) { return !full_add(a, b).second; }
Here's what I get on ARM:
_Z8full_addjj: adds r2, r1, r2 movcc r1, #0 movcs r1, #1 str r2, [r0, #0] strb r1, [r0, #4] mov pc, lr
_Z11no_overflowjj: cmn r0, r1 movcs r0, #0 movcc r0, #1 mov pc, lr
RVCT 2.2 with --arm and -O2 Gives: _Z8full_addjj PROC PUSH {r2,r3,lr} ADD r2,r1,r2 CMP r2,r1 MOVCS r1,#0 MOVCC r1,#1 STR r1,[sp,#0] STR r2,[sp,#4] STR r2,[r0,#0] LDRB r1,[sp,#0] STRB r1,[r0,#4] POP {r3,r12,pc} ENDP _Z11no_overflowjj PROC PUSH {r0-r3,lr} MOV r2,r1 MOV r1,r0 ADD r0,sp,#8 BL _Z8full_addjj LDR r0,[sp,#8] LDR r1,[sp,#0xc] STM sp,{r0,r1} LDRB r0,[sp,#4] RSBS r0,r0,#1 MOVCC r0,#0 POP {r1-r3,r12,pc} ENDP Seems just a tad worse than what you got :) What GCC did you use? Thanks, Michael Marcin

Michael Marcin wrote:
Phil Endecott wrote:
Guillaume Melquiond wrote:
#include <algorithm>
std::pair<unsigned, bool> full_add(unsigned a, unsigned b) { return std::make_pair(a + b, a + b < a); }
bool no_overflow(unsigned a, unsigned b) { return !full_add(a, b).second; }
Here's what I get on ARM:
_Z8full_addjj: adds r2, r1, r2 movcc r1, #0 movcs r1, #1 str r2, [r0, #0] strb r1, [r0, #4] mov pc, lr
_Z11no_overflowjj: cmn r0, r1 movcs r0, #0 movcc r0, #1 mov pc, lr
RVCT 2.2 with --arm and -O2
Gives:
_Z8full_addjj PROC PUSH {r2,r3,lr} ADD r2,r1,r2 CMP r2,r1 MOVCS r1,#0 MOVCC r1,#1 STR r1,[sp,#0] STR r2,[sp,#4] STR r2,[r0,#0] LDRB r1,[sp,#0] STRB r1,[r0,#4] POP {r3,r12,pc} ENDP
_Z11no_overflowjj PROC PUSH {r0-r3,lr} MOV r2,r1 MOV r1,r0 ADD r0,sp,#8 BL _Z8full_addjj LDR r0,[sp,#8] LDR r1,[sp,#0xc] STM sp,{r0,r1} LDRB r0,[sp,#4] RSBS r0,r0,#1 MOVCC r0,#0 POP {r1-r3,r12,pc} ENDP
Seems just a tad worse than what you got :) What GCC did you use?
$ arm-linux-gnu-g++ --version arm-linux-gnu-g++ (GCC) 4.1.2 20061028 (prerelease) (Debian 4.1.1-19) It looks like your compiler is not inlining full_add inside no_overflow, and is doing a lot of spurious argument save/restore. There's also some strange store-as-word/load-as-byte going on for the bool. But the core is this
ADD r2,r1,r2 CMP r2,r1 MOVCS r1,#0 MOVCC r1,#1
which compares with
adds r2, r1, r2 movcc r1, #0 movcs r1, #1
So either it doesn't know that the add and compare can be merged into an adds (a peephole optimisation I would think) or it is confused by the inversion of the carry bit. Regards, Phil.

From: Phil Endecott Michael Marcin wrote:
Phil Endecott wrote: RVCT 2.2 with --arm and -O2 It looks like your compiler is not inlining full_add inside no_overflow, and is doing a lot of spurious argument save/restore.
This is what I've recently learnt from Mathias:
From: Mathias Gaunard Robert Kawulak wrote:
I compiled it with gcc 4.01, optimisations turned on. I had to add explicit 'inline' to several functions Aggressive inlining should be enabled with -O3.
Having glanced at the manual I suppose this should also work with RVCT. ;-) Best regards, Robert
participants (5)
-
Daryle Walker
-
Guillaume Melquiond
-
Michael Marcin
-
Phil Endecott
-
Robert Kawulak