
On Nov 13, 2007 3:19 AM, Jeff Garland <jeff@crystalclearsoftware.com> wrote:
Robert Kawulak wrote:
From: Jeff Garland
I'm not sure I understand - why you think there's a function call and 2 comparisons in that example?
The function's code simply returns the literal value of 5 and that's it.
Anyway, I've tried with more representative example - the value is not known at compile time:
extern int i;
void test() { bounded_int<int, 3, 5>::type b(i); }
With #define NDEBUG and -O2 switch, gcc 4.0.1 gave the following code:
__Z4testv: [...] irrelevant stack-manipulation code
That 'irrelevant stack manipulation code' isn't irrelevant at all...it's the function call overhead cost and would likely *not exist* in a fully static version. Same with the stuff right before the return.
Actually the stack manipulation is for the test function, not for the bound checking function, so, as Robert said, it is completely irrelevant in this context. In fact the __Z4testv: is just that, the mangled name of the test function.
<snip>...good details...
In both cases the compilers were able to optimise-away the abstraction of bounds generators, using the literal values of bounds in the comparisons. I'm not sure this is what you were worrying about, but if so, then I think there's no problem at all. ;-)
Nope...it's the stack code which can't be optimized out because you have to have a function pointer and call a different location instead of fully inlining. If you take out the dynamic function part I bet you'll cut the generated code down to just the compares and the error handling...
I was surprised for the lack of call to bound functions, because in my experience gcc is completely incapable to inline function pointers (except when used as non type template parameters), so I went and looked at the code.
From a cursory look I can se no use of function pointers, nor I would expect it: the bounds are knonw at compile time so there is no need of dynamic dispatch. The framework seems to allow dynamic (runtime) bounds, but even in that case, i see no need for dynamic dispatch.
BTW, when compiling with gcc 4.1.2 under x86_64: .globl my_test_function() .type my_test_function(), @function my_test_function(): .LFB1638: [ stack frame setup for test function ] movq %rbx, -32(%rsp) .LCFI6: movq %rbp, -24(%rsp) .LCFI7: movq %r12, -16(%rsp) .LCFI8: movq %r13, -8(%rsp) .LCFI9: subq $72, %rsp .LCFI10: [ load i and bound check it ] movl i(%rip), %eax cmpl $2, %eax jle .L44 cmpl $5, %eax jle .L71 .L44: [ error detection ... snipped ] .L71: [ stack frame tear down ] movq 40(%rsp), %rbx movq 48(%rsp), %rbp movq 56(%rsp), %r12 movq 64(%rsp), %r13 addq $72, %rsp ret This means that the bound checking code is just 4 instructions. I think you can hardly do better than that. This is even better than the disassembly reported by Robert because it skips bound loads in %eax and directly encodes immediates as an operand of the cmp instruction. This saves two instruction, but I'm not sure if in practice it makes a difference. BTW, to be 100% honest, the stack frames setup up and tear down are not completely unrelated to the usage of the bounded integer. Even if the fast path only partically uses one register (rax, which is callee clobbered and doesn't need saving), four more registers are saved even if they are only used in the slow error checking path. If the compiler could figure that the error checking path is not entered often, it could move the save and restore in that path. Anyways, in a real life function you would need to use many registers anyway, so this doesn't really matter. And yes, I'm interested in a bounded integers library. HTH, -- gpd