
Paul Mensonides wrote:
-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Tobias Schwinger
Apologies for the OT, but why "verbose"? __stdcall isn't verbose at all.
Sorry, underhanded (yet well-deserved) shot at Pascal.
^ ;-) ^
Btw. "__pascal" and "__stdcall" are not exactly the same: Both require the callee to clean up but the arguments are ordered differently on the stack.
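To make the ordering difference concrete, here is a small sketch that only models the push order. It does not use the real keywords, which are compiler-specific, and it assumes the usual x86 definitions: __pascal pushes arguments left to right, __stdcall right to left. The names push_order and push_sequence are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical model of argument push order; not a real compiler API. */
enum push_order {
    PUSH_LEFT_TO_RIGHT, /* assumed behaviour of __pascal */
    PUSH_RIGHT_TO_LEFT  /* assumed behaviour of __stdcall */
};

/* Writes into out[0..n-1] the argument indices in the order they are pushed. */
static void push_sequence(enum push_order order, size_t n, size_t *out)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = (order == PUSH_LEFT_TO_RIGHT) ? i : n - 1 - i;
    }
}
```

For a call h(a, b, c), the left-to-right model yields the order 0, 1, 2 (a is pushed first), while the right-to-left model yields 2, 1, 0 (a is pushed last and so sits nearest the return address). Cleanup is the callee's job in both cases.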
Hmm, didn't know there was a difference. In any case, the callee cleanup is the part that makes it inferior to __cdecl. It isn't as general, both for variadics
Variadics would require some sort of "dynamic cleanup" if done at the callee-site (I doubt there is any compiler out there implementing ugly and crazy stuff like this)...
and possible tail-recursion optimizations--particularly in mutually tail-recursive functions:
int g(int x);
int f(int x) { return g(x); }
int g(int x) { return f(x); }
(Yes, I know there is no termination.) The point is that the compiler could push 'x' onto the stack from external code, but 'f' and 'g' could call each other repeatedly without touching the stack at all, and then when the call actually does return to the external code, the external code can pop 'x' from the stack.
This kind of "stack frame recycling" is attractive, indeed. You may still be able to request it more explicitly (without relying on optimization at all) by using an equivalent, iterative algorithm ;-).
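As a rough illustration of that iterative alternative, here is a sketch using a terminating pair of mutually tail-recursive functions instead of the non-terminating 'f' and 'g'. The even/odd example and all function names are assumptions chosen for illustration, not code from this thread.

```c
#include <stdbool.h>

/* Mutually tail-recursive pair: every call is in tail position, so stack
   usage stays constant only if the compiler performs the optimization. */
static bool is_odd_rec(unsigned n);

static bool is_even_rec(unsigned n)
{
    return n == 0 ? true : is_odd_rec(n - 1);
}

static bool is_odd_rec(unsigned n)
{
    return n == 0 ? false : is_even_rec(n - 1);
}

/* Iterative equivalent: the loop reuses one frame by construction,
   independent of calling convention or optimizer. */
static bool is_even_iter(unsigned n)
{
    bool even = true;
    while (n != 0) {
        even = !even; /* each step corresponds to one hop between the two functions */
        --n;
    }
    return even;
}
```

Both versions compute the same result. The difference is that the recursive pair depends on tail-call optimization to avoid growing the stack, whereas the loop does not.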
This can work even if 'f' and 'g' are separately compiled (i.e. not optimized as a unit).
Say we split it into two translation units. The compiler sees this code
int g(int x);
int f(int x) { return g(x); }
and cannot know what g does, so the full code generation for the call must happen in the linker. How does the linker then know the callee doesn't change 'x' and it's legal to reuse the stack frame here? Does the other object file contain this kind of information?
This doesn't work under 'callee cleanup' without extra scaffolding because the callee cannot know if 'x' should be popped.
Well, the callee always cleans up. So we'd have to "unpop" the values at the call-site to reuse the stack frame, which adds unnecessary code. This code, however, can theoretically be eliminated in the CPU at runtime:
add esp,4
sub esp,2
can be transformed, given there are no instructions in between that use the stack pointer register (these side-effects are tracked anyway for pipelining), to
add esp,2
Further, there are numerous situations where stack reuse isn't applicable and caller cleanup involves more code at the call-site, so I believe __stdcall has its place.
The only calling convention I find quite useless, talking x86, is __fastcall, which attempts to use CPU registers for the argument values. Basically it might have been a good idea, but this CPU has too few registers, and not even all of them are general-purpose, for __fastcall to make much sense (except for very tiny functions, perhaps).
Thanks,
Tobias