Re: [boost] [context/coroutine] split into two libs in trunk?!

14 Apr 2012

On 14.04.2012 00:37, Giovanni Piero Deretta wrote:
...
Not saving the SSE and x87 control words was a conscious decision on my
part. The control words are unlike other callee-/caller-saved registers,
as they define a process mode and are explicitly under the control of
the user. In my tests, the instructions used to load/save these states
had a considerable cost on my old NetBurst CPU.
The compiler may temporarily change the control state (for example, in
legacy x87 mode to implement some non-standard rounding), but it has
to restore the original value before calling any externally defined
function (like the ASM context-switching functions), as these expect
the control words to be in the default state (whatever that is).
The only time called functions will see the control words in a
non-default state is if the user explicitly changed the state, for
example via a C99 compiler pragma or builtin function. The
boost.coroutine documentation explicitly warned about risky changes to
process state across coroutine calls, including the signal mask (which
boost.context also does not preserve), locks, TLS and of course the FPU
state.
But what about code you don't have under your control (legacy libs, etc.)?
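For readers wondering what preserving these control words would actually involve, here is a minimal sketch for x86-64 with GCC or Clang. It captures the SSE control/status register (MXCSR) and the x87 control word when leaving a context and reloads them when coming back. The struct and function names are hypothetical, not part of boost.context or boost.coroutine, and restoring the whole MXCSR (status flags included) is a simplification: the ABI only treats its control bits as callee-saved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h> /* _mm_getcsr / _mm_setcsr: x86 SSE intrinsics */

/* Hypothetical save area for the floating-point control state. */
struct fp_control_state {
    uint32_t mxcsr;  /* SSE control/status register */
    uint16_t x87_cw; /* x87 FPU control word */
};

/* Capture the current control words (what a switch would do on the way out). */
void fp_control_save(struct fp_control_state *s) {
    assert(s != NULL);
    s->mxcsr = _mm_getcsr();                         /* stmxcsr */
    __asm__ volatile("fnstcw %0" : "=m"(s->x87_cw)); /* store x87 control word */
}

/* Reload previously captured control words (the way back in). */
void fp_control_restore(const struct fp_control_state *s) {
    assert(s != NULL);
    _mm_setcsr(s->mxcsr);                            /* ldmxcsr */
    __asm__ volatile("fldcw %0" : : "m"(s->x87_cw)); /* load x87 control word */
}
```

If a piece of legacy code switched the SSE rounding mode to "round up" and then yielded, a switch routine calling these two functions around the stack swap would hand the other context its own mode back; without them, the other context would silently run with the changed mode. These loads and stores are the kind of instructions whose cost is being weighed in this thread.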
Having said that, I doubt that on a modern CPU this extra state
save/restore would cost more than an extra 50% on a context call
(which in the grand scheme of things isn't really that much). Any
claimed scalability differences between boost.context and my old
library must come from somewhere else, not from the low-level
context-switching routines. The only thing that comes to mind is
that boost.coroutine saved all registers on the stack (which is
very likely to be cache-hot) instead of in a separate structure as
boost.context does (which, IIRC, was heap-allocated in the
higher-level wrapper).
If you refer to my performance tests: I never compared boost.context
with boost.coroutine; I measured the cycle costs of fcontext and
ucontext.
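The ucontext side of such a measurement can be approximated with a small ping-pong loop. This is a rough sketch assuming Linux/glibc, where `makecontext`/`swapcontext` are still provided even though POSIX.1-2008 removed them; the function name is hypothetical, and it reports wall-clock nanoseconds rather than cycles. On glibc, `swapcontext` also saves and restores the signal mask through a system call, which accounts for much of its cost compared with a hand-written switch.

```c
#define _GNU_SOURCE /* expose ucontext and clock_gettime on glibc */
#include <assert.h>
#include <stddef.h>
#include <time.h>
#include <ucontext.h>

static ucontext_t main_ctx, coro_ctx;
static char coro_stack[64 * 1024]; /* stack for the coroutine side */

/* Coroutine body: immediately switch back to main, forever. */
static void coro_body(void) {
    for (;;)
        swapcontext(&coro_ctx, &main_ctx);
}

/* Hypothetical helper: average wall-clock nanoseconds per round trip
   main -> coroutine -> main (two context switches); -1.0 on setup failure. */
double ucontext_roundtrip_ns(long iterations) {
    assert(iterations > 0);
    if (getcontext(&coro_ctx) != 0)
        return -1.0;
    coro_ctx.uc_stack.ss_sp = coro_stack;
    coro_ctx.uc_stack.ss_size = sizeof coro_stack;
    coro_ctx.uc_link = NULL; /* coro_body never returns */
    makecontext(&coro_ctx, coro_body, 0);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iterations; ++i)
        swapcontext(&main_ctx, &coro_ctx);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                      + (double)(t1.tv_nsec - t0.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Halving the result gives a rough per-switch figure; reporting cycles, as in the tests mentioned here, would need a cycle counter instead of `clock_gettime`.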
...
FWIW, while it is hard to compare my results on an old 32-bit machine
with yours on an undoubtedly newer CPU and OS, I distinctly remember
from my tests that a coroutine-to-coroutine switch (using the
high-level API) was about 100 times faster using the custom backend
than using ucontext (mainly because of the high cost of the function
call).
HTH,
That matches what I found (see the fcontext vs. ucontext figures above
and the performance test app in boost.context).
I assumed that your lib used ucontext as its back-end, and therefore I
was skeptical that it would be much faster than boost.context (as
stated in another post).
BTW, the file swapcontext64.cpp (from your lib) might contain a bug:

https://svn.boost.org/svn/boost/sandbox/SOC/2006/coroutine/trunk/libs/corout...

It preserves the registers rbx, rbp, rax, rdx.
I think it should be rbx, rbp, r12-r15 (plus the SSE2 and x87 control
words), as described in 'SysV ABI AMD64 Architecture Processor
Supplement - Draft Version 0.99.4'.
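The gap between the two register lists can be checked mechanically. A minimal, portable sketch: the table encodes the general-purpose registers the SysV AMD64 ABI marks as preserved across calls, with rsp left out on the assumption that a switch routine keeps it separately as the handle of the suspended context. The audit function is hypothetical and simply counts how many of those registers a given save list misses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Callee-saved general-purpose registers per the SysV AMD64 ABI, excluding
   rsp (assumed to be stored separately). rax and rdx are caller-saved
   (scratch/return-value registers), so they do not appear here. */
static const char *const callee_saved_gprs[] = {
    "rbx", "rbp", "r12", "r13", "r14", "r15",
};
#define N_CALLEE_SAVED (sizeof callee_saved_gprs / sizeof callee_saved_gprs[0])

/* True if `reg` occurs in saved[0..n). */
static bool contains(const char *const saved[], size_t n, const char *reg) {
    for (size_t i = 0; i < n; ++i)
        if (strcmp(saved[i], reg) == 0)
            return true;
    return false;
}

/* Hypothetical audit helper: number of callee-saved GPRs missing from a
   context-switch routine's save list. */
size_t count_unsaved_gprs(const char *const saved[], size_t n) {
    assert(saved != NULL || n == 0);
    size_t missing = 0;
    for (size_t i = 0; i < N_CALLEE_SAVED; ++i)
        if (!contains(saved, n, callee_saved_gprs[i]))
            ++missing;
    return missing;
}
```

Fed the list {rbx, rbp, rax, rdx}, the helper reports four missing registers (r12 to r15); the corrected list reports none. The x87 control word and the MXCSR control bits are not general-purpose registers, so they fall outside this table even though the ABI also lists them as callee-saved.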


Oliver Kowalke