Re: [boost] [context review] Windows ABI requirements

22 Mar 2011

      ...
...
Looking at the fcontext_x86_64_ms_pe_masm.asm it would seem that 
several ABI requirements are violated and it is not clear to me why 
this would always work.
Then please tell me which of the calling conventions I've violated.
I believe the code conforms to the x86_64 ABI described in MSDN.
It would be helpful if you could tell me what exactly does not conform to
the calling convention.
Yes, that's sadly a bit of a pain. Documentation is pretty much nonexistent.
There are a few Word documents in the compiler source tree, but the usual
course of action was to "ask Kevin".

There are a few bits on his blog somewhere
http://blogs.msdn.com/b/freik/
...
...
For instance, the stack is not properly aligned in the call to _exit &
set_fcontext.
The stack is allocated with VirtualAlloc() with a size that is a multiple of
the page size (SYSTEM_INFO.dwPageSize).
In make_fcontext() I reserve space for the return address + a 32-byte
parameter area so that the beginning of the stack is on a 16-byte boundary.
The stack is required to be aligned on 16 byte boundaries at each nonleaf
function.
So when link_fcontext is entered, the last four bits of RSP should be clear.
When _exit is entered, the same needs to be true. But it isn't.
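
To make the arithmetic concrete, here is a minimal sketch. It is not taken from the library; the helper names are made up for illustration, and RSP is modelled as a plain integer. It only shows that a CALL made without adjusting RSP flips the 16-byte alignment, so caller and callee cannot both see an aligned RSP on entry.

```cpp
#include <cassert>
#include <cstdint>

// True if the (simulated) stack pointer value sits on a 16-byte boundary.
constexpr bool is_aligned16(std::uint64_t rsp) {
    return (rsp & 0xF) == 0;
}

// Models a CALL instruction: it pushes an 8-byte return address,
// so the callee is entered with the caller's RSP minus 8.
constexpr std::uint64_t rsp_after_call(std::uint64_t rsp) {
    return rsp - 8;
}

// Hypothetical helper: are both the function's entry RSP and the entry RSP
// of a callee reached by a bare CALL (no RSP adjustment) 16-byte aligned?
constexpr bool both_aligned(std::uint64_t rsp_on_entry) {
    return is_aligned16(rsp_on_entry) &&
           is_aligned16(rsp_after_call(rsp_on_entry));
}
```

For any entry value that is a multiple of 8, both_aligned() returns false, which is the problem with calling _exit straight from link_fcontext.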

I guess the real test would be to allocate a terminator function, which
requires an aligned stack like:

#include <emmintrin.h>

void my_terminator(){
   // x would be allocated at a 16 byte boundary in the stack frame
   __m128i x; volatile __m128i y;
   // MOVDQA (the aligned store) triggers an exception if not aligned properly
   _mm_store_si128(&x,y);
}
#pragma section(".CRT$XPC",long,read)
__declspec(allocate(".CRT$XPC")) void (*const x)() = my_terminator;
...
...
At least _exit can call into user code, IIRC (preterminators &
terminators registered via .crt$x* sections) and that code may require
a properly aligned stack.
Why is it not properly aligned?
The stack must be aligned on a 16-byte boundary:
http://msdn.microsoft.com/en-us/library/ew5tede7(v=vs.80).aspx

link_fcontext simply calls without adjusting RSP. Hence RSP on entry to
_exit is link_fcontext's RSP-8. RSP and RSP-8 cannot both be 16-byte aligned.
...
...
I don't see proper unwind descriptors. What happens if an async
exception is triggered in the middle of set_fcontext? The x86 Windows
version seems to have similar problems. IIRC, the dispatch logic
checks for EH registration nodes to be on the stack for x86.
The fcontext functions only store/restore registers, so there is no need to
install an exception handler.
The Windows x64 ABI requires unwind descriptors for non-leaf functions.
Absent unwind descriptors, the unwind might be terminated, taking down the
process, or a leaf function is assumed (which means adding 8 to the virtual
RSP and continuing the unwind).
There are also strict rules for which instruction forms can occur in the
function prolog and epilog. The OS unwinder disassembles instructions to
undo their effects. There are some pretty insidious rules that are not
really documented anywhere (like an indirect tail call requires a REX
prefix).
...
make_fcontext() registers an EH structure on the stack.
There are also strict rules on where exception registration nodes reside for
x86. Earlier Windows versions weren't too strict about it, but that has
changed over time.
...
If you look inside the test you see that I test throwing and catching
exceptions.
My concerns are less about C++ exceptions (which I presume you might trap),
but about SEH exceptions. Sometimes they even happen without user code being
involved directly (for instance the winuser resource hacks -- it could be
that these actually enter the dispatch phase in user mode, not sure)
...
On x86_64 Windows an exception handler is not installed in each stack frame.
Unwind handlers are installed by the compiler,
similar to gcc on UNIX.
That's correct, but the compiler emits unwind information. For assembler
code, it's your responsibility (just as it is for gcc on unix). There are no
unwind descriptors, which is why the OS assumes your functions to be leaf
functions. The ntdll dispatcher simply adjusts RSP in the virtual unwind
context by 8 and continues with the caller (by reading the 64-bit word at
the previous *RSP).
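
As a rough illustration of that leaf-function rule, here is a toy model. It is not the ntdll code; the struct and function names are invented, and the stack is modelled as a vector of 64-bit slots with RSP as an index into it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for the unwinder's virtual register context.
struct VirtualContext {
    std::uint64_t rip;  // instruction pointer of the frame being unwound
    std::size_t   rsp;  // stack pointer, modelled as an index into 64-bit slots
};

// Leaf-function rule: assume nothing but the return address sits at *RSP,
// read it as the caller's RIP and pop it (RSP += 8 bytes, i.e. one slot).
inline VirtualContext unwind_as_leaf(const VirtualContext& ctx,
                                     const std::vector<std::uint64_t>& stack) {
    VirtualContext caller;
    caller.rip = stack.at(ctx.rsp);
    caller.rsp = ctx.rsp + 1;
    return caller;
}
```

If the function has in fact pushed a register or otherwise moved RSP, the slot at *RSP is not the return address, so the "caller" RIP produced by this rule is garbage and the unwind continues from there.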
...
...
What happens if an untrapped exception is triggered in a code path
called from the context entrypoint?
It depends: if the function invoked by boost::context doesn't catch
exceptions, std::terminate() is called; otherwise the next exception handler
is triggered.
All kinds of SEH exceptions can be dispatched in user mode. Much like async
signals, but Windows supports pretty much the same scheme as C++ EH uses to
trap these anywhere on the callstack. See __try et al. This works just like
C++ EH, just with a different personality routine (__C_specific_handler).
...
...
It would seem that there is an unwind chain via make_fcontext (which
doesn't appear to have any unwind description and would therefore
be assumed to be a leaf function).
It installs a simple EH on the stack used by the boost::context instance.
...
...
Is its state necessarily live when set_fcontext is called?
?
Well, what happens if there is a SEH exception in code called from the
context's entrypoint? This usually means some badness, but some Windows code
raises exceptions for compatibility reasons (these are handled by the OS
after no stack/vectored barrier is found).

The exception dispatcher will walk the stack until it finds a barrier.
Walking the stack means looking up the RIP in the unwind context against
various tables.

So if an exception is raised, the unwinder will look at the RIP and find
unwind info in the EXE/DLL's section. This continues up to the point where
it arrives at the entrypoint specified in the make_fcontext call. But there
is no reason for the unwinder to stop the process there. Instead it will
find the unwind info for your entrypoint function and apply the instructions
accordingly. However, since you switch stacks in its actual caller, I don't
think these unwind instructions are correct.

If you really want to know, I can probably explain a bit more about this.
But the model is essentially the same as GCABI uses. There are some
differences (the Windows model doesn't have cleanups for instance, there are
different options for dynamically installed function tables, Windows doesn't
support a cleanup function for the exception object and its
language-independent unwind data is simpler, there are various other hooks
in Windows...), but the basic mechanism is the same.
...
...
What guarantees are there that poking a few fields in the TIB is good
enough to properly switch the context? At least FLS has more than one
field in the TIB. There's also bookkeeping for whether registered
callbacks have been called.
I can only follow the things which MSDN describes - Windows is closed
source.
That's understood -- and I do tend to think that there are certain dangers
with your approach, which should be kept in mind. I maintained the VC++ CRT
for quite some time and I had access to the source code, the hidden docs and
all the right people and tend to think that especially the exception
handling part is an incredibly hard thing to get perfectly correct.
...
For fiber-local storage the user is responsible for cleanup.
I don't recall the specifics, but FLS allows you to register callbacks for
when a fiber exits (see FlsAlloc). There's some book-keeping information
somewhere near the end of the TIB for that.
...
...
Why is saving the registers you save enough? Why wouldn't a future
version of Windows just take down the process if it detects stack
hacks? I believe the unwinder checks the stack pointer in the unwind
context against stack limits and for 8-byte alignment at each caller in the
chain until a barrier is found.
Maybe - I can still use the Windows Fiber ABI, as boost::context wraps it.
There's also User Mode scheduling, which supposedly is less broken in Win7
SP1. Also be wary, fibers are still not fully supported by a lot of things.
The CRT should be mostly fiber-aware, but much code isn't -- but then that's
a problem you'll have regardless of how you switch register context.
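
Coming back to the stack-limit and alignment check mentioned above: I don't have the actual code at hand, so the following is only a guess at the shape of such a check. The struct and function names are hypothetical, and addresses are modelled as plain integers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the limits the OS records for a thread's stack.
struct StackLimits {
    std::uint64_t limit;  // lowest valid address
    std::uint64_t base;   // one past the highest valid address
};

// Accept an unwound RSP only if it lies within the recorded limits
// and is 8-byte aligned.
constexpr bool plausible_rsp(std::uint64_t rsp, StackLimits s) {
    return rsp >= s.limit && rsp < s.base && (rsp & 0x7) == 0;
}
```

Under this model, an RSP that points into a privately allocated stack fails the check unless the recorded limits were updated to match when the context was switched.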
...
...
How are you certain that kernel32/kernelbase/ntdll don't assume that
there's a guard page (where guard page has the Windows defined meaning
as in PAGE_GUARD) at the end of the stack?
Windows 7/XP don't - because you create the stack you could manipulate it.
Well, I guess the question would be what happens when you write beyond the
stack limits. A Win32 thread stack would be autogrown (if address space is
available). I'm not sure how this works exactly. This may happen entirely in
the kernel. It would seem that you have a PAGE_NOACCESS page below your own
stack, which would look almost the same to the OS as its own stacks.
...
...
Even if that all works today, how can you be sure it will on Win8, 9 
and beyond?
I can't - MS decides how future Windows versions will work.
That makes it a bit dangerous. Security has become more of a religion at
Microsoft...
...
To be honest, I don't think a library outside the operating system is
the proper place to do this kind of thing at all (and even if
you're part of the OS it's an incredibly hard thing to get right --
UMS had tons of problems in Win7 RTM)
Then use boost::context together with the Windows Fiber ABI.
...
...
BTW, glancing over the docs, an implicit conversion from size_t to 
protected_stack seems weird, but that might just be a glitch in the 
doc build.
Could you tell me where you found it?
At 
http://ok73.ok.funpic.de/boost/libs/context/doc/html/
it reads

...
protected_stack( std::size_t stacksize)

which strikes me as a bit odd.
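
To show what I mean, here is a small sketch with stand-in classes (these are not the real boost::context types; the names are made up). A constructor taking a single std::size_t and not marked explicit lets any integer silently become a stack object.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in: converting constructor, as the docs seem to show.
struct implicit_stack {
    implicit_stack(std::size_t stacksize) : size(stacksize) {}
    std::size_t size;
};

// Hypothetical stand-in: same constructor, but marked explicit.
struct explicit_stack {
    explicit explicit_stack(std::size_t stacksize) : size(stacksize) {}
    std::size_t size;
};

// Accepts a stack object; with implicit_stack a bare number is accepted too.
inline std::size_t stack_size_of(const implicit_stack& s) { return s.size; }
```

With these definitions stack_size_of(4096) compiles and quietly builds a temporary implicit_stack, while the explicit variant would force the caller to write explicit_stack(4096).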

Thanks!
-hg

Holger Grund