[context review] Windows ABI requirements

Hi,

I've had a very brief look at the x64 ASM Windows code as I think I raised concerns about this before. As always, I'm pretty busy and I probably won't be able to properly review the entire lib.

Looking at the fcontext_x86_64_ms_pe_masm.asm it would seem that several ABI requirements are violated and it is not clear to me why this would always work. For instance, the stack is not properly aligned in the call to _exit & set_fcontext. At least _exit can call into user code, IIRC (preterminators & terminator registered via .crt$x* sections) and that code may require a properly aligned stack.

I don't see proper unwind descriptors. What happens if an async exception is triggered in the middle of set_fcontext? The x86 Windows version seems to have similar problems. IIRC, the dispatch logic checks for EH registration nodes to be on the stack for x86.

What happens if an untrapped exception is triggered in a code path called from the context entrypoint? It would seem that there is an unwind chain via make_fcontext (which doesn't appear to have any unwind description and would therefore be assumed to be a leaf function). Is its state necessarily live when set_fcontext is called?

What guarantees are there that poking a few fields in the TIB is good enough to properly switch the context? At least FLS has more than one field in the TIB. There's also bookkeeping for whether registered callbacks have been called.

Why is saving the registers you save enough? Why wouldn't a future version of Windows just take down the process if it detects stack hacks? I believe the unwinder checks the stack pointer in the unwind context against stack limits and for 8-byte alignment at each caller in the chain until a barrier is found.

How are you certain that kernel32/kernelbase/ntdll don't assume that there's a guard page (where guard page has the Windows defined meaning as in PAGE_GUARD) at the end of the stack?

Even if that all works today, how can you be sure it will on Win8, 9 and beyond?

To be honest, I don't think a library outside the operating system is the proper place to do this kind of thing at all (and even if you're part of the OS it's an incredibly hard thing to get right -- UMS had tons of problems in Win7 RTM).

BTW, glancing over the docs, an implicit conversion from size_t to protected_stack seems weird, but that might just be a glitch in the doc build.

Thanks!
-hg

On 21.03.2011 23:39, Holger Grund wrote:
> Looking at the fcontext_x86_64_ms_pe_masm.asm it would seem that several ABI
> requirements are violated and it is not clear to me why this would always
> work.

Then please tell me which part of the calling convention I've violated. I believe the code conforms to the x86_64 ABI described in the MSDN. It would be helpful if you could tell me what exactly does not conform to the calling convention.

> For instance, the stack is not properly aligned in the call to _exit &
> set_fcontext.

The stack is allocated with VirtualAlloc() with a multiple of the page size (SYSTEM_INFO.dwPageSize). In make_fcontext() I reserve space for the return address + 32-byte parameter area so that the beginning of the stack is on a 16-byte boundary.

> At least _exit can call into user code, IIRC (preterminators &
> terminator registered via .crt$x* sections) and that code may require a
> properly aligned stack.

Why is it not properly aligned?

> I don't see proper unwind descriptors. What happens if an async exception is
> triggered in the middle of set_fcontext? The x86 Windows version seems to
> have similar problems. IIRC, the dispatch logic checks for EH registration
> nodes to be on the stack for x86.

The fcontext functions store/restore registers; there is no need to install an exception handler. make_fcontext() registers an EH structure on the stack. If you look inside the tests you see that I test throwing and catching exceptions. On x86_64 Windows an exception handler is not installed in each stack frame. Unwind handlers are installed by the compiler - similar to gcc on UNIX.

> What happens if an untrapped exception is triggered in a code path called from
> the context entrypoint?

It depends - if the function invoked by boost::context doesn't catch exceptions, std::terminate() is called; otherwise the next exception handler is triggered.

> It would seem that there is an unwind chain via
> make_fcontext (which doesn't appear to have any unwind description and would
> therefore be assumed to be a leaf function).

It installs a simple EH in the stack used by the boost::context instance.

> Is its state necessarily live when set_fcontext is called?

?

> What guarantees are there that poking a few fields in the TIB is good enough
> to properly switch the context? At least FLS has more than one field in the
> TIB. There's also bookkeeping for whether registered callbacks have been
> called.

I can only follow the things which the MSDN describes - Windows is closed source. For fiber-local storage the user is responsible for cleanup.

> Why is saving the registers you save enough? Why wouldn't a future version of
> Windows just take down the process if it detects stack hacks? I believe the
> unwinder checks the stack pointer in the unwind context against stack limits
> and for 8-byte alignment at each caller in the chain until a barrier is found.

Maybe - I can still use the Windows Fiber ABI, as boost::context wraps it.

> How are you certain that kernel32/kernelbase/ntdll don't assume that there's
> a guard page (where guard page has the Windows defined meaning as in
> PAGE_GUARD) at the end of the stack?

Windows 7/XP don't - because you create the stack you could manipulate it.

> Even if that all works today, how can you be sure it will on Win8, 9 and
> beyond?

I can't - MS decides how further Windows versions will work.

> To be honest, I don't think a library outside the operating system is
> the proper place to do this kind of thing at all (and even if you're part of
> the OS it's an incredibly hard thing to get right -- UMS had tons of
> problems in Win7 RTM)

Then use boost::context together with the Windows Fiber ABI.

> BTW, glancing over the docs, an implicit conversion from size_t to
> protected_stack seems weird, but that might just be a glitch in the doc
> build.

Could you tell me where you found it?

>> Looking at the fcontext_x86_64_ms_pe_masm.asm it would seem that several ABI requirements are violated and it is not clear to me why this would always work.
> Then please tell me which part of the calling convention I've violated. I believe the code conforms to the x86_64 ABI described in the MSDN. It would be helpful if you could tell me what exactly does not conform to the calling convention.

Yes, that's sadly a bit of a pain. Documentation is pretty much nonexistent. There are a few Word documents in the compiler source tree, but the usual course of action was to "ask Kevin". There are a few bits on his blog somewhere: http://blogs.msdn.com/b/freik/
>> For instance, the stack is not properly aligned in the call to _exit & set_fcontext.
> The stack is allocated with VirtualAlloc() with a multiple of the page size (SYSTEM_INFO.dwPageSize). In make_fcontext() I reserve space for the return address + 32-byte parameter area so that the beginning of the stack is on a 16-byte boundary.

The stack is required to be aligned on 16-byte boundaries at each non-leaf function. So when link_fcontext is entered, the last four bits of RSP should be clear. When _exit is entered, the same needs to be true. But it isn't. I guess the real test would be to register a terminator function that requires an aligned stack, like:

    #include <emmintrin.h>

    void my_terminator()
    {
        __m128i x;                        // x would be allocated at a 16-byte boundary in the stack frame
        __m128i y = _mm_setzero_si128();
        _mm_store_si128(&x, y);           // MOVDQA triggers an exception if not aligned properly
    }

    #pragma section(".CRT$XPC", long, read)
    __declspec(allocate(".CRT$XPC")) void (*const p_terminator)() = my_terminator;
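The alignment argument can be checked with plain arithmetic. The following is a minimal sketch, not code from the library: the helper names are made up for illustration, and it assumes the documented x64 rule that RSP is 16-byte aligned at every call instruction, so that a callee sees RSP ending in 8 on its first instruction (the call pushes an 8-byte return address).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if `rsp` is a multiple of 16.
inline bool is_aligned16(std::uint64_t rsp)
{
    return (rsp & 0xF) == 0;
}

// Models the effect of a CALL instruction on RSP: the CPU pushes the
// 8-byte return address, so the callee starts with RSP - 8.
inline std::uint64_t rsp_after_call(std::uint64_t rsp)
{
    assert(rsp >= 8);
    return rsp - 8;
}

// Models a function that subtracts `frame_bytes` from RSP before issuing
// its own call, and reports whether RSP is 16-byte aligned at that call.
inline bool call_site_aligned(std::uint64_t rsp_at_entry, std::uint64_t frame_bytes)
{
    assert(rsp_at_entry >= frame_bytes);
    return is_aligned16(rsp_at_entry - frame_bytes);
}
```

With an aligned caller at 0x10000, the callee is entered at 0xFFF8. If it calls onward without adjusting RSP (`frame_bytes == 0`), its call site is misaligned; reserving 40 bytes (32 bytes of parameter area plus 8 bytes of padding) restores alignment.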
>> At least _exit can call into user code, IIRC (preterminators & terminator registered via .crt$x* sections) and that code may require a properly aligned stack.
> Why is it not properly aligned?

The stack must be aligned on a 16-byte boundary: http://msdn.microsoft.com/en-us/library/ew5tede7(v=vs.80).aspx
link_fcontext simply calls without adjusting RSP. Hence RSP in _exit is RSP-8. RSP and RSP-8 cannot both be 16-byte aligned.

>> I don't see proper unwind descriptors. What happens if an async exception is triggered in the middle of set_fcontext? The x86 Windows version seems to have similar problems. IIRC, the dispatch logic checks for EH registration nodes to be on the stack for x86.
> The fcontext functions store/restore registers; there is no need to install an exception handler.

The Windows x64 ABI requires unwind descriptors for non-leaf functions. Absent unwind descriptors, either the unwind is terminated, taking down the process, or a leaf function is assumed (which means adding 8 to the virtual RSP and continuing the unwind). There are also strict rules for which instruction forms can occur in the function prolog and epilog. The OS unwinder disassembles instructions to undo their effects. There are some pretty insidious rules that are not really documented anywhere (like an indirect tail call requiring a REX prefix).
> make_fcontext() registers an EH structure on the stack.

There are also strict rules on where exception registration nodes reside for x86. Earlier Windows versions weren't too strict about it, but that has changed over time.

> If you look inside the tests you see that I test throwing and catching exceptions.

My concerns are less about C++ exceptions (which I presume you might trap), but about SEH exceptions. Sometimes they even happen without user code being involved directly (for instance the winuser resource hacks -- it could be that these actually enter the dispatch phase in user mode, not sure).

> On x86_64 Windows an exception handler is not installed in each stack frame. Unwind handlers are installed by the compiler - similar to gcc on UNIX.

That's correct, but the compiler emits unwind information. For assembler code, it's your responsibility (just as it is for gcc on UNIX). There are no unwind descriptors, which is why the OS assumes your functions to be leaf functions. The ntdll dispatcher simply adjusts RSP in the virtual unwind context by 8 and continues with the caller (by reading the 64-bit word at the previous *RSP).
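To make the leaf-function assumption concrete, here is a toy model of that single unwind step. It is only an illustration of the rule described above, not the ntdll implementation: `ToyContext`, `ToyStack` and `unwind_leaf_step` are invented names, and the "stack" is just a map from addresses to 64-bit words.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model only: a register context reduced to RIP and RSP.
struct ToyContext {
    std::uint64_t rip;
    std::uint64_t rsp;
};

// Toy stack: maps an address to the 64-bit word stored there.
using ToyStack = std::map<std::uint64_t, std::uint64_t>;

// One unwind step for a function that has no unwind data and is therefore
// treated as a leaf: the word at [RSP] is taken to be the return address,
// RSP is advanced by 8, and unwinding continues in the presumed caller.
inline ToyContext unwind_leaf_step(const ToyContext& ctx, const ToyStack& stack)
{
    auto it = stack.find(ctx.rsp);
    assert(it != stack.end());
    return ToyContext{ it->second, ctx.rsp + 8 };
}
```

If the function really is a leaf, [RSP] holds the return address and the step is correct. If the function has pushed a register or reserved stack space without describing it, the same step picks up whatever word happens to be at [RSP] and treats it as a code address, which is the failure mode being discussed.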
>> What happens if an untrapped exception is triggered in a code path called from the context entrypoint?
> It depends - if the function invoked by boost::context doesn't catch exceptions, std::terminate() is called; otherwise the next exception handler is triggered.

All kinds of SEH exceptions can be dispatched in user mode. They are much like async signals, but Windows supports pretty much the same scheme as C++ EH uses to trap these anywhere on the call stack. See __try et al. This works just like C++ EH, just with a different personality routine (__C_specific_handler).

>> It would seem that there is an unwind chain via make_fcontext (which doesn't appear to have any unwind description and would therefore be assumed to be a leaf function).
> It installs a simple EH in the stack used by the boost::context instance.

>> Is its state necessarily live when set_fcontext is called?
> ?

Well, what happens if there is an SEH exception in code called from the context's entrypoint? This usually means some badness, but some Windows code raises exceptions for compatibility reasons (these are handled by the OS after no stack/vectored barrier is found).

The exception dispatcher will walk the stack until it finds a barrier. Walking the stack means looking up the RIP in the unwind context against various tables. So if an exception is raised, the unwinder will look at the RIP and find unwind info in the EXE/DLL's section. This continues up to the point where it arrives at the entrypoint specified in the make_fcontext call. But there is no reason for the unwinder to stop the process there. Instead it will find the unwind info for your entrypoint function and apply the instructions accordingly. However, since you switch stacks in its actual caller, I don't think these unwind instructions are correct.

If you really want to know, I can probably explain a bit more about this. But the model is essentially the same as GCABI uses. There are some differences (the Windows model doesn't have cleanups for instance, there are different options for dynamically installed function tables, Windows doesn't support a cleanup function for the exception object and its language-independent unwind data is simpler, there are various other hooks in Windows...), but the basic mechanism is the same.

>> What guarantees are there that poking a few fields in the TIB is good enough to properly switch the context? At least FLS has more than one field in the TIB. There's also bookkeeping for whether registered callbacks have been called.
> I can only follow the things which the MSDN describes - Windows is closed source.

That's understood -- and I do tend to think that there are certain dangers with your approach, which should be kept in mind. I maintained the VC++ CRT for quite some time and I had access to the source code, the hidden docs and all the right people, and I tend to think that especially the exception handling part is an incredibly hard thing to get perfectly correct.
> For fiber-local storage the user is responsible for cleanup.

I don't recall the specifics, but FLS allows you to register callbacks for when a fiber exits (see FlsAlloc). There's some bookkeeping information somewhere near the end of the TIB for that.

>> Why is saving the registers you save enough? Why wouldn't a future version of Windows just take down the process if it detects stack hacks? I believe the unwinder checks the stack pointer in the unwind context against stack limits and for 8-byte alignment at each caller in the chain until a barrier is found.
> Maybe - I can still use the Windows Fiber ABI, as boost::context wraps it.

There's also User Mode Scheduling, which supposedly is less broken in Win7 SP1. Also be wary: fibers are still not fully supported by a lot of things. The CRT should be mostly fiber-aware, but much code isn't -- but then that's a problem you'll have regardless of how you switch register context.
>> How are you certain that kernel32/kernelbase/ntdll don't assume that there's a guard page (where guard page has the Windows defined meaning as in PAGE_GUARD) at the end of the stack?
> Windows 7/XP don't - because you create the stack you could manipulate it.

Well, I guess the question would be what happens when you write beyond the stack limits. A Win32 thread stack would be autogrown (if address space is available). I'm not sure how this works exactly. This may happen entirely in the kernel. It would seem that you have a PAGE_NOACCESS page below your own stack, which would look almost the same to the OS as its own stacks.

>> Even if that all works today, how can you be sure it will on Win8, 9 and beyond?
> I can't - MS decides how further Windows versions will work.

That makes it a bit dangerous. Security has become more of a religion at Microsoft...
>> To be honest, I don't think a library outside the operating system is the proper place to do this kind of thing at all (and even if you're part of the OS it's an incredibly hard thing to get right -- UMS had tons of problems in Win7 RTM)
> Then use boost::context together with the Windows Fiber ABI.

>> BTW, glancing over the docs, an implicit conversion from size_t to protected_stack seems weird, but that might just be a glitch in the doc build.
> Could you tell me where you found it?

At http://ok73.ok.funpic.de/boost/libs/context/doc/html/ it reads

    ... protected_stack( std::size_t stacksize)

which strikes me as a bit odd.

Thanks!
-hg

> Yes, that's sadly a bit of a pain. Documentation is pretty much nonexistent. There are a few Word documents in the compiler source tree, but the usual course of action was to "ask Kevin".
> There are a few bits on his blog somewhere: http://blogs.msdn.com/b/freik/

I used http://msdn.microsoft.com/de-de/library/9b372w95.aspx 'Calling Convention'. At least which registers must be preserved between function calls and which are used for passing arguments to functions should be done correctly - otherwise I would expect the tests and the examples to crash.
>>> At least _exit can call into user code, IIRC (preterminators & terminator registered via .crt$x* sections) and that code may require a properly aligned stack.
>> Why is it not properly aligned?
> The stack must be aligned on a 16-byte boundary: http://msdn.microsoft.com/en-us/library/ew5tede7(v=vs.80).aspx
> link_fcontext simply calls without adjusting RSP. Hence RSP in _exit is RSP-8. RSP and RSP-8 cannot both be 16-byte aligned.

OK - I'll rework the code.
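One way such a rework could compute the initial stack pointer is sketched below. This is an assumption about a possible fix, not the code that ended up in the library: `align_down` and `initial_entry_rsp` are hypothetical names, and the layout assumes the entry function should see exactly what a normal callee sees, namely a return-address slot at [RSP] with the 32-byte parameter area directly above it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rounds `addr` down to a multiple of `alignment` (must be a power of two).
inline std::uintptr_t align_down(std::uintptr_t addr, std::uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}

// Computes the RSP value the context entry function would start with.
// Layout, from high to low addresses (an assumption for this sketch):
//   [top - 32, top)       32-byte parameter area owned by the entry function
//   [top - 40, top - 32)  slot holding the return address
inline std::uintptr_t initial_entry_rsp(std::uintptr_t stack_base, std::size_t stack_size)
{
    assert(stack_size >= 64);
    const std::uintptr_t top = align_down(stack_base + stack_size, 16);
    const std::uintptr_t parameter_area = 32;
    const std::uintptr_t return_slot = 8;
    return top - parameter_area - return_slot;
}
```

The result always ends in 8 modulo 16, which matches what a function observes when it is reached through an ordinary call from a 16-byte aligned call site. The return-address slot still has to point at code that keeps RSP aligned before it calls anything else, such as _exit.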
>>> Is its state necessarily live when set_fcontext is called?
>> ?
> Well, what happens if there is an SEH exception in code called from the context's entrypoint? This usually means some badness, but some Windows code raises exceptions for compatibility reasons (these are handled by the OS after no stack/vectored barrier is found).
> The exception dispatcher will walk the stack until it finds a barrier. Walking the stack means looking up the RIP in the unwind context against various tables.
> So if an exception is raised, the unwinder will look at the RIP and find unwind info in the EXE/DLL's section. This continues up to the point where it arrives at the entrypoint specified in the make_fcontext call. But there is no reason for the unwinder to stop the process there. Instead it will find the unwind info for your entrypoint function and apply the instructions accordingly. However, since you switch stacks in its actual caller, I don't think these unwind instructions are correct.

I'll take a look into the MSDN to check your statements. As you already know, it is a pain to get the relevant information regarding SEH from the MSDN. At least catching the C++ exceptions works.
>>> What guarantees are there that poking a few fields in the TIB is good enough to properly switch the context? At least FLS has more than one field in the TIB. There's also bookkeeping for whether registered callbacks have been called.
>> I can only follow the things which the MSDN describes - Windows is closed source.
> That's understood -- and I do tend to think that there are certain dangers with your approach, which should be kept in mind. I maintained the VC++ CRT for quite some time and I had access to the source code, the hidden docs and all the right people, and I tend to think that especially the exception handling part is an incredibly hard thing to get perfectly correct.

Probably you are not allowed to tell me the secrets :^)? The only chance to get it right on Windows is to rely on the MSDN and test (trial and error).

>> For fiber-local storage the user is responsible for cleanup.
> I don't recall the specifics, but FLS allows you to register callbacks for when a fiber exits (see FlsAlloc). There's some bookkeeping information somewhere near the end of the TIB for that.

The TIB isn't well described in the MSDN (at least the end of the TIB). AFAIK FLS can only be used if the thread was fiberized (ConvertThreadToFiber()). In the case of fcontext on x86 Windows this isn't done - so boost.context could require that FLS not be used.
>>> How are you certain that kernel32/kernelbase/ntdll don't assume that there's a guard page (where guard page has the Windows defined meaning as in PAGE_GUARD) at the end of the stack?
>> Windows 7/XP don't - because you create the stack you could manipulate it.
> Well, I guess the question would be what happens when you write beyond the stack limits. A Win32 thread stack would be autogrown (if address space is available). I'm not sure how this works exactly. This may happen entirely in the kernel. It would seem that you have a PAGE_NOACCESS page below your own stack, which would look almost the same to the OS as its own stacks.

boost.context doesn't support automatically growing stacks - I believe it isn't necessary. If I look at my X server, it uses a stack of 276 kB. The class protected_stack appends a guard page at the end of the memory, generating a segfault/access violation if addresses in this page are accessed.
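The guard-page idea can be sketched as follows. This is not the protected_stack implementation; `GuardedStack`, `allocate_guarded_stack` and `free_guarded_stack` are names invented for the example, and it assumes the guard page sits at the lowest address of the region because the stack grows downward. On Windows the page is marked PAGE_NOACCESS, as described above, rather than PAGE_GUARD.

```cpp
#include <cassert>
#include <cstddef>
#if defined(_WIN32)
#  include <windows.h>
#else
#  include <sys/mman.h>
#  include <unistd.h>
#endif

struct GuardedStack {
    void*       base;         // start of the whole region (the guard page)
    std::size_t total;        // region size including the guard page
    void*       usable;       // first usable byte above the guard page
    std::size_t usable_size;  // usable bytes, a multiple of the page size
};

inline std::size_t page_size()
{
#if defined(_WIN32)
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    return si.dwPageSize;
#else
    return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
#endif
}

// Allocates at least `min_size` usable bytes plus one inaccessible page
// below them. Returns a struct with null pointers on failure.
inline GuardedStack allocate_guarded_stack(std::size_t min_size)
{
    assert(min_size > 0);
    const std::size_t page = page_size();
    const std::size_t usable = ((min_size + page - 1) / page) * page;
    const std::size_t total = usable + page;
    GuardedStack s{nullptr, 0, nullptr, 0};
#if defined(_WIN32)
    void* p = VirtualAlloc(nullptr, total, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (!p) return s;
    DWORD old_protect = 0;
    if (!VirtualProtect(p, page, PAGE_NOACCESS, &old_protect)) {
        VirtualFree(p, 0, MEM_RELEASE);
        return s;
    }
#else
    void* p = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return s;
    if (mprotect(p, page, PROT_NONE) != 0) {
        munmap(p, total);
        return s;
    }
#endif
    s.base = p;
    s.total = total;
    s.usable = static_cast<char*>(p) + page;
    s.usable_size = usable;
    return s;
}

inline void free_guarded_stack(GuardedStack& s)
{
    if (!s.base) return;
#if defined(_WIN32)
    VirtualFree(s.base, 0, MEM_RELEASE);
#else
    munmap(s.base, s.total);
#endif
    s = GuardedStack{nullptr, 0, nullptr, 0};
}
```

Any access to the lowest page then faults instead of silently overwriting neighbouring memory. Unlike an OS-managed thread stack, nothing in this sketch grows the region when the fault happens.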
>>> Even if that all works today, how can you be sure it will on Win8, 9 and beyond?
>> I can't - MS decides how further Windows versions will work.
> That makes it a bit dangerous. Security has become more of a religion at Microsoft...

I don't know what MS will do in its next versions of Windows. If MS decides to break some calling conventions then I'll have to fix the code. What could I do instead?
>>> BTW, glancing over the docs, an implicit conversion from size_t to protected_stack seems weird, but that might just be a glitch in the doc build.
>> Could you tell me where you found it?
> At http://ok73.ok.funpic.de/boost/libs/context/doc/html/ it reads
> ... protected_stack( std::size_t stacksize)

The constructor of class protected_stack takes size_t as argument - do you miss the 'explicit' keyword?

Thanks for the hints,
Oliver

>> That's understood -- and I do tend to think that there are certain dangers with your approach, which should be kept in mind. I maintained the VC++ CRT for quite some time and I had access to the source code, the hidden docs and all the right people, and I tend to think that especially the exception handling part is an incredibly hard thing to get perfectly correct.
> Probably you are not allowed to tell me the secrets :^)? The only chance to get it right on Windows is to rely on the MSDN and test (trial and error).

These things are not secret in the sense that Microsoft doesn't want anyone to know. It's more that some implementation details shouldn't be relied on and, probably more importantly, people just don't have the time to polish things up for external consumption. Exception handling is particularly nasty in that it's extremely messy code and involves the compiler FE, BE, the operating system and the runtime library. There are very few people inside MS who fully understand what's going on under the covers. I'm happy to help out where I can, but just glancing over the boost mailing lists is very time consuming.
>>> For fiber-local storage the user is responsible for cleanup.
>> I don't recall the specifics, but FLS allows you to register callbacks for when a fiber exits (see FlsAlloc). There's some bookkeeping information somewhere near the end of the TIB for that.
> The TIB isn't well described in the MSDN (at least the end of the TIB). AFAIK FLS can only be used if the thread was fiberized (ConvertThreadToFiber()). In the case of fcontext on x86 Windows this isn't done - so boost.context could require that FLS not be used.

FLS can be used without ConvertThreadToFiber. In fact, the CRT uses FLS on x86 Vista+ and x64 (and many people don't use fibers).
>> That makes it a bit dangerous. Security has become more of a religion at Microsoft...
> I don't know what MS will do in its next versions of Windows. If MS decides to break some calling conventions then I'll have to fix the code. What could I do instead?

I guess it's a fairly safe bet that the calling convention won't be changed in incompatible ways. There are strong compatibility guarantees in Windows. There are a lot of people working on nothing but that (and they will let you know if you break something :-) ). But there's a difference between relying on documented behavior and relying on information obtained by trial and error on a particular version of the OS. My take would be that you're doing this at your own risk. In theory, it's possible to run a process without kernel32.dll, but all bets are off then. So it is possible for Windows to make a change to the OS that requires coordination between user and kernel parts. My point here is that you are at a significant disadvantage to the OS, as you would have to track a moving target. Again, UMS might be worthwhile to look at in Win7 SP1+.
>> At http://ok73.ok.funpic.de/boost/libs/context/doc/html/ it reads
>> ... protected_stack( std::size_t stacksize)
> The constructor of class protected_stack takes size_t as argument - do you miss the 'explicit' keyword?

Yes, I almost write it by default and everywhere :-) I thought that this was a glitch of the doc build system. But it seems the ctor isn't declared explicit. Seems odd, but then I haven't looked too closely.

-hg
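The difference the 'explicit' keyword makes can be shown with two stand-in classes. These are hypothetical types written for this illustration, not the Boost.Context protected_stack; only the constructor signature taking std::size_t is taken from the documentation quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Stand-in with a converting constructor, as the quoted docs suggest.
class implicit_stack {
public:
    implicit_stack(std::size_t stacksize) : size_(stacksize) { assert(stacksize > 0); }
    std::size_t size() const { return size_; }
private:
    std::size_t size_;
};

// Same class, but the constructor is marked explicit.
class explicit_stack {
public:
    explicit explicit_stack(std::size_t stacksize) : size_(stacksize) { assert(stacksize > 0); }
    std::size_t size() const { return size_; }
private:
    std::size_t size_;
};

// A function taking the stack by value: any integer argument silently
// becomes an implicit_stack here.
inline std::size_t run_on(implicit_stack s) { return s.size(); }

static_assert(std::is_convertible<std::size_t, implicit_stack>::value,
              "size_t converts implicitly without 'explicit'");
static_assert(!std::is_convertible<std::size_t, explicit_stack>::value,
              "'explicit' blocks the implicit conversion");
static_assert(std::is_constructible<explicit_stack, std::size_t>::value,
              "direct construction still works");
```

With the first class, a call such as `run_on(4096)` compiles and quietly creates a stack object from a number; with the second, the caller has to write `explicit_stack(4096)`.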
participants (2)
-
Holger Grund
-
Oliver Kowalke