
Looking at the fcontext_x86_64_ms_pe_masm.asm it would seem that several ABI requirements are violated and it is not clear to me why this would always work.
Then please tell me which part of the calling convention I've violated. I believe the code conforms to the x86_64 ABI described in the MSDN. It would be helpful if you could tell me what exactly does not conform to the calling convention.
Yes, that's sadly a bit of a pain. Documentation is pretty much nonexistent. There are a few Word documents in the compiler source tree, but the usual course of action was to "ask Kevin". There are a few bits on his blog somewhere http://blogs.msdn.com/b/freik/
For instance, the stack is not properly aligned in the call to _exit & set_fcontext. The stack is allocated with VirtualAlloc() with a multiple of the page size (SYSTEM_INFO.dwPageSize). In make_fcontext() I reserve space for the return address + a 32-byte parameter area so that the beginning of the stack is on a 16-byte boundary. The stack is required to be aligned on 16-byte boundaries at each non-leaf function.
So when link_fcontext is entered, the last four bits of RSP should be clear. When _exit is entered, the same needs to be true. But it isn't. I guess the real test would be to register a terminator function which requires an aligned stack, like:

    void my_terminator() {
        __m128i x;               // x would be allocated at a 16-byte boundary in the stack frame
        volatile __m128i y;
        _mm_store_si128(&x, y);  // MOVDQA triggers an exception if not aligned properly
    }
    #pragma section(".CRT$XPC", long, read)
    __declspec(allocate(".CRT$XPC")) void (*const x)() = my_terminator;
At least _exit can call into user code, IIRC (preterminators & terminators registered via .crt$x* sections) and that code may require a properly aligned stack. Why is it not properly aligned?
I don't see proper unwind descriptors. What happens if an async exception is triggered in the middle of set_fcontext? The x86 Windows version seems to have similar problems. IIRC, the dispatch logic checks for EH registration nodes to be on the stack for x86. The fcontext functions only store/restore registers, so there is no need to install an exception handler. The Windows x64 ABI requires unwind descriptors for non-leaf functions. Absent unwind descriptors, the unwind might be terminated, taking down the process, or a leaf function is assumed (which means adding 8 to the virtual RSP and continuing the unwind). There are also strict rules for which instruction forms can occur in the function prolog and epilog. The OS unwinder disassembles instructions to undo their effects. There are some pretty insidious rules that are not really documented anywhere (like an indirect tail call requires a REX prefix).
The stack must be aligned on a 16-byte boundary: http://msdn.microsoft.com/en-us/library/ew5tede7(v=vs.80).aspx
link_fcontext simply calls without adjusting RSP. Hence RSP in _exit is RSP-8. RSP and RSP-8 cannot both be 16-byte aligned.
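The RSP versus RSP-8 argument can be checked with plain arithmetic. The following is only a minimal sketch: the helper names are hypothetical (they are not part of boost::context), and it assumes the documented Win64 rule that RSP is 16-byte aligned immediately before each CALL, so that RSP is 8 modulo 16 at function entry.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not part of boost::context.

constexpr bool is_16_aligned(std::uint64_t rsp) { return rsp % 16 == 0; }

// A CALL pushes an 8-byte return address, so the callee is entered with RSP - 8.
constexpr std::uint64_t rsp_at_entry(std::uint64_t rsp_before_call) {
    return rsp_before_call - 8;
}

// Assumed rule: RSP is 16-byte aligned just before the CALL,
// so at function entry (RSP + 8) is a multiple of 16.
constexpr bool valid_entry_rsp(std::uint64_t rsp) { return is_16_aligned(rsp + 8); }

// What a non-leaf function typically subtracts before calling:
// 32-byte parameter (home) area plus 8 bytes of padding.
constexpr std::uint64_t frame_bytes_before_call() { return 32 + 8; }

// A function entered correctly that calls WITHOUT adjusting RSP
// hands its callee a misaligned stack.
static_assert(valid_entry_rsp(0xFF8), "entry RSP is 8 mod 16");
static_assert(!valid_entry_rsp(rsp_at_entry(0xFF8)), "bare CALL breaks alignment");

// Subtracting 40 bytes first restores the invariant for the callee.
static_assert(valid_entry_rsp(rsp_at_entry(0xFF8 - frame_bytes_before_call())),
              "adjusted CALL keeps alignment");
```

Whichever residue one takes as the required value at entry, the conclusion is the same: two stack pointers that differ by 8 cannot both satisfy it.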
make_fcontext() registers an EH structure on the stack. There are also strict rules on where exception registration nodes reside for x86. Earlier Windows versions weren't too strict about it, but that has changed over time.
If you look inside the test you see that I test throwing and catching exceptions. My concerns are less about C++ exceptions (which I presume you might trap), but about SEH exceptions. Sometimes they even happen without user code being involved directly (for instance the winuser resource hacks -- it could be that these actually enter the dispatch phase in user mode, not sure)
On x86_64 Windows an exception handler is not installed in each stack frame. Unwind handlers are installed by the compiler, similar to gcc on UNIX. That's correct, but the compiler emits unwind information. For assembler code, it's your responsibility (just as it is for gcc on Unix). There are no unwind descriptors, which is why the OS assumes your functions to be leaf functions. The ntdll dispatcher simply adjusts RSP in the virtual unwind context by 8 and continues with the caller (by reading the 64-bit word at the previous *RSP).
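The leaf-function assumption described above can be simulated in a few lines. This is a rough sketch, not the actual ntdll logic: `UnwindContext`, `Stack` and `unwind_leaf` are hypothetical names, and the stack is modelled as a vector of 8-byte slots addressed by byte offset.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the virtual unwind context; illustration only.
struct UnwindContext {
    std::uint64_t rip;  // virtual instruction pointer
    std::uint64_t rsp;  // virtual stack pointer, as a byte offset into the simulated stack
};

// Simulated stack memory: one element per 8-byte slot, lowest offset first.
using Stack = std::vector<std::uint64_t>;

// Leaf-style unwind step: treat the 64-bit word at [RSP] as the return
// address, continue there, and pop it by adding 8 to the virtual RSP.
inline UnwindContext unwind_leaf(const Stack& stack, UnwindContext ctx) {
    assert(ctx.rsp % 8 == 0 && ctx.rsp / 8 < stack.size());
    ctx.rip = stack[ctx.rsp / 8];
    ctx.rsp += 8;
    return ctx;
}
```

If the function without unwind data had in fact pushed registers or allocated stack space, the word at [RSP] would not be a return address, and the unwind would continue from a bogus RIP.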
What happens if an untrapped exception is triggered in a code path called from the context entrypoint?
It depends - if the function invoked by boost::context doesn't catch exceptions, std::terminate() is called; otherwise the next exception handler is triggered. All kinds of SEH exceptions can be dispatched in user mode. Much like async signals, but Windows supports pretty much the same scheme as C++ EH uses to trap these anywhere on the callstack. See __try et al. This works just like C++ EH, just with a different personality routine (__C_specific_handler).
It would seem that there is an unwind chain via make_fcontext (which doesn't appear to have any unwind description and would therefore be assumed to be a leaf function). It installs a simple EH on the stack used by the boost::context instance.
Is its state necessarily live when set_fcontext is called?
? Well, what happens if there is an SEH exception in code called from the context's entrypoint? This usually means some badness, but some Windows code raises exceptions for compatibility reasons (these are handled by the OS after no stack/vectored barrier is found).
What guarantees are there that poking a few fields in the TIB is good enough to properly switch the context? At least FLS has more than one field in the TIB. There's also bookkeeping for whether registered callbacks have been called. I can only follow the things which the MSDN describes - Windows is closed source.
That's understood -- and I do tend to think that there are certain dangers with your approach, which should be kept in mind. I maintained the VC++ CRT for quite some time and I had access to the source code, the hidden docs and all the right people, and I tend to think that especially the exception handling part is an incredibly hard thing to get perfectly correct.
The exception dispatcher will walk the stack till it finds a barrier. Walking the stack means looking up the RIP in the unwind context against various tables. So if an exception is raised, the unwinder will look at the RIP and find unwind info in the EXE/DLL's section. This continues up to the point where it arrives at the entrypoint specified in the make_fcontext call. But there is no reason for the unwinder to stop the process there. Instead it will find the unwind info for your entrypoint function and apply the instructions accordingly. However, since you switch stacks in its actual caller, I don't think these unwind instructions are correct. If you really want to know, I can probably explain a bit more about this. But the model is essentially the same as GCABI uses. There are some differences (the Windows model doesn't have cleanups for instance, there are different options for dynamically installed function tables, Windows doesn't support a cleanup function for the exception object and its language-independent unwind data is simpler, there are various other hooks in Windows...), but the basic mechanism is the same.
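To make the "looking up the RIP against tables" step concrete, here is a small sketch of such a lookup. It is an assumption-laden model, not the OS implementation: `FunctionEntry` and `lookup` are hypothetical names, and the table is taken to be sorted by start address with non-overlapping ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical table entry: a half-open code range [begin, end) plus an
// opaque reference to its unwind description. Illustration only.
struct FunctionEntry {
    std::uint32_t begin;
    std::uint32_t end;
    std::uint32_t unwind_info;
};

// Binary search over a table sorted by `begin`.
// Returns nullptr when no entry covers `rva`; per the discussion above,
// such code would be treated as a leaf function by the unwinder.
inline const FunctionEntry* lookup(const std::vector<FunctionEntry>& table,
                                   std::uint32_t rva) {
    auto it = std::upper_bound(
        table.begin(), table.end(), rva,
        [](std::uint32_t value, const FunctionEntry& e) { return value < e.begin; });
    if (it == table.begin()) return nullptr;  // rva lies below the first range
    --it;                                     // last entry with begin <= rva
    return rva < it->end ? &*it : nullptr;
}
```

Hand-written assembler routines that ship without table entries fall into the `nullptr` case, which is where the leaf-function assumption comes from.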
For fiber-local storage the user is responsible for cleanup. I don't recall the specifics, but FLS allows you to register callbacks for when a fiber exits (see FlsAlloc). There's some book-keeping information somewhere near the end of the TIB for that.
Why is saving the registers you save enough? Why wouldn't a future version of Windows just take down the process if it detects stack hacks? I believe the unwinder checks the stack pointer in the unwind context against the stack limits and for 8-byte alignment at each caller in the chain until a barrier is found.
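The kind of sanity check described above might look roughly like this. It is a guess at the shape of the check, following the "I believe" in the text, not the real unwinder code; `StackLimits` and `plausible_sp` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of the stack bounds the OS has recorded for the thread.
struct StackLimits {
    std::uint64_t limit;  // lowest valid address of the stack
    std::uint64_t base;   // one past the highest valid address
};

// A virtual stack pointer is plausible if it is 8-byte aligned and lies
// inside the recorded bounds (an assumed check, for illustration).
constexpr bool plausible_sp(std::uint64_t sp, StackLimits lim) {
    return sp % 8 == 0 && sp >= lim.limit && sp < lim.base;
}
```

Under this model, a context switch that moves RSP to a privately allocated stack without also updating the recorded limits would fail the check at the first frame the unwinder inspects.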
Maybe - I can still use the Windows Fiber ABI, as boost::context wraps it. There's also User Mode Scheduling, which supposedly is less broken in Win7 SP1. Also be wary, fibers are still not fully supported by a lot of things. The CRT should be mostly fiber-aware, but much code isn't -- but then that's a problem you'll have regardless of how you switch register context.
How are you certain that kernel32/kernelbase/ntdll don't assume that there's a guard page (where guard page has the Windows defined meaning as in PAGE_GUARD) at the end of the stack?
Windows 7/XP don't - because you create the stack, you could manipulate it. Well, I guess the question would be what happens when you write beyond the stack limits. A Win32 thread stack would be autogrown (if address space is available). I'm not sure how this works exactly. This may happen entirely in the kernel. It would seem that you have a PAGE_NOACCESS page below your own stack, which would look almost the same to the OS as its own stacks.
Even if that all works today, how can you be sure it will on Win8, 9 and beyond? I can't - MS decides how future Windows versions will work. That makes it a bit dangerous. Security has become more of a religion at Microsoft...
To be honest, I don't think a library outside the operating system is the proper place to do this kind of thing at all (and even if you're part of the OS it's an incredibly hard thing to get right -- UMS had tons of problems in Win7 RTM).
Then use boost::context together with the Windows Fiber ABI.
BTW, glancing over the docs, an implicit conversion from size_t to protected_stack seems weird, but that might just be a glitch in the doc build. Could you tell me where you found it?
At http://ok73.ok.funpic.de/boost/libs/context/doc/html/ it reads ... protected_stack( std::size_t stacksize) which strikes me as a bit odd. Thanks! -hg