On 17 Dec 2016 at 20:32, Peter Dimov wrote:
>> In an exception handler you cannot call any async-unsafe routine, such as anything in MSVCRT or anything implemented by kernel32.dll in userspace. As on POSIX, almost all syscalls implemented entirely in kernel space are safe.
> Thanks Niall. Do you know which Windows API functions are safe and which aren't? I couldn't find a list anywhere.
I don't think there is even a list internal to Microsoft. There are various user-compiled lists around the internet, and the ReactOS team also has a good list somewhere on their mailing list. I've seen code of mine which worked perfectly on Win7 rarely and randomly deadlock on Win10 and, one time, vice versa. WOW64 also has a very different safe list from native Win32. There are some obvious things not to call: anything which obviously runs code in userspace.
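To make the discipline concrete, here is a minimal sketch in standard C++ rather than Win32, so it says nothing about which Windows APIs are safe. The handler does the only thing that is safe everywhere, a store to a `volatile std::sig_atomic_t`, and everything else is deferred until after it returns. The names `g_fault_seen`, `on_signal` and `raise_and_check` are made up for illustration.

```cpp
#include <csignal>

// The only state the handler touches: no CRT calls, no heap, no locks.
static volatile std::sig_atomic_t g_fault_seen = 0;

extern "C" void on_signal(int) {
    g_fault_seen = 1;  // a plain store to sig_atomic_t is async-signal-safe
}

// Hypothetical helper: installs the handler, raises the signal synchronously,
// restores the previous disposition, and reports whether the handler ran.
bool raise_and_check() {
    g_fault_seen = 0;
    auto previous = std::signal(SIGTERM, on_signal);
    std::raise(SIGTERM);  // does not return until on_signal has finished
    if (previous != SIG_ERR)
        std::signal(SIGTERM, previous);
    return g_fault_seen == 1;
}
```

Logging, symbol lookup, allocation and anything else that might take a lock would go after `raise_and_check()` returns, never inside `on_signal`.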
>> Antony makes the valid point that on Windows there are race problems with the DbgHelp library; in fact, not only is it async-unsafe, it's also thread-unsafe.
> He doesn't use DbgHelp in the Windows backend though, he uses Dbgeng.h. This is not the same thing, I think?
Woohoo! That is amazing news, and congrats to Antony for getting Dbgeng working.

Last month when I prereviewed Stacktrace I mentioned that DbgHelp was a steaming pile of poo and that I had had much more reliable experience with the thoroughly superior Dbgeng. Antony asked for some example code because Dbgeng is barely documented, but I no longer had access to the code I wrote many years ago which used it. I tried to cobble something together and made some progress over what Antony had, but I ran out of time due to needing to mind the recent new baby. I'm not actually sure what he changed from what I sent him. Mine and his look very close, so I must have missed something very small.

I hadn't realised Antony had figured it out and had assumed he was still on DbgHelp. The fact he's using Dbgeng makes Stacktrace much superior to 99% of the Windows stack trace implementations out there. Particular benefits include:

* Dbgeng understands non-native stack frames, so mixed .NET, WinRT and C++/CLI stack traces just work.
* Dbgeng is threadsafe.
* Dbgeng doesn't randomly fail and then randomly work the next time you call it.
>> Of course Windows has signals. As I already mentioned earlier, it's called vectored exception handling, which is exactly the same as a signal implementation.
> Not quite. A signal immediately suspends the thread and calls the handler in it. Windows exception handling, in contrast, unwinds the stack. So if the kernel crashes somewhere deep, it can unwind itself to a usable state before the program gets to handle the exception. Or at least that's my understanding.
You're missing a few steps.

1. RaiseException() is like kill() and starts the signal handling process with parameters. Hardware exceptions raise an exception at the immediate point of occurrence, i.e. inside any locks etc.

2. Any installed vectored exception handlers are like sigaction(), except they are called for all exception codes. Each handler returns whether it handled the exception or whether to keep searching. Vectored exception handlers are process-wide.

3. If still unhandled, the Thread Information Block (TIB) for the thread where the exception occurred is asked for the current thread-local TEH (Table Exception Handling) on x64 or SEH (Structured Exception Handling) on x86. The handlers installed by all code on the stack up to the point of the exception are called in reverse order. Each handler may handle the exception, or say to keep searching.

4. Every thread is always begun with a default exception handler which opens that famous dialog box and terminates the process, so if your exception reaches this, the right thing happens.

5. C++ exceptions are implemented 100% as a client of the same TEH and SEH framework. In fact they are simply a RaiseException(0xE06D7363, ...). If you dive into the implementation of __CxxThrowException() you'll see that.

What's very important to note is that all this occurs without unwinding the stack, at the point of the exception, with any locks still held. This is the source of the reentrancy which causes the deadlocks if you call any userspace-implemented code from an exception handler.

The reason it doesn't unwind is that a handler may choose to restart execution of the failed operation. There was a very clever commercial C++ object-to-disk system a very long time ago which serialised C++ objects out to disk and freed their RAM storage. When the C++ program faulted on accessing them, the SEH handler would deserialise the object back into RAM and restart the failed instruction. Worked beautifully.
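The search order in steps 2 to 4 can be sketched as a toy model. This is portable C++ that only imitates the order in which handlers are consulted; it is not how Windows implements dispatch. `ToyDispatcher`, `Disposition` and `demo_order` are hypothetical names, loosely mirroring the real EXCEPTION_CONTINUE_SEARCH and EXCEPTION_CONTINUE_EXECUTION return values. The model covers the search phase only, and no stack is unwound in it.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Toy model only: none of these names are Windows APIs.
enum class Disposition { ContinueSearch, ContinueExecution };

using Handler = std::function<Disposition(std::uint32_t code)>;

struct ToyDispatcher {
    std::vector<Handler> vectored;  // process-wide, consulted first (step 2)
    std::vector<Handler> frames;    // per-thread, pushed as calls nest (step 3)
    std::vector<std::string> log;   // records the order handlers ran in

    // Returns true if a handler asked to resume at the faulting point,
    // false if the search fell through to the default handler (step 4).
    bool raise(std::uint32_t code) {
        for (auto& h : vectored)
            if (h(code) == Disposition::ContinueExecution) return true;
        // Innermost frame first: walk the frame list in reverse.
        for (auto it = frames.rbegin(); it != frames.rend(); ++it)
            if ((*it)(code) == Disposition::ContinueExecution) return true;
        log.push_back("default");  // the real default handler ends the process
        return false;
    }
};

// The code used for C++ throws (step 5): 0xE0 followed by ASCII "msc".
constexpr std::uint32_t kCxxExceptionCode = 0xE06D7363;

// Registers one vectored handler and two nested frame handlers, raises once,
// and returns the order in which they were consulted.
std::vector<std::string> demo_order() {
    ToyDispatcher d;
    d.vectored.push_back([&d](std::uint32_t) -> Disposition {
        d.log.push_back("vectored");
        return Disposition::ContinueSearch;
    });
    d.frames.push_back([&d](std::uint32_t) -> Disposition {
        d.log.push_back("outer");
        return Disposition::ContinueExecution;  // "fix up and restart"
    });
    d.frames.push_back([&d](std::uint32_t) -> Disposition {
        d.log.push_back("inner");
        return Disposition::ContinueSearch;
    });
    d.raise(kCxxExceptionCode);
    return d.log;
}
```

In this model `demo_order()` visits the vectored handler, then the inner frame, then the outer frame, which asks to resume, so the default handler is never reached. Every handler runs "at the point of the exception", which is why anything it calls can re-enter code that still holds a lock.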
Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/