
A problem I recently encountered, which caused the total failure of a regression test run, has highlighted what I (and others) believe to be a design flaw in execution_monitor::catch_signals (in file execution_monitor.ipp).

IIUC, the purpose of the code in question is to intercept UNIX signals and convert them into exceptions that carry a diagnostic message indicating the signal type. For most signals this should work reliably. However, we suggest that the mechanism is prone to failure when it has to deal with a SIGSEGV, as happened in my case.

What I observed was a test process that had stalled, blocked on a mutex. Detailed analysis showed that the test had failed with a segmentation violation, which raised a SIGSEGV. In brief, the primary execution failure corrupts the program heap. When the C++ exception is then thrown at execution_monitor:462, the exception handling mechanism calls __cxa_allocate_exception, which in turn calls std::malloc. Because the heap is corrupted, this call blocks on the malloc mutex. That is the specific behaviour on QNX; other OSs will deal with this in different ways.

The general point I would like to make is that after a segmentation violation the memory of any process is in an undefined state, and it is unreasonable to assume that the process is capable of continued execution, as the present design does. I propose that there is no safe way to intercept a SIGSEGV and that it should be allowed to terminate the process without intervention.

Comments please...

Jim
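
To make the proposal concrete, here is a minimal sketch of the idea, assuming a POSIX sigaction interface. It is not the Boost.Test code: record_signal, install_handlers_except_segv and segv_is_default are hypothetical names, and the set of handled signals (SIGFPE, SIGALRM, SIGUSR1) is only an illustrative choice. The handler records the signal number and does nothing else, so it never allocates, locks or throws, while SIGSEGV keeps its default action and the OS terminates the process.

```cpp
#include <cassert>
#include <signal.h>   // POSIX: sigaction, sigemptyset, raise

// Last signal seen by the handler. sig_atomic_t is safe to write from a handler.
static volatile sig_atomic_t g_last_signal = 0;

// Hypothetical stand-in for the monitor's handler. It only records the signal
// number; it does not allocate memory, take locks, or throw.
extern "C" void record_signal(int sig) { g_last_signal = sig; }

// Install the handler for an illustrative set of signals and give SIGSEGV
// its default action, so that a segmentation violation terminates the process.
void install_handlers_except_segv() {
    struct sigaction sa = {};
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    const int handled[] = { SIGFPE, SIGALRM, SIGUSR1 };  // assumed set, not Boost's
    for (int sig : handled) {
        int rc = sigaction(sig, &sa, nullptr);
        assert(rc == 0);
        (void)rc;
    }

    struct sigaction dfl = {};
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    dfl.sa_flags = 0;
    int rc = sigaction(SIGSEGV, &dfl, nullptr);
    assert(rc == 0);
    (void)rc;
}

// True if SIGSEGV currently has the default (terminating) disposition.
bool segv_is_default() {
    struct sigaction cur = {};
    if (sigaction(SIGSEGV, nullptr, &cur) != 0) return false;
    return cur.sa_handler == SIG_DFL;
}
```

A caller would run install_handlers_except_segv() once at start-up and inspect g_last_signal from normal code after a handled signal. Converting the recorded value into an exception would then happen outside the handler, which is a design choice of this sketch rather than a description of the current implementation.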