On 3/15/2012 11:41 AM, Vicente J. Botet Escriba wrote:
Hi,
does this mean that there are some restrictions on the code that can be executed in the boost::thread_interrupted catch handler, or some guidelines that s/he must follow? Is it legal for the user to throw a boost::thread_interrupted exception themselves? Is there a way to clear the interrupted flag so that the join() is not interrupted in the boost::thread_interrupted catch handler?
What was wrong with the user code that made the library crash? Which precondition of the library was violated? Maybe some assertions would help, so that the error is identified as soon as possible.
Best, Vicente
Hello Vicente,

I'd like to point out that the library didn't crash. My code did. This was caused by faulty logic on my part -- a race condition with a very small window of opportunity. I had a cascading startup and shutdown design, for example (please pardon my verbose answer):

Startup:
--------
main spawns T1 and waits for T1 to indicate it is done with init
T1 spawns T2 and waits for T2 to indicate it is done with init
T2 spawns T3 and waits for T3 to indicate it is done with init
T3 spawns T4 and waits for T4 to indicate it is done with init
T4 comes up, finishes its init, informs T3 its init is done and enters its main loop.
T3 finishes its init, signals T2 its init is done and enters its main loop.
T2 finishes its init, signals T1 its init is done and enters its main loop.
T1, having waited for T2 to finish its init, finishes its own init, signals main and enters its main loop.

The system is now running, so main blocks until it receives a terminate signal (-2).

Shutdown:
---------
This is pretty much the reverse of the startup.

signal -2 is received, which unblocks main
main tells T1 to shut down and then join()s on the thread, waiting for it to finish.
T1 tells T2 to shut down and then join()s on the thread, waiting for it to finish.
T2 tells T3 to shut down and then join()s on the thread, waiting for it to finish.
T3 tells T4 to shut down and then join()s on the thread, waiting for it to finish.
T4 does its shutdown logic and then exits the thread
T3 wakes from join(), finishes its shutdown logic and exits the thread
T2 wakes from join(), finishes its shutdown logic and exits the thread
T1 wakes from join(), finishes its shutdown logic and exits the thread
main finishes its shutdown and the program exits. As part of that shutdown it invokes the destructor for its T1 object. Part of T1's destruction is to destruct the T2 object, which destructs T3, which destructs T4.

With this understanding of the overall architecture, we can focus on my faulty shutdown logic.
main() told T1 to shut down, but it used two (2) methods to inform T1. The first was the thread interrupt; the second was to set a shutdown flag in T1's object. However, when main uses T1.interrupt(), that also causes a flag to be set in the boost thread object. T1 happened to be in its T1.check_for_shutdown() routine, after it had invoked the boost::this_thread::interruption_point() check but before the "if shutdown flag" check.

    void T1::check_for_shutdown()
    {
        // This throws if the boost interruption-requested flag is set.
        boost::this_thread::interruption_point();

        // **** the code is here when main() runs its T1.shutdown() ****

        if (m_shutdown) {
            throw boost::thread_interrupted();
        }
    }

So I detected the shutdown with MY logic (m_shutdown), not with an interruption point. Consequently the boost thread's interruption is still pending, and will remain pending until another interruption point is hit.

In my code, the thread_interrupted is caught by a handler that is waiting for it, and the handler then invokes the T1.thread_shutdown() routine, which for T1 means sending a boost interrupt to T2 and then joining on T2. BUT join() is an interruption point, so my code would now act upon that pending interruption, throwing ANOTHER thread_interrupted, exiting join(), finishing off its shutdown logic and then exiting the thread. So my code would terminate T1 before all of its child threads had terminated.

Now main legally unblocks from join(), since T1 exited, and then it invokes the destructor on T1, which cascades destruction through the T1-T4 objects. THIS is what leads to my segmentation fault, "pure virtual method called", etc. errors: there are threads still alive running code based on those objects and accessing data from those objects, which were just deleted out from underneath them.

Can the user throw a boost::thread_interrupted exception?

To be honest, this wasn't the problem. Throwing it doesn't affect the thread's "do I have an interrupt pending" flag.
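The interleaving described above can be replayed in a fixed order with a small single-threaded model. This is only a sketch, not Boost code: ModelThread, interrupted and replay_race are hypothetical names, and the model assumes (as described in this thread) that interrupt() only sets a pending flag, which the next interruption point clears when it throws.

```cpp
#include <cassert>

// Hypothetical stand-in for boost::thread_interrupted.
struct interrupted {};

// Hypothetical model of one thread's flags; not Boost's implementation.
struct ModelThread {
    bool interruption_requested = false; // models the pending-interrupt flag
    bool m_shutdown = false;             // the application's own shutdown flag

    // Models thread::interrupt(): it only records the request.
    void interrupt() { interruption_requested = true; }

    // Models an interruption point: throws and clears the flag if a request is pending.
    void interruption_point() {
        if (interruption_requested) {
            interruption_requested = false;
            throw interrupted{};
        }
    }
};

// Replays the problematic interleaving in a fixed order.
// Returns true if the modelled join() in the shutdown handler was interrupted.
bool replay_race() {
    ModelThread t1;
    bool join_interrupted = false;
    try {
        // check_for_shutdown(), first half: nothing is pending yet, so no throw.
        t1.interruption_point();

        // main() runs in this window: it interrupts T1 and sets the shutdown flag.
        t1.interrupt();
        t1.m_shutdown = true;

        // check_for_shutdown(), second half: the application flag triggers the
        // throw, while the interruption request is still pending.
        if (t1.m_shutdown) {
            assert(t1.interruption_requested);
            throw interrupted{};
        }
    } catch (const interrupted&) {
        // Shutdown handler: join() is an interruption point, modelled here.
        try {
            t1.interruption_point(); // stands in for joining on T2
        } catch (const interrupted&) {
            join_interrupted = true; // T1 stops waiting for T2 too early
        }
    }
    return join_interrupted;
}
```

In this model replay_race() returns true: the request left pending by interrupt() fires at the modelled join(), which is the early exit that let T1 finish before T2.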
I could have thrown my own custom exception and I still would have encountered this problem. The problem is that my "check_for_shutdown" logic assumed that when it exited, no interrupts would be pending.

Is there a way to clear the interrupted flag so that the join is not interrupted?

I would argue that clearing the flag isn't correct. One could add logic such as:

    try {
        boost::this_thread::interruption_point();
    } catch (boost::thread_interrupted &) {
    }
    join();

which would clear the flag. However, while I am blocked in join(), some other thread could send me another interrupt, which would break me out of join(). Not what I wanted. I feel that Anthony's suggestion of blocking interrupts for this method is the appropriate way to go.

I don't think any assertions could help, but maybe a documentation improvement? Or maybe it's there and I didn't read carefully enough?

After looking at the boost codebase and this learning exchange, I learned that when the interrupt() method is invoked, the thread notes the interrupt, and the interrupt stays pending until an interruption point is hit. Moreover, only one interrupt can be pending at a time. For example, two calls to interrupt() set the thread's interrupt flag once. It's either on or off. So the next interruption point will clear that flag. It's like the old style signals.

Regards,
-=John
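To illustrate the interruption-blocking approach and the on/off nature of the flag, here is a sketch in a single-threaded model. In real Boost.Thread code the blocking would be done by keeping a boost::this_thread::disable_interruption object in scope around the join(). The names ModelThread, interrupted, DisableGuard and the two functions below are hypothetical stand-ins, and the model assumes that an interruption point throws only when interruption is enabled and a request is pending.

```cpp
#include <cassert>

// Hypothetical stand-in for boost::thread_interrupted.
struct interrupted {};

// Hypothetical model of one thread's interruption state; not Boost's implementation.
struct ModelThread {
    bool interruption_requested = false; // a single on/off flag, not a counter
    bool interruption_enabled = true;

    void interrupt() { interruption_requested = true; }

    void interruption_point() {
        if (interruption_enabled && interruption_requested) {
            interruption_requested = false;
            throw interrupted{};
        }
    }
};

// Hypothetical RAII guard playing the role of boost::this_thread::disable_interruption.
class DisableGuard {
public:
    explicit DisableGuard(ModelThread& t) : t_(t), previous_(t.interruption_enabled) {
        t_.interruption_enabled = false;
    }
    ~DisableGuard() { t_.interruption_enabled = previous_; }
    DisableGuard(const DisableGuard&) = delete;
    DisableGuard& operator=(const DisableGuard&) = delete;

private:
    ModelThread& t_;
    bool previous_;
};

// Returns true if the modelled join() completed despite a pending request.
bool join_with_interruption_blocked() {
    ModelThread t1;
    t1.interrupt(); // the request left over from the race
    bool completed = false;
    try {
        DisableGuard guard(t1);
        t1.interruption_point(); // stands in for joining on T2; cannot throw here
        completed = true;
    } catch (const interrupted&) {
        completed = false;
    }
    // In this model the request is still pending once the guard is released.
    assert(t1.interruption_requested);
    return completed;
}

// Counts how many of three consecutive interruption points throw
// after interrupt() has been called twice.
int throws_after_two_interrupts() {
    ModelThread t;
    t.interrupt();
    t.interrupt(); // coalesces with the first request
    int count = 0;
    for (int i = 0; i < 3; ++i) {
        try {
            t.interruption_point();
        } catch (const interrupted&) {
            ++count;
        }
    }
    return count;
}
```

In this model join_with_interruption_blocked() returns true and throws_after_two_interrupts() returns 1. Unlike the catch-and-clear approach, the guard also covers a request that arrives while the thread is already blocked in the join.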