On 15/03/12 16:47, John Rocha wrote:
First I want to thank you for helping with this. How did you come to this answer so quickly? Is this a technique or tool that I can learn, or was this wisdom from having worked with your library for so long?
I tried it out, saw the problem manifest, trapped it in gdb, and examined the code. I guess it's just experience.
Main
Uses SQ.shutdown() to shut down the SQ thread, which:
 - sets the SQ.m_shutdown flag
 - sends the SQ thread a boost thread interrupt
 - joins on the thread
SQ Thread
This thread is running under its thread_main() and __happens__ to be in the is_shutdown() method, after it has called interruption_point() but BEFORE it checks the m_shutdown logic. SQ.m_shutdown was just changed to true, so is_shutdown() sees that and throws a boost::thread_interrupted exception. However, it should also be noted that the boost thread data still has its internal interrupt-requested flag set.
The SQ thread now "throws" out of thread_main(); the exception is caught by the base class's operator() method, which then calls SQ.thread_shutdown().
SQ.thread_shutdown() needs to shut down its lookup (LU) child thread, so it invokes the LU.shutdown() method.
The LU.shutdown() method works just like the one above:
 - sets the LU.m_shutdown flag
 - sends the LU thread a boost thread interrupt
 - joins on the LU thread
However, join() is an interruption point, and I haven't "cleared the interrupt" for the boost thread yet. Therefore, when the SQ thread invokes join(), it checks whether there are any outstanding interrupts it needs to honor. There are, so it throws another thread_interrupted exception, which exits the join() and which I don't catch; therefore the SQ thread exits prematurely due to my faulty logic.
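To make the sequence above concrete, here is a deterministic, single-threaded model of it. It does not use Boost: thread_interrupted, interrupt_state, interruption_point(), disable_interruption and replay_sq_shutdown() are hypothetical stand-ins that mimic the Boost.Thread behaviour described in this thread (a pending interrupt request is only cleared when an interruption point throws, and join() is an interruption point). It is a sketch of the mechanism, not the original thread_base_c code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for boost::thread_interrupted (model only).
struct thread_interrupted {};

// Simplified model of one thread's interruption state.
struct interrupt_state {
    bool requested = false;   // set by thread::interrupt()
    int disabled_depth = 0;   // > 0 while a disable_interruption guard is alive
};

// Model of an interruption point such as join(): if a request is pending and
// interruption is enabled, clear the request and throw.
void interruption_point(interrupt_state& s) {
    if (s.requested && s.disabled_depth == 0) {
        s.requested = false;
        throw thread_interrupted{};
    }
}

// RAII guard modelled on boost::this_thread::disable_interruption
// (the real one takes no constructor argument).
class disable_interruption {
public:
    explicit disable_interruption(interrupt_state& s) : s_(s) { ++s_.disabled_depth; }
    ~disable_interruption() { --s_.disabled_depth; }
    disable_interruption(const disable_interruption&) = delete;
    disable_interruption& operator=(const disable_interruption&) = delete;
private:
    interrupt_state& s_;
};

// Replays the race. Returns "joined" if joining the LU child completes,
// or "interrupted" if that join throws a second thread_interrupted.
std::string replay_sq_shutdown(bool guard_in_shutdown) {
    interrupt_state sq;       // the SQ thread's interruption state
    bool m_shutdown = false;

    // Main: SQ.shutdown() sets the flag, then interrupts the SQ thread, which
    // has already passed its interruption_point() inside is_shutdown().
    m_shutdown = true;
    sq.requested = true;

    try {
        // SQ thread, still in is_shutdown(): sees m_shutdown and throws its own
        // thread_interrupted, which does NOT clear sq.requested.
        if (m_shutdown) throw thread_interrupted{};
    } catch (const thread_interrupted&) {
        // Base-class operator() catches it and calls thread_shutdown(),
        // which runs LU.shutdown() and then joins the LU thread.
    }

    try {
        if (guard_in_shutdown) {
            disable_interruption guard(sq);
            interruption_point(sq);   // join() with interruption disabled
        } else {
            interruption_point(sq);   // join() sees the stale request and throws
        }
        return "joined";
    } catch (const thread_interrupted&) {
        return "interrupted";
    }
}
```

Calling replay_sq_shutdown(false) yields "interrupted" (the premature exit described above), while replay_sq_shutdown(true) yields "joined". In the model the guard only suppresses the throw: sq.requested stays set after the guard is destroyed, so a later interruption point would still fire. As far as I understand, Boost likewise defers, rather than discards, a request made while interruption is disabled.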
Yes, that matches my understanding.
One final word. I feel ungrateful for bringing this up, and I'm still looking into this. However, I've encountered a new symptom. I've added the boost::this_thread::disable_interruption object to the beginning of my thread_base_c::shutdown() function and I've removed all references to m_shutdown -- as you recommended.
However now, I occasionally get a deadlock during the shutdown. I call it a deadlock when the shutdown process stalls for longer than 5 minutes. I've run the test multiple times and I'll see the deadlock on rare occasions. I've run seven different test cycles with the deadlock occurring at different times for each: 189, 398, 797, 999, 1282, 1527, 3416 (not in that order)
This could be an artifact of my simulation, and I'm just now starting to crawl through the gdb output; nicely enough, I can connect to the running process and see its current running state.
Do you have any debugging insights or tips that I should apply for this investigation?
My first thought is to check that the code is not blocked in a non-interruptible call.

Anthony
-- 
Author of C++ Concurrency in Action http://www.stdthread.co.uk/book/
just::thread C++11 thread library http://www.stdthread.co.uk
Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk
15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976
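Following up on the non-interruptible-call suggestion: in Boost.Thread, interrupt() only takes effect at interruption points, so a thread blocked elsewhere (for example in a std::condition_variable wait, a mutex lock, or blocking I/O) never wakes, and a join() on it waits indefinitely. One possible pattern, assuming the stalled thread is waiting for work, is to make the blocking wait itself shutdown-aware. The sketch uses only the C++17 standard library; stoppable_queue and run_worker_then_stop() are hypothetical names, not part of the original code, and this is not a diagnosis of the particular deadlock.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>

// Hypothetical work queue whose blocking pop() can be woken for shutdown.
// The wake-up comes from a stop flag plus notify_all(), not from interrupt().
class stoppable_queue {
public:
    void push(int v) {
        {
            std::lock_guard<std::mutex> lk(m_);
            items_.push_back(v);
        }
        cv_.notify_one();
    }

    // Blocks until an item is available or stop() has been called.
    // Returns std::nullopt only once stopped AND drained.
    std::optional<int> pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return stopped_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;
        int v = items_.front();
        items_.pop_front();
        return v;
    }

    // Sets the flag under the mutex so a waiter cannot miss the wake-up.
    void stop() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stopped_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<int> items_;
    bool stopped_ = false;
};

// Usage sketch: a worker drains the queue; stop() is what lets join() return.
int run_worker_then_stop() {
    stoppable_queue q;
    int sum = 0;
    std::thread worker([&] {
        while (auto v = q.pop()) sum += *v;
    });
    q.push(1);
    q.push(2);
    q.stop();        // without this, join() below would block forever
    worker.join();
    return sum;      // both items are drained before pop() reports the stop
}
```

The predicate form of wait() also guards against spurious wake-ups. When attaching gdb to a stalled process, a thread sitting inside a wait like this one (or inside a lock or read call) while another thread sits in join() is the kind of state this pattern is meant to rule out.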