
On 05/12/2012 09.16, Anthony Williams wrote:
On 04/12/12 18:32, Gaetano Mendola wrote:
Hi all, I was investigating a rare deadlock that occurs when an interrupt and a timed_join are issued in parallel. I came up with the following code, which shows the behavior.
The deadlock is rare, so sometimes you need to wait a bit.
I couldn't try it with Boost 1.52 because there the code is invalid, due to the "thread joinable" precondition on timed_join.
That's a hint.
Is the code invalid, or is this a real bug?
The code is invalid: you keep trying to interrupt and join even after the thread has been joined! Once the thread has been joined, the thread handle is no longer valid, and you should exit the loop.
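A minimal sketch of the loop shape described above: interrupt, try a timed join, and leave the loop the moment the join succeeds, so the handle is never touched again. This is an analogue built on std::thread, not Boost code. std::thread has no interrupt(), so the hypothetical `InterruptibleWorker` type stands in with a cooperative `std::atomic<bool>` flag that the worker polls, and a `std::future` that the worker readies emulates timed_join. With std::thread, calling join() on a handle that is no longer joinable throws std::system_error, which roughly mirrors the "joinable" precondition in Boost 1.52.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for an interruptible thread: std::thread has no
// interrupt(), so the worker polls a cooperative flag instead.
struct InterruptibleWorker {
    std::atomic<bool> interrupt_requested{false};
    std::promise<void> finished;                     // set when the body returns
    std::future<void> done = finished.get_future();
    std::thread t;

    InterruptibleWorker() {
        t = std::thread([this] {
            // Checking the flag plays the role of an interruption point.
            while (!interrupt_requested.load()) {
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
            finished.set_value();
        });
    }

    ~InterruptibleWorker() {
        if (t.joinable()) {  // only join a handle that has not been joined yet
            interrupt();
            t.join();
        }
    }

    void interrupt() { interrupt_requested.store(true); }

    // Emulated timed_join: joins only if the body has already finished.
    // Calling it again after it returned true would make t.join() throw
    // std::system_error, because t is no longer joinable.
    bool timed_join(std::chrono::milliseconds timeout) {
        if (done.wait_for(timeout) != std::future_status::ready) return false;
        t.join();
        return true;
    }
};

// Interrupt and poll, but stop as soon as the join succeeds: after that the
// handle no longer refers to a thread and must not be used again.
int stop_worker(InterruptibleWorker& w) {
    int attempts = 0;
    for (;;) {
        ++attempts;
        w.interrupt();
        if (w.timed_join(std::chrono::milliseconds(10))) break;
    }
    return attempts;
}
```

For example, `InterruptibleWorker w; stop_worker(w);` returns after `w.t` has been joined exactly once; from then on `w.t.joinable()` is false and the loop no longer touches the handle.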
I haven't seen this stated in the documentation. The loop was meant to test exactly this, so you are confirming that interrupting an already-joined thread is not valid. How, then, do I safely interrupt a thread? There is no "atomic" check_joinable_then_interrupt, and looking at the interrupt code it seems that the check is done inside. I'm lost.

To cope with a bug in 1.40 (an interrupt sent to a thread could be lost) I implemented my own ThreadGroup:

    ThreadGroup::interrupt_all() {
      for_each_thread(
        boost::thread::interrupt();
        if ( boost::thread::timed_join() ) { move_to_next_thread }
      )
    }

Also, because boost::thread_group doesn't provide a "join_any" method (with the semantics of issuing an interrupt_all if any of the threads terminates), I implemented join_any this way:

    ThreadGroup::join_any() {
      while(true) {
        for_each_thread(
          if ( boost::thread::timed_join() ) { interrupt_all(); }
          else { move_to_next_thread }
        )
      }
    }

This has been working well for 2 years now. After upgrading to 1.48 I'm experiencing deadlocks and core dumps. The backtrace shows that a timed_join crashes if the thread happens to terminate at the same time. Since the 1.48 documentation says nothing about not being allowed to call timed_join concurrently with interrupt, and specifies no precondition on the interrupt method, I assumed the above code would be harmless with 1.48.

I can try removing the code that keeps issuing interrupts until timed_join succeeds, trusting that after an interrupt a boost::thread exits if it is at, or reaches, an interruption point, but I'm not convinced this will reliably solve the deadlocks/crashes. I will remove the "redundant" code from my ThreadGroup and run my regression tests; I'll be back as soon as I have some hints.

Regards
Gaetano Mendola
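One way to sidestep the need for an "atomic" check-joinable-then-interrupt is to let a single controlling thread own every handle, so an interrupt and a join on the same handle can never run concurrently, and to join each handle exactly once. The sketch below applies that idea to interrupt_all and join_any. It is a hypothetical analogue built on std::thread, not Gaetano's ThreadGroup and not the boost::thread_group API: the stop flag passed to each body stands in for boost::thread interruption, and a std::future readied when a body returns replaces the repeated timed_join as the "has it finished?" test.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <future>
#include <list>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical ThreadGroup analogue. All methods must be called from the one
// thread that owns the group, so no handle is interrupted and joined concurrently.
class ThreadGroup {
    struct Entry {
        std::shared_ptr<std::atomic<bool>> stop;  // cooperative "interrupt" flag
        std::future<void> done;                   // ready once the body returns
        std::thread thread;
    };
    std::list<Entry> threads_;

public:
    using Body = std::function<void(const std::atomic<bool>&)>;

    ~ThreadGroup() {
        interrupt_all();
        for (auto& e : threads_)
            if (e.thread.joinable()) e.thread.join();
    }

    // The body should poll its flag and return soon after it becomes true.
    void create_thread(Body body) {
        auto stop = std::make_shared<std::atomic<bool>>(false);
        std::promise<void> finished;
        Entry e{stop, finished.get_future(), std::thread()};
        e.thread = std::thread([stop, body, p = std::move(finished)]() mutable {
            body(*stop);
            p.set_value();
        });
        threads_.push_back(std::move(e));
    }

    // Ask every worker that has not been joined yet to stop.
    void interrupt_all() {
        for (auto& e : threads_) e.stop->store(true);
    }

    // Block until any worker finishes, then interrupt the others and join
    // every handle exactly once. Returns how many threads were joined.
    std::size_t join_any() {
        if (threads_.empty()) return 0;
        bool any_finished = false;
        while (!any_finished) {
            for (auto& e : threads_) {
                if (e.done.wait_for(std::chrono::milliseconds(10)) ==
                    std::future_status::ready) {
                    any_finished = true;
                    break;
                }
            }
        }
        interrupt_all();
        std::size_t joined = 0;
        for (auto& e : threads_) {
            e.thread.join();
            ++joined;
        }
        threads_.clear();  // joined handles are dropped, never reused
        return joined;
    }
};
```

For example, with one body that returns after a short sleep and two bodies that loop until their flag is set, `join_any()` returns 3 once all three handles are joined, and a second call returns 0 because the joined handles were removed. Polling the futures detects a finished worker without touching any thread handle, and every handle is joined once, after `interrupt_all()`.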