Deadlock issue with boost::condition

I'm having a deadlock problem with boost::condition variables, and I'm not sure if I'm misusing condition variables or if condition variables aren't the right tool for the job. The basic problem appears to be that the thread calling notify_all (let's call it thread1) calls notify_all before the thread calling wait (let's call it thread2) calls wait. The result is that thread2 waits forever for the notify that thread1 already sent. To further complicate the matter, thread1 can't call notify_all again until it has been signaled to run (via a different condition variable), which is signaled by thread2.

Does this make sense? Here's some pseudocode for what I'm doing:

    Thread1() // Worker thread
    {
        while(1)
        {
            DoSomeStuff();
            condition1.notify_all();
            scoped_lock l(mutex2);
            condition2.wait(l);
        }
    }

    Thread2() // Master thread
    {
        while(1)
        {
            scoped_lock l(mutex1);
            condition1.wait(l);
            DoSomeOtherStuff();
            condition2.notify_all();
        }
    }

    StartMultiThreadedTask()
    {
        startThread( thread2 );
        Sleep( 2000 );
        startThread( thread1 );
    }

Here is a little timeline of what I think is happening:

    thread2 locks mutex1
    thread2 calls wait on condition1
    thread1 calls DoSomeStuff
    thread1 calls notify_all on condition1
    thread1 locks mutex2
    thread1 calls wait on condition2
    thread2 calls DoSomeOtherStuff
    thread2 calls notify_all on condition2
    thread1 calls DoSomeStuff
    thread1 calls notify_all on condition1
    thread1 locks mutex2
    thread1 calls wait on condition2
    thread2 locks mutex1
    thread2 calls wait on condition1
    DEADLOCK OCCURS

I've done this sort of thing with plain old binary mutexes or single-count semaphores, and I can go back to that, but it really seemed like boost::condition simplified things a bit and should be the right choice except for this one problem. Is there something I can do to make boost::condition work in this case? Should I be doing additional or different locking of mutex1 and mutex2? Do I need yet another mutex?

Thanks,
Matt S.

Hi Matt,
To further complicate the matter thread1 can't call notify_all again until it has been signaled to run (via a different condition variable) which is signaled by thread2.
Does this make sense? Here's some pseudo code for what I'm doing.
    Thread2() // Master thread
    {
        while(1)
        {
            scoped_lock l(mutex1);
            condition1.wait(l)
This is wrong. Your code should be written like this:

    bool worker_thread_finished_working = false;
    ....
    scoped_lock l(mutex1);
    while(!worker_thread_finished_working)
        condition1.wait(l);

That way, if the worker thread has finished working before you enter wait, you simply won't enter the loop and won't call condition1.wait. A call to 'notify_all' on a condition should be considered as meaning "something in the world has changed", and after waking from 'wait' you should reevaluate the necessary variables to decide whether the wait is done or not. Using 'notify_all' in any other way is risky.

Some further notes can be found at: http://vladimir_prus.blogspot.com/2005/07/spurious-wakeups.html

- Volodya

Hi, actually I've discovered that isn't even right: the worker thread needs to lock the mutex before it sets the worker_thread_finished_working flag to true. Plus, given boost:

    bool worker_thread_finished = false;

    Thread1() // Worker thread
    {
        while(1)
        {
            DoSomeStuff();
            {
                scoped_lock l( mutex1 );
                worker_thread_finished = true;
                condition1.notify_all();
            }
            scoped_lock l(mutex2);
            condition2.wait(l);
        }
    }

    Thread2() // Master thread
    {
        while(1)
        {
            scoped_lock l(mutex1);
            while( !worker_thread_finished )
                condition1.wait(l);
            worker_thread_finished = false;
            DoSomeOtherStuff();
            condition2.notify_all();
        }
    }

You can also change the boolean flag to a predicate and use the form of wait that takes a predicate and implements the while loop for you. The key is that the worker thread must hold mutex1 while it changes the state flag/predicate; otherwise notify could be called between the check in the while loop and the call to wait, and the deadlock would occur. I'm kind of bothered that the documentation doesn't state this.

This whole issue makes me think that this problem could be much more easily solved with a binary semaphore or a non-recursive mutex. Then it doesn't matter if the worker thread gets done before the master thread starts to wait.

Thanks,
Matt S.

Matt Schuckmann wrote:
Hi, actually I've discovered that isn't even right: the worker thread needs to lock the mutex before it sets the worker_thread_finished_working flag to true.
Sure, all accesses to shared variables must be protected by a mutex.
You can also change the boolean flag to a predicate and use the form of wait that takes a predicate and implements the while loop for you.
The key is the worker thread must hold mutex1 while it changes the state flag/predicate, otherwise notify could be called between the check in the while loop and call to wait and the deadlock would occur.
I'm kind of bothered that the documentation doesn't state this.
Yes, the documentation is a bit lean on this topic. When I was reading it for the first time I was completely confused.
This whole issue makes me think that this problem could be much more easily solved with a binary semaphore or a non-recursive mutex. Then it doesn't matter if the worker thread gets done before the master thread starts to wait.
Yes, for a pair of threads and your use case, a semaphore could work. But I think condition variables scale much better. With a binary semaphore you must be very careful to up the semaphore exactly when the waiting thread needs to wake. Since condition.notify_all just means "something has changed" and the woken thread decides for itself whether the wait condition is satisfied, it can be easier to get right: you just call notify_all whenever anything changes.

- Volodya

Hello Matt,

Matt Schuckmann wrote on 3/02/2006 at 6:27 a.m.:
This whole issue makes me think that this problem could be much more easily solved with a binary semaphore or a non-recursive mutex. Then it doesn't matter if the worker thread gets done before the master thread starts to wait.

IMHO the model you are describing suits Win32 events a lot more than condition variables. The issue is around "setting the condition prior to somebody waiting on it", which, by definition, works well with events. I.e., an event would remain signalled until such time as a thread decides to block on it. At that time, the thread just checks the event and continues running.

Having said all that, the fact remains - Win32 events don't exist on unix/linux. However, events and the corresponding multiplexor (WaitForMultipleObjects()) can be implemented using mutexes and condition variables, thus making the above model easy to program.

Best regards,
Oleg.
participants (3)
- Matt Schuckmann
- Oleg Smolsky
- Vladimir Prus