[thread] Mysterious deadlock

I have this simple testcase (below) which runs into deadlock (full
run trace after the testcase):
...
[Producer] Producer generated data 28
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 29
[Producer] push(): about to lock
[Producer] push(): locked
[Consumer] Signaled for data
[Consumer] empty(): about to lock
<<DEADLOCK>>
The call to empty() by Consumer is after the first time
it is woken up from full_condition. So the deadlock
occurs right when the consumer first attempts to grab
some data.
Is there something I'm not understanding about how
mutexes and locks work in Boost.Thread? I feel like
I'm missing something obvious. This comes out of a
larger and more complicated program but I was able to
isolate the mysterious (to me) behavior I'm seeing.
I used to use LOG for debugging but I thought there might be
some interaction between the queue mutex locks and the print
mutex locks so I switched to the non-locking LOG_NOLCK but
observed no change in behavior.
Any help is greatly appreciated.
-Dave
---------------------------------
Testcase
#include <iostream>
#include <queue>
#include <map>
#include <string>
#include <pthread.h>
#include <boost/thread/thread.hpp>
#include <boost/thread/recursive_mutex.hpp>
#include <boost/thread/condition.hpp>
#include <boost/bind.hpp>

I haven't looked at the code very closely, but it looks like there's a
potential for deadlock in the Consumer's wait_for_data and push methods.
Here is the scenario (in hypothetical deadlock time order):
1. consumer thread in wait_for_data(): acquires full_mutex, calls empty()
2. producer thread in push(): called by Producer, acquires mutex
3. consumer thread in wait_for_data()'s call to empty(): attempts to acquire
mutex, blocks on producer thread
4. producer thread in push(): attempts to acquire full_mutex, blocks on
consumer thread
5. neither thread can advance
As a general rule, interleaved acquisitions like this of two or more
resources are fraught with peril. If you want an unbounded queue (as it
appears in your code), it can be done with a single mutex and condition
variable.
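
Something along these lines would do (just a sketch to show the shape --
class and member names are illustrative, and I haven't compiled it):

#include <queue>
#include <boost/thread/mutex.hpp>
#include <boost/thread/condition.hpp>

class IntQueue
{
public:
   void push(int val)
   {
      boost::mutex::scoped_lock lock(m_mutex);
      m_queue.push(val);
      m_condition.notify_one();    // waiter re-checks the predicate under m_mutex
   }

   int wait_and_pop()
   {
      boost::mutex::scoped_lock lock(m_mutex);
      while (m_queue.empty())      // loop guards against spurious wakeups
         m_condition.wait(lock);   // atomically releases m_mutex while waiting
      int val = m_queue.front();
      m_queue.pop();
      return val;
   }

private:
   std::queue<int>  m_queue;
   boost::mutex     m_mutex;
   boost::condition m_condition;
};

One mutex protects the queue and is also the mutex the condition waits on,
so there is never more than one lock held at a time and no ordering to get
wrong.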
Andrew
On 2/9/06, David Greene wrote:
I have this simple testcase (below) which runs into deadlock (full run trace after the testcase):
...
[Producer] Producer generated data 28
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 29
[Producer] push(): about to lock
[Producer] push(): locked
[Consumer] Signaled for data
[Consumer] empty(): about to lock
<<DEADLOCK>>
The call to empty() by Consumer is after the first time it is woken up from full_condition. So the deadlock occurs right when the consumer first attempts to grab some data.
Is there something I'm not understanding about how mutexes and locks work in Boost.Thread? I feel like I'm missing something obvious. This comes out of a larger and more complicated program but I was able to isolate the mysterious (to me) behavior I'm seeing.
I used to use LOG for debugging but I thought there might be some interaction between the queue mutex locks and the print mutex locks so I switched to the non-locking LOG_NOLCK but observed no change in behavior.
Any help is greatly appreciated.
-Dave
---------------------------------
Testcase

#include <iostream>
#include <queue>
#include <map>
#include <string>

#include <pthread.h>
#include <boost/thread/thread.hpp>
#include <boost/thread/recursive_mutex.hpp>
#include <boost/thread/condition.hpp>
#include <boost/bind.hpp>

typedef boost::recursive_mutex mutex_type;
typedef mutex_type::scoped_lock lock_type;

// Atomic prints
mutex_type print_mutex;
mutex_type print_map_mutex;

// For print/debug purposes
typedef std::map<pthread_t, std::string> thread_map_type;
thread_map_type thread_map;

template<typename T>
void register_thread_id(const T &entity)
{
   lock_type lock(print_map_mutex);
   std::cout << "Registering " << pthread_self() << " to "
             << entity.name() << std::endl;
   thread_map[pthread_self()] = entity.name();
}

#define LOG(msg) \
{ \
   lock_type lock(print_mutex); \
   std::cout << "[" << thread_map[pthread_self()] \
             << "] " << msg << std::endl; \
}

#define LOG_NOLCK(msg) \
{ \
   std::cout << "[" << thread_map[pthread_self()] \
             << "] " << msg << std::endl; \
}

// Synchronize start (make sure all registered)
mutex_type go_mutex;
boost::condition go_condition;
bool ready = false;

class Consumer {
private:
   boost::thread thread;
   std::queue<int> queue;

   typedef boost::recursive_mutex mutex_type;
   typedef mutex_type::scoped_lock lock_type;

   mutable mutex_type mutex;
   mutable mutex_type full_mutex;
   mutable boost::condition full_condition;

   void wait_until_ready(void) const {
      ::lock_type lock(go_mutex);
      while(!ready) {
         go_condition.wait(lock);
      }
      LOG_NOLCK("go!");
   };

   void wait_for_data(void) const {
      lock_type lock(full_mutex);
      while(empty()) {
         LOG_NOLCK("Waiting for data");
         full_condition.wait(lock);
         LOG_NOLCK("Signaled for data");
      }
      LOG_NOLCK("Wakeup for data");
   };

   void event_loop(void) {
      register_thread_id(*this);
      wait_until_ready();
      while(true) {
         wait_for_data();
         LOG_NOLCK("Consumer received data " << top());
         pop();
      }
   };

public:
   Consumer(void)
         : thread(boost::bind(&Consumer::event_loop, this)) {};

   std::string name(void) const { return("Consumer"); };

   void push(int val) {
      LOG_NOLCK("push(): about to lock");
      {
         lock_type lock(mutex);
         LOG_NOLCK("push(): locked");
         queue.push(val);
         lock_type full_lock(full_mutex);
         LOG_NOLCK("push(): about to unlock");
      }
      full_condition.notify_all();
   };

   int top(void) const {
      LOG_NOLCK("top(): about to lock");
      lock_type lock(mutex);
      LOG_NOLCK("top(): locked");
      int item = queue.front();
      LOG_NOLCK("top(): about to unlock");
      return(item);
   };

   void pop(void) {
      LOG_NOLCK("pop(): about to lock");
      lock_type lock(mutex);
      LOG_NOLCK("pop(): locked");
      queue.pop();
      LOG_NOLCK("pop(): about to unlock");
   };

   bool empty(void) const {
      LOG_NOLCK("empty(): about to lock");
      lock_type lock(mutex);
      LOG_NOLCK("empty(): locked");
      bool val = queue.empty();
      LOG_NOLCK("empty(): about to unlock");
      return(val);
   };
};

class Producer {
private:
   boost::thread thread;

   void wait_until_ready(void) const {
      lock_type lock(go_mutex);
      while(!ready) {
         go_condition.wait(lock);
      }
      LOG_NOLCK("go!");
   };

   void event_loop(void) {
      int i = 0;
      register_thread_id(*this);
      wait_until_ready();
      while(true) {
         LOG_NOLCK("Producer generated data " << i);
         consumer->push(i);
         ++i;
      };
   };

   Consumer *consumer;

public:
   Producer(Consumer *c)
         : consumer(c),
           thread(boost::bind(&Producer::event_loop, this)) {};

   std::string name(void) const { return("Producer"); };
};

void go(void)
{
   // Wait until everyone arrives
   lock_type lock(go_mutex);
   ready = true;

   // Notify everyone
   go_condition.notify_all();
}

int main(void)
{
   Consumer c;
   Producer p(&c);

   go();

   while(true);

   return(0);
}
-----------------
Output Trace
[Consumer] empty(): about to unlock
[Consumer] Waiting for data
Registering 48958384 to Producer
[Producer] go!
[Producer] Producer generated data 0
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 1
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 2
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 3
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 4
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 5
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 6
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 7
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 8
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 9
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 10
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 11
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 12
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 13
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 14
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 15
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 16
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 17
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 18
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 19
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 20
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 21
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 22
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 23
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 24
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 25
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 26
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 27
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 28
[Producer] push(): about to lock
[Producer] push(): locked
[Producer] push(): about to unlock
[Producer] Producer generated data 29
[Producer] push(): about to lock
[Producer] push(): locked
[Consumer] Signaled for data
[Consumer] empty(): about to lock
<<DEADLOCK>>

Andrew Smith wrote:
I haven't looked at the code very closely, but it looks like there's a potential for deadlock in the Consumer's wait_for_data and push methods. Here is the scenario (in hypothetical deadlock time order):
1. consumer thread in wait_for_data(): acquires full_mutex, calls empty()
2. producer thread in push(): called by Producer, acquires mutex
3. consumer thread in wait_for_data()'s call to empty(): attempts to acquire mutex, blocks on producer thread
4. producer thread in push(): attempts to acquire full_mutex, blocks on consumer thread
5. neither thread can advance
Oops! Yep, you are correct. I believe that moving the acquisition of full_mutex outside the inner scope fixes the problem, because then the producer doesn't hold two mutexes at the same time.
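
Concretely, the rearranged push() would look something like this (just a
sketch of what I mean, not the exact code I ended up with):

void push(int val) {
   {
      lock_type lock(mutex);           // protects only the queue
      queue.push(val);
   }                                   // queue mutex released here
   lock_type full_lock(full_mutex);    // taken alone, never together with mutex
   full_condition.notify_all();        // notified while holding full_mutex
};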
As a general rule, interleaved acquisitions like this of two or more resources are fraught with peril. If you want an unbounded queue (as it appears in your code), it can be done with a single mutex and condition variable.
Yes, I see that now thanks to Delfin. The real application is trickier but I think I can apply the same principle. This is definitely better than what I had. -Dave

-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of David Greene
Sent: Thursday, February 09, 2006 1:46 PM
To: boost-users@lists.boost.org
Subject: [Boost-users] [thread] Mysterious deadlock
I have this simple testcase (below) which runs into deadlock <snip>
I don't understand very well why you have 2 mutexes inside Consumer. Anyway, I left only one mutex inside Consumer which I called m_mutex and I renamed the condition from full_condition to m_condition and changed the Consumer::push method to this:

void push(int val) {
   LOG("push(): about to lock");
   lock_type lock(m_mutex);
   LOG("push(): locked");
   queue.push(val);
   m_condition.notify_all();
};

And all seems to work. I ran it for a little while and producer was generating data (ran it up to 33) and consumer was consuming data (about 17 when producer reached 33).

The boost::thread library includes several examples of this producer/consumer pattern. Look at monitor.cpp and starvephil.cpp in libs/thread/example.

-delfin

Delfin Rojas wrote:
I don't understand very well why you have 2 mutexes inside Consumer. Anyway,
"mutex" is for access to the queue data structure. "full_mutex" is for signaling that the consumer is waiting on data. Without "full_mutex" the producer might send a notify that gets missed. Or am I thinking about this incorrectly?
I left only one mutex inside Consumer which I called m_mutex and I renamed the condition from full_condition to m_condition and changed the Consumer::push method to this:
void push(int val) {
   LOG("push(): about to lock");
   lock_type lock(m_mutex);
   LOG("push(): locked");
   queue.push(val);
   m_condition.notify_all();
};
And all seems to work. I ran it for a little while and producer was generating data (ran it up to 33) and consumer was consuming data (about 17 when producer reached 33).
Ok, I think you're right. This should work. As I noted in another message, moving the notify to the same scope as the lock protecting the condition was the key.
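
To convince myself why the single-mutex version can't miss a notify: the
consumer's wait loop would look roughly like this (sketch only, reusing
Delfin's m_mutex/m_condition names):

void wait_for_data(void) const {
   lock_type lock(m_mutex);       // same mutex push() locks
   while(queue.empty())           // predicate checked while holding m_mutex
      m_condition.wait(lock);     // atomically releases m_mutex and waits,
                                  // reacquires it before returning
};

Since push() modifies the queue and notifies while holding m_mutex, there
is no window between the consumer's emptiness check and its wait() in
which a push could go unnoticed.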
The boost::thread library includes several examples of this producer/consumer pattern. Look at monitor.cpp and starvephil.cpp in libs/thread/example
Boost.Thread has monitors? Why isn't this documented? -Dave

David Greene wrote:
void push(int val) {
   LOG_NOLCK("push(): about to lock");
   {
      lock_type lock(mutex);
      LOG_NOLCK("push(): locked");
      queue.push(val);
      lock_type full_lock(full_mutex);
      LOG_NOLCK("push(): about to unlock");
   }
   full_condition.notify_all();
};
Whoops! This is a race condition. Because the full_lock is released before the notify, there's no guarantee the consumer is waiting on the condition, and the signal may be lost. When I moved the lock after the inner brace, everything started working. -Dave
participants (4)
- Andrew Smith
- David A. Greene
- David Greene
- Delfin Rojas