Re: [boost] [Thread] read_write_mutex bug?

2 Sep 2005

John Maddock wrote:
> ...
> I can also reduce the number of threads to about 10 and still get the
> deadlock.

You can go as low as two.

> ...
> However, I can't see what the problem is: when the deadlock occurs, all the
> threads are waiting for the writer condition variable (m_waiting_writers) to
> wake up one of the writers at
> boost::detail::thread::read_write_mutex_impl<boost::mutex>::do_write_lock()
> line 512. The member m_waking_writers is set to one, and as far as I can
> see that can only occur in
> read_write_mutex_impl<Mutex>::do_wake_writer(void) line 1425, which then
> must have notified the condition variable to wake up one thread. m_state
> must have been set to zero before all this happens, so the woken thread
> should not loop and go back to sleep. (Footnote: actually that appears not
> to be true; sometimes a thread is woken with m_state == -1, but that appears
> not to be the immediate cause of the problem.) So... I'm stumped at present.

When a thread releases its lock, one waiter on the condition
m_waiting_writers is woken via notify_one. At that point m_state is set
to 0 and m_num_waking_writers > 0.
Now suppose (and this does happen) that another thread enters
do_write_lock _before_ the notified thread has actually been woken up.
It will see an m_state of 0 and be granted the lock, even though
m_num_waking_writers > 0. Obtaining the lock this way is in essence a
kind of wakeup, but the code does not account for it and never corrects
m_num_waking_writers.
Hence do_wake_writer will never again try to notify_one any waiters.
This leads to deadlock.

What is left: who actually is being woken up then? Obviously the
original waiting writer: it receives what amounts to a spurious wakeup,
sees that m_state is -1 again, and keeps on waiting.
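
To make the sequence above concrete, here is a minimal single-threaded
replay of it. This is only a sketch, not the Boost source: WriterModel,
its members, writer_starves and the account_for_barging switch are
hypothetical names, the guard in wake_writer is my reading of what
do_wake_writer does, and the correction in barge() is one possible way
to account for the wakeup, not necessarily the posted patch.

```cpp
// Simplified model of the lost wakeup; all names here are illustrative.
#include <cassert>

struct WriterModel {
    int state = -1;            // 0 = unlocked, -1 = write-locked (held by writer A)
    int waiting_writers = 1;   // writer W is blocked on the condition variable
    int waking_writers = 0;    // writers notified but not yet running again
    int pending_notifies = 0;  // notify_one calls not yet consumed by a waiter
    bool account_for_barging;  // hypothetical fix switch

    explicit WriterModel(bool fix) : account_for_barging(fix) {}

    // Assumed shape of do_wake_writer: notify only if no wakeup is in flight.
    void wake_writer() {
        if (waiting_writers > 0 && waking_writers == 0) {
            ++waking_writers;
            ++pending_notifies;
        }
    }

    // The current owner releases the write lock.
    void release() {
        state = 0;
        wake_writer();
    }

    // A writer that never waited enters do_write_lock and sees state == 0.
    bool barge() {
        if (state != 0) return false;
        state = -1;
        if (account_for_barging && waking_writers > 0) {
            --waking_writers;  // this acquisition uses up the pending wakeup
        }
        return true;
    }

    // The waiting writer runs only if a notification is pending.
    // Returns true if it obtained the lock.
    bool waiter_runs() {
        if (pending_notifies == 0) return false;  // still asleep
        --pending_notifies;
        if (state != 0) return false;  // lock taken meanwhile: wait again
        state = -1;
        --waiting_writers;
        --waking_writers;
        return true;
    }
};

// Replays: A releases, B barges in, W wakes too late, B releases, W retries.
// Returns true if W stays asleep although the lock is free.
bool writer_starves(bool fix) {
    WriterModel m(fix);
    m.release();      // A unlocks and notifies W
    m.barge();        // B takes the lock before W has run
    m.waiter_runs();  // W wakes, sees state == -1, goes back to waiting
    m.release();      // B unlocks
    bool got_lock = m.waiter_runs();
    return !got_lock && m.state == 0 && m.waiting_writers > 0;
}
```

Without the correction, the second release() finds waking_writers still
at 1 and skips the notify, so W never runs again; with it, the second
release() notifies W and W gets the lock.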

My already posted bugfix solves this, but I am not yet sure about
the other do_*_lock operations. Are they susceptible to this bug too?

Regards,
Roland

Roland Schwarz