New subject: [Interprocess] deadlocking race condition in emulation interprocess_condition.hpp

12 Sep 2009

      This appears related to the discussion found here:
http://lists.boost.org/boost-users/2009/03/46051.php

Bug:

There is a set of conditions where a process can manage to enter
do_timed_wait, increment m_num_waiters, and exit without decrementing
it.

Boost 1.39.0

Sequence of events:

We join our hero, Process A (P_A), in
boost/interprocess/sync/emulation/interprocess_condition.hpp.

P_A is executing a do_timed_wait(true, lock, abs_time) call, and is
spinning at the while loop at line 124.
tout_enabled == true, and abs_time is a microsecond in the future (about
to expire but hasn't yet).
Process B, P_A's trusty sidekick, sends a notify_all on the conditional,
breaking P_A out of the while loop at line 124.
abs_time arrives (ie, P_A got to line 149 with
microsec_clock::universal_time() >= abs_time and timed_out = false).
With these conditions, P_A gets to line 163 and calls the constructor
for scoped_lock.
P_A jumps to boost/interprocess/sync/scoped_lock.hpp line 114.
P_A executes mp_mutex->timed_lock(abs_time) at line 115.
P_A jumps to
boost/interprocess/sync/emulation/interprocess_condition.hpp line 49.
P_A takes a reading of now at line 56.
P_A finds that (now >= abs_time) at line 58 and is sent packing with a
return value of false.
P_A arrives back in
boost/interprocess/sync/emulation/interprocess_condition.hpp on line
163.
P_A gets to line 171 and finds lock is false.  He panics!  He sets
timed_out to true and unlock_enter_mut to true, but in his haste to
break out of evil Dr. while(1)'s clutches, he forgot to atomically
decrement m_num_waiters!
Manic laughter can be heard behind him as he tries in vein to acquire
the lock on line 214.
"You fool! You fell into my trap!", shouts Dr. while(1).  "Process B
grabbed that very lock and attempted to free you again!  He is at line
56 of this very header file, waiting for a call from you that will never
come, and he's holding your precious lock!  Your deadlock is complete!
HAHAHAHAHAHAH!!"

Ahem.

I do not know the proper way to fix this problem.

My straightforward idea is to add the following line at line 172 of
boost/interprocess/sync/emulation/interprocess_condition.hpp:

         if(!lock){

detail::atomic_dec32(const_cast<boost::uint32_t*>(&m_num_waiters));
            timed_out = true;
            unlock_enter_mut = true;
            break;
         }

However, the commment inlcuded with the block in question:

//Notification occurred, we will lock the checking interprocess_mutex so
that
//if a notify_one notification occurs, only one thread can exit

...makes me wary.

One thing seems certain to me: there's something not quite right here.
I could increase my timeout window to lower the chances of this
occurring, but that is not a solution.  That is masking a problem.

Anyone got a take on this?  Am I missing something obvious?  That whole
block with the comment really makes me suspicious.

[Interprocess] deadlocking race condition in emulation interprocess_condition.hpp

Young, Zachariah L

Ion Gaztañaga

Young, Zachariah L

Young, Zachariah L

Ion Gaztañaga

Young, Zachariah L

Ion Gaztañaga

tags

participants (2)