[Interprocess] deadlocking race condition in emulation interprocess_condition.hpp
This appears related to the discussion found here: http://lists.boost.org/boost-users/2009/03/46051.php Bug: There is a set of conditions where a process can manage to enter do_timed_wait, increment m_num_waiters, and exit without decrementing it. Boost 1.39.0 Sequence of events: We join our hero, Process A (P_A), in boost/interprocess/sync/emulation/interprocess_condition.hpp. P_A is executing a do_timed_wait(true, lock, abs_time) call, and is spinning at the while loop at line 124. tout_enabled == true, and abs_time is a microsecond in the future (about to expire but hasn't yet). Process B, P_A's trusty sidekick, sends a notify_all on the conditional, breaking P_A out of the while loop at line 124. abs_time arrives (ie, P_A got to line 149 with microsec_clock::universal_time() >= abs_time and timed_out = false). With these conditions, P_A gets to line 163 and calls the constructor for scoped_lock. P_A jumps to boost/interprocess/sync/scoped_lock.hpp line 114. P_A executes mp_mutex->timed_lock(abs_time) at line 115. P_A jumps to boost/interprocess/sync/emulation/interprocess_condition.hpp line 49. P_A takes a reading of now at line 56. P_A finds that (now >= abs_time) at line 58 and is sent packing with a return value of false. P_A arrives back in boost/interprocess/sync/emulation/interprocess_condition.hpp on line 163. P_A gets to line 171 and finds lock is false. He panics! He sets timed_out to true and unlock_enter_mut to true, but in his haste to break out of evil Dr. while(1)'s clutches, he forgot to atomically decrement m_num_waiters! Manic laughter can be heard behind him as he tries in vein to acquire the lock on line 214. "You fool! You fell into my trap!", shouts Dr. while(1). "Process B grabbed that very lock and attempted to free you again! He is at line 56 of this very header file, waiting for a call from you that will never come, and he's holding your precious lock! Your deadlock is complete! HAHAHAHAHAHAH!!" Ahem. I do not know the proper way to fix this problem. My straightforward idea is to add the following line at line 172 of boost/interprocess/sync/emulation/interprocess_condition.hpp: if(!lock){ detail::atomic_dec32(const_castboost::uint32_t*(&m_num_waiters)); timed_out = true; unlock_enter_mut = true; break; } However, the commment inlcuded with the block in question: //Notification occurred, we will lock the checking interprocess_mutex so that //if a notify_one notification occurs, only one thread can exit ...makes me wary. One thing seems certain to me: there's something not quite right here. I could increase my timeout window to lower the chances of this occurring, but that is not a solution. That is masking a problem. Anyone got a take on this? Am I missing something obvious? That whole block with the comment really makes me suspicious.
Young, Zachariah L escribió:
My straightforward idea is to add the following line at line 172 of boost/interprocess/sync/emulation/interprocess_condition.hpp:
if(!lock){
detail::atomic_dec32(const_castboost::uint32_t*(&m_num_waiters)); timed_out = true; unlock_enter_mut = true; break; }
I think you are right. The count should be decremented because this thread will exit the wait logic . thanks, Ion
Hello Ion, I have been looking at this code for awhile now and I am convinced that my original suggestion is not adequate. I am now getting deadlock with 4 waiters doing timed_waits on a tight timeout (now + 8ms) while using a single notify_all. I am wondering why abs_time is taken into consideration at all in the else clause, since the if clause is meant to handle the timeout case. I believe that the following code: //Notification occurred, we will lock the checking interprocess_mutex so that //if a notify_one notification occurs, only one thread can exit //--------------------------------------------------------------- InternalLock lock; if(tout_enabled){ InternalLock dummy(m_check_mut, abs_time); lock = boost::interprocess::move(dummy); } else{ InternalLock dummy(m_check_mut); lock = boost::interprocess::move(dummy); } if(!lock){ detail::atomic_dec32(const_castboost::uint32_t*(&m_num_waiters)); timed_out = true; unlock_enter_mut = true; break; } //--------------------------------------------------------------- Should be changed to this: //Notification occurred, we will lock the checking interprocess_mutex so that //if a notify_one notification occurs, only one thread can exit //--------------------------------------------------------------- InternalLock lock(m_check_mut); //--------------------------------------------------------------- I do not see the point of the abs_time consideration or the lock check. Am I mistaken? Am I missing something? -Zach -----Original Message----- From: Ion Gaztañaga [mailto:igaztanaga@gmail.com] Sent: Wednesday, September 16, 2009 5:39 AM To: boost-users@lists.boost.org Subject: Re: [Boost-users] [Interprocess] deadlocking race conditionin emulation interprocess_condition.hpp Young, Zachariah L escribió:
My straightforward idea is to add the following line at line 172 of boost/interprocess/sync/emulation/interprocess_condition.hpp:
if(!lock){
detail::atomic_dec32(const_castboost::uint32_t*(&m_num_waiters)); timed_out = true; unlock_enter_mut = true; break; }
I think you are right. The count should be decremented because this thread will exit the wait logic . thanks, Ion _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Actually, why lock the m_check_mut at all? Why not delete that whole thing and remove the m_check_mut from the class entirely, as it apparently exists only to synchronize the get and set of m_command on line 177 (before any edits), which is an atomic compare-and-set (ie, doesn't need synchronization)? -Zach -----Original Message----- From: Young, Zachariah L Sent: Wednesday, September 16, 2009 8:06 PM To: 'boost-users@lists.boost.org' Subject: RE: [Boost-users] [Interprocess] deadlocking race condition in emulation interprocess_condition.hpp Hello Ion, I have been looking at this code for awhile now and I am convinced that my original suggestion is not adequate. I am now getting deadlock with 4 waiters doing timed_waits on a tight timeout (now + 8ms) while using a single notify_all. I am wondering why abs_time is taken into consideration at all in the else clause, since the if clause is meant to handle the timeout case. I believe that the following code: //Notification occurred, we will lock the checking interprocess_mutex so that //if a notify_one notification occurs, only one thread can exit //--------------------------------------------------------------- InternalLock lock; if(tout_enabled){ InternalLock dummy(m_check_mut, abs_time); lock = boost::interprocess::move(dummy); } else{ InternalLock dummy(m_check_mut); lock = boost::interprocess::move(dummy); } if(!lock){ detail::atomic_dec32(const_castboost::uint32_t*(&m_num_waiters)); timed_out = true; unlock_enter_mut = true; break; } //--------------------------------------------------------------- Should be changed to this: //Notification occurred, we will lock the checking interprocess_mutex so that //if a notify_one notification occurs, only one thread can exit //--------------------------------------------------------------- InternalLock lock(m_check_mut); //--------------------------------------------------------------- I do not see the point of the abs_time consideration or the lock check. Am I mistaken? Am I missing something? -Zach -----Original Message----- From: Ion Gaztañaga [mailto:igaztanaga@gmail.com] Sent: Wednesday, September 16, 2009 5:39 AM To: boost-users@lists.boost.org Subject: Re: [Boost-users] [Interprocess] deadlocking race conditionin emulation interprocess_condition.hpp Young, Zachariah L escribió:
My straightforward idea is to add the following line at line 172 of boost/interprocess/sync/emulation/interprocess_condition.hpp:
if(!lock){
detail::atomic_dec32(const_castboost::uint32_t*(&m_num_waiters)); timed_out = true; unlock_enter_mut = true; break; }
I think you are right. The count should be decremented because this thread will exit the wait logic . thanks, Ion _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Young, Zachariah L escribió:
Actually, why lock the m_check_mut at all?
Why not delete that whole thing and remove the m_check_mut from the class entirely, as it apparently exists only to synchronize the get and set of m_command on line 177 (before any edits), which is an atomic compare-and-set (ie, doesn't need synchronization)?
-Zach
I reach the same conclusion just a minute ago. Could you test your code removing this mutex? Best, Ion
I altered the code from this: else{ //Notification occurred, we will lock the checking interprocess_mutex so that //if a notify_one notification occurs, only one thread can exit //--------------------------------------------------------------- InternalLock lock; if(tout_enabled){ InternalLock dummy(m_check_mut, abs_time); lock = boost::interprocess::move(dummy); } else{ InternalLock dummy(m_check_mut); lock = boost::interprocess::move(dummy); } if(!lock){ timed_out = true; unlock_enter_mut = true; break; } //--------------------------------------------------------------- boost::uint32_t result = detail::atomic_cas32 (const_castboost::uint32_t*(&m_command), SLEEP, NOTIFY_ONE); to this: else{ boost::uint32_t result = detail::atomic_cas32 (const_castboost::uint32_t*(&m_command), SLEEP, NOTIFY_ONE); and I have had no further problems in my tests. -Zach -----Original Message----- From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Ion Gaztañaga Sent: Wednesday, September 16, 2009 11:33 PM To: boost-users@lists.boost.org Subject: Re: [Boost-users] [Interprocess] deadlocking race conditionin emulation interprocess_condition.hpp Young, Zachariah L escribió:
Actually, why lock the m_check_mut at all?
Why not delete that whole thing and remove the m_check_mut from the class entirely, as it apparently exists only to synchronize the get and set of m_command on line 177 (before any edits), which is an atomic compare-and-set (ie, doesn't need synchronization)?
-Zach
I reach the same conclusion just a minute ago. Could you test your code removing this mutex? Best, Ion _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
participants (2)
-
Ion Gaztañaga
-
Young, Zachariah L