Probably bug in condition::timed_wait()
Hi, I'm porting custom library from UNIX to Windows and found that some piece of code work correctly on UNIX and some time segfaulting on Windows. I think, that problem is on boost::condition::timed_wait() function. To be sure I write small test: #include <stdexcept> #include <iostream> #include <boost/thread.hpp> #include <windows.h> using namespace boost; using namespace std; class AsyncQueue { public: AsyncQueue() : m_counter(0) {} void push() { mutex::scoped_lock lock(m_mutex); m_counter++; m_condition.notify_one(); } void timed_pop(const xtime &time) { mutex::scoped_lock lock(m_mutex); if(m_counter) { m_counter--; return; } if(!m_condition.timed_wait(lock, time)) return; if(!m_counter) throw out_of_range("oops..."); m_counter--; } private: mutex m_mutex; condition m_condition; int m_counter; }; static AsyncQueue queue; static void start() { for(;;) { queue.push(); Sleep(100); } } int main(int argc, char **argv) { thread thread(start); xtime time; for(;;) { xtime_get(&time, TIME_UTC); time.nsec += 10000; queue.timed_pop(time); } return 0; } AsyncQueue is a simple asynchronous queue container with two operations - push() and timed_pop(). Because this is test, AsyncQueue don't really collect something, but it's has variable (m_counter field) that count current number on elements in container. When push() called m_counter incremented and when timed_pop() "virtually" remove item from queue m_counter decremented. In main() I create additional thread, which push data to queue. Data popped from queue in main thread by calling timed_pop(). But sometimes timed_pop throw out_of_range exception. If you look in code you can see that it's mean that condition::timed_wait() return true, but queue is empty! I look to boost-users list and found e-mail's about similar problems from Eric Colleu, but decision wasn't found. Two words about my environment. I use Visual Studio 2005, Windows XP SP2 on VMWare 5.5.1. Boost version is 1.33.1. Thanks for constructive answers.
Pavel Syomin wrote:
Hi,
I'm porting custom library from UNIX to Windows and found that some piece of code work correctly on UNIX and some time segfaulting on Windows. I think, that problem is on boost::condition::timed_wait() function. To be sure I write small test:
[...] You are assuming that when condition::(timed_)wait returns the predicate (in your case m_counter != 0) is guaranteed to be true. This is not correct. condition::wait is allowed to return at any time (this is known as a "spurious wakeup".) You need to use a while loop. while( m_counter == 0 ) { if( !m_condition.timed_wait( lock, time ) ) return; // timeout } assert( m_counter != 0 ); --m_counter; I'm almost 100% certain that this was in the documentation once, but I can't find it now. Maybe it somehow got lost in the transition to the new doc layout.
Peter Dimov wrote:
You are assuming that when condition::(timed_)wait returns the predicate (in your case m_counter != 0) is guaranteed to be true. This is not correct. condition::wait is allowed to return at any time (this is known as a "spurious wakeup".) You need to use a while loop.
while( m_counter == 0 ) { if( !m_condition.timed_wait( lock, time ) ) return; // timeout }
assert( m_counter != 0 ); --m_counter;
I'm almost 100% certain that this was in the documentation once, but I can't find it now. Maybe it somehow got lost in the transition to the new doc layout.
Thank you! This is solve my problems, but why there is nothing about spurious wakeup in boost documentation?...
Pavel Syomin wrote:
This is solve my problems, but why there is nothing about spurious wakeup in boost documentation?...
The older documentation http://web.archive.org/web/20041024220516/boost.org/libs/thread/doc/conditio... does have a few sentences that mention spurious wakeups; I've no idea how they got lost. I'll CC: the dev list.
Peter Dimov wrote:
Pavel Syomin wrote:
Hi,
I'm porting custom library from UNIX to Windows and found that some piece of code work correctly on UNIX and some time segfaulting on Windows. I think, that problem is on boost::condition::timed_wait() function. To be sure I write small test:
[...]
You are assuming that when condition::(timed_)wait returns the predicate (in your case m_counter != 0) is guaranteed to be true. This is not correct. condition::wait is allowed to return at any time (this is known as a "spurious wakeup".) You need to use a while loop.
while( m_counter == 0 ) { if( !m_condition.timed_wait( lock, time ) ) return; // timeout }
Alternatively, you can avoid the need to explicitly use a while loop by calling the overloaded version of timed_wait() that takes a predicate argument. This version does the loop for you internally. E.g.: if (!(m_condition.timed_wait (lock, time, boost::bind<int> (boost::mem_fn (&AsyncQueue::m_counter), this)))) return; On a separate point, there is a bug in your usage of the xtime struct in main(): xtime_get(&time, TIME_UTC); time.nsec += 10000; queue.timed_pop(time); You cannot simply add a value to time.nsec and then use the struct, because nsec is required to be in the range [0,999999999]. [See <http://www.cl.cam.ac.uk/~mgk25/time/c/>.] By adding a value to nsec, you may cause it to overflow, which would invalidate the struct, and possibly cause unexpected behavior. You need to normalize the xtime struct by "carrying over" any excess part of nsec to sec. E.g.: #define NSECS_PER_SEC 1000000000 ... xtime_get(&time, TIME_UTC); time.nsec += 10000; if (time.nsec >= NSECS_PER_SEC) { time.sec += time.nsec / NSECS_PER_SEC; time.nsec = time.nsec % NSECS_PER_SEC; } queue.timed_pop(time); (This is an ugly but necessary consequence of Boost.Thread's unfortunate design decision to use a raw struct to represent time values. It would have been much better to use a class that encapsulates its internals, and that knows how to normalize itself.) Moshe
Hi!
Alternatively, you can avoid the need to explicitly use a while loop by calling the overloaded version of timed_wait() that takes a predicate argument. This version does the loop for you internally. E.g.:
if (!(m_condition.timed_wait (lock, time, boost::bind<int> (boost::mem_fn (&AsyncQueue::m_counter), this)))) return;
Thank you! I saw overloaded version of timed_wait() but didn't understand its purpose.
On a separate point, there is a bug in your usage of the xtime struct in main():
xtime_get(&time, TIME_UTC); time.nsec += 10000; queue.timed_pop(time);
You cannot simply add a value to time.nsec and then use the struct, because nsec is required to be in the range [0,999999999]. In really code I use sec, not nsec. But you are rigth.
participants (3)
-
Moshe Matitya
-
Pavel Syomin
-
Peter Dimov