[threads] API Design Question

Hello Why is it, that the wait and timed_wait functions of the condition_variable class do not encapsulate the problem of spurious wakeups? I wonder whether there are good reasons. I have seen libraries which do so. Do they miss a corner case? For illustration here is a sketch of a wrapper class ---8<--- class save_condition_variable { public: save_condition_variable() : waiting(0), wakeup(0) { } void notify_one() { if (wakeup < waiting) { ++wakeup; } cond.notify_one(); } void notify_all() { wakeup = waiting; cond.notify_all(); } void wait(unique_lock<mutex>& lock) { ++waiting; do { cond.wait(lock) } while (wakeup==0); --waiting; --wakeup; } bool timed_wait(unique_lock<mutex>& lock, system_time const& abs_time) { ++waiting; bool res = true; while(wakeup==0 and res) { res = cond.timed_wait(lock, abs_time); } --waiting; if (res) --wakeup; return res; } private: condition_variable cond; int waiting; int wakeup; }; --->8--- Having the spurious wakeup problem handled by the library is a significant increase of comfort for both developers and maintainers. regards Alex

AMDG Alexander Bernauer wrote:
Why is it, that the wait and timed_wait functions of the condition_variable class do not encapsulate the problem of spurious wakeups?
I wonder whether there are good reasons. I have seen libraries which do so. Do they miss a corner case?
For one thing, the code that you posted requires the lock to be held when calling notify. Also, it's better in general to use the overloads of wait that take a predicate, since these do provide protection from spurious wakeups. In Christ, Steven Watanabe

Alexander Bernauer wrote:
Why is it, that the wait and timed_wait functions of the condition_variable class do not encapsulate the problem of spurious wakeups?
Hi Alexander, Practically all of my use of conditions looks something like this: lock l(mut); while (temperature < threshold) { cond.wait(l); } Or the equivalent using the predicate version of wait. This sort of usage tolerates spurious wakeups. Furthermore, my "temperature < threshold" test is as cheap as a spuriousness test would be, so adding a spuriousness test would not even make it faster even if spurious wakeups were frequent. I'm curious to know what sort of use you are making of condition variables where hiding spurious wakeups is important. Are trying to announce events via the condition without any additional state? Regards, Phil.

Alexander Bernauer <alex@copton.net> writes:
Why is it, that the wait and timed_wait functions of the condition_variable class do not encapsulate the problem of spurious wakeups?
Avoiding spurious wake-ups complicates the design considerably, and was the cause of a bug in the win32 implementation of boost::condition_variable.
I wonder whether there are good reasons. I have seen libraries which do so. Do they miss a corner case?
There are various complications with condition variables. It is likely that an implementation that avoids spurious wake-ups either imposes additional serialization on calls to notify_xxx/wait (e.g. through the use of an additional mutex in the cond var) or misses a corner case. When you get a spurious wake, just check the associated condition, and wait again if it is not met. The predicated overload of wait() will do this for you. Anthony -- Author of C++ Concurrency in Action | http://www.manning.com/williams just::thread C++0x thread library | http://www.stdthread.co.uk Just Software Solutions Ltd | http://www.justsoftwaresolutions.co.uk 15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976

Hi all, thanks for your replies. I wasn't aware of the fact that it is not required to have the mutex acquired when calling notify_*. As you stated right, a different thread API, which encapsulates spurious wakeups, would demand the mutex to be aquired for call to notify_*. AFAICS, in order to trigger the notification one has to acquire the lock anyway, because the waiting thread loops around some value of a state, which one has to alter concurrently. So, I can see no additional penalty when encapsulating spurious wakeups. Furthermore, afaics, the sketched implementation of such an API is safe, isn't it? Does anybody know an example of a corner case which doesn't work? I am aware of the wait functions, which accept a predicate and do the loop for you. My point is, that expressing the loop condition is not always trivial. Therefore in those cases - at least up to my experience - normaly new flags or counters are introduced, which signal "Yes, it was really intended that you woke up". This is cumbersome. On the other hand those flags/counters are a safe way to solve the spurious wakeup problem in general. So why not having the library doing the job? Not to mention the fact that spurious wakeups are a very hard to detect bug in case one forgets to loop or - even worse - there is a bug in the loop condition. regards Alexander

Alexander Bernauer wrote:
I wasn't aware of the fact that it is not required to have the mutex acquired when calling notify_*.
I believe it is preferred to not have the lock held because you can then have fewer context switches: if you notify with the lock held the waiting thread may be scheduled but it will immediately block trying to re-acquire the mutex, and the first thread will be scheduled again to release it. I haven't measured the effect this though. Is anyone aware of a correctness reason for not notifying with the lock held?
AFAICS, in order to trigger the notification one has to acquire the lock anyway, because the waiting thread loops around some value of a state, which one has to alter concurrently.
Yes, i.e. lock mutex change state unlock mutex notify condition
So, I can see no additional penalty when encapsulating spurious wakeups.
You have the penalty of actually doing the spuriousness check and storing the extra state, as you described in your first post.
I am aware of the wait functions, which accept a predicate and do the loop for you. My point is, that expressing the loop condition is not always trivial. Therefore in those cases - at least up to my experience - normaly new flags or counters are introduced, which signal "Yes, it was really intended that you woke up". This is cumbersome.
Can you give an example? If your condition is very slow to evaluate and spurious wakeups are common then you may save time by eliminating the possibility of spuriousness before evaluating the condition. But my experience is that spurious wakeups are rare in practice. (The exception might be if your application makes a lot of use of signals, e.g. async IO or timers.)
On the other hand those flags/counters are a safe way to solve the spurious wakeup problem in general. So why not having the library doing the job?
Well, it makes things slower for users whose conditions are simple or who have infrequent spurious wakeups.
Not to mention the fact that spurious wakeups are a very hard to detect bug in case one forgets to loop
Every instance of waiting on a condition should always be in a loop, or use the predicate version of wait.
or - even worse - there is a bug in the loop condition.
Regards, Phil.

----- Original Message ----- From: "Phil Endecott" <spam_from_boost_dev@chezphil.org> To: <boost@lists.boost.org> Sent: Thursday, February 05, 2009 3:25 PM Subject: Re: [boost] [threads] API Design Question
Not to mention the fact that spurious wakeups are a very hard to detect bug in case one forgets to loop
Every instance of waiting on a condition should always be in a loop, or use the predicate version of wait.
Hi, I was wondering if there are other specific cases where we need to call wait without testing a condition in a loop. Can someone show them? Vicente

On Thursday 05 February 2009 15:25:50 Phil Endecott wrote:
if you notify with the lock held the waiting thread may be scheduled but it will immediately block trying to re-acquire the mutex, and the first thread will be scheduled again to release it.
I checked the posix documentation and indeed it is not guaranteed that the above does not happen. I didn't expect that.
I haven't measured the effect this though.
Maybe some implementations are smart enough to avoid this scenario. At least I hope so.
Can you give an example?
In my case, I have two event queues. One for immediate events and one for delayed ones, i.e. something like std::queue<Event*> imQ; std::priority_queue<std::pair<system_time, Event*> > delQ; Both queues are encapsulated by a "meta"-queue class providing void push(Event*, time_duration delay=not_a_date_time) and Event* pop() Besides normal queue semantics the meta queue has the additional constraint that a delayed event may not be returned by pop() before its delivery time has been reached. Without spurious wakeups, we would have ---8<--- void push(Event* event, time_duration delay=not_a_date_time) { lock l(mutex); if (delay == not_a_date_time) { imQ.push(event); } else { delQ.push(std::make_pair(get_system_time() + delay, event)); } condition.notify(); } --->8--- and ---8<--- Event* pop() { lock l(mutex); while(1) { if (not imQ.emtpy()) { Event* event = imQ.top(); imQ.pop(); return event; } else { if (not delQ.empty()) { const system_time now = get_system_time(); if (delQ.top().first < now) { Event* event = delQ.top().second; delQ.pop(); return event; } else { condition.timed_wait(l, delQ.top().first); // (1) } } else { condition.wait(l); // (2) } } } } --->8--- While writing those lines I realized that the above code is correct even if there are spurious wakeups. My point was that (1) becomes ---8<--- while (imQ.empty() and get_system_time() < delQ.top().first and condition.timed_wait(l, delQ.top().first)); --->8--- and (2) becomes ---8<--- while(imQ.empty() and delQ.empty()) { condition.wait(); } --->8--- but this is not necessary. Maybe the above is a general pattern to avoid complicated loop conditions for spurious wakeups. If this is true, besides convenience, I have no argument left for encapsulating spurious wakeups. Thank you regards Alexander

----- Original Message ----- From: "Alexander Bernauer" <alex@copton.net> To: <boost@lists.boost.org> Sent: Thursday, February 05, 2009 12:56 PM Subject: Re: [boost] [threads] API Design Question
On the other hand those flags/counters are a safe way to solve the spurious wakeup problem in general. So why not having the library doing the job? Not to mention the fact that spurious wakeups are a very hard to detect bug in case one forgets to loop or - even worse - there is a bug in the loop condition.
Hi, could you explain how your design helps to debug a bug in the loop condition? Vicente

Hi On Thursday 05 February 2009 20:26:32 vicente.botet wrote:
could you explain how your design helps to debug a bug in the loop condition?
In my design the loop against spurious wakeups is implemented once in the library. This can be reviewed and tested until its correct. In the boost design everybody has to implement its own loops. So my design does not help debugging. Instead it helps avoiding a whole class of hard to find bugs. regards Alex
participants (5)
-
Alexander Bernauer
-
Anthony Williams
-
Phil Endecott
-
Steven Watanabe
-
vicente.botet