Re: [boost] [Boost-users] Probably bug in condition::timed_wait()

Pavel Syomin wrote:
This is solve my problems, but why there is nothing about spurious wakeup in boost documentation?...
The older documentation http://web.archive.org/web/20041024220516/boost.org/libs/thread/doc/conditio... does have a few sentences that mention spurious wakeups; I've no idea how they got lost. I'll CC: the dev list.

Peter Dimov wrote:
Pavel Syomin wrote:
This is solve my problems, but why there is nothing about spurious wakeup in boost documentation?...
The older documentation
http://web.archive.org/web/20041024220516/boost.org/libs/thread/doc/conditio...
does have a few sentences that mention spurious wakeups; I've no idea how they got lost. I'll CC: the dev list.
The docu might tell something not only about spurious wakeups but also about the fact that return of timedout status doesn't preclude consumption of a signal issued concurrently (note that thread cancel delivery does preclude it). Well, the best fix is simply incorporate by reference the standard semantics as defined in (most current edition) ISO/IEC 9945. regards, alexander.

Alexander Terekhov wrote:
Peter Dimov wrote:
Pavel Syomin wrote:
This is solve my problems, but why there is nothing about spurious wakeup in boost documentation?...
The older documentation
http://web.archive.org/web/20041024220516/boost.org/libs/thread/doc/conditio...
does have a few sentences that mention spurious wakeups; I've no idea how they got lost. I'll CC: the dev list.
The docu might tell something not only about spurious wakeups but also about the fact that return of timedout status doesn't preclude consumption of a signal issued concurrently (note that thread cancel delivery does preclude it).
Hey! Does this mean that timed_wait( lock, xt, pred ) has a bug? Instead of while (!pred()) { if (!timed_wait(lock, xt)) return false; } return true; it needs to be while (!pred()) { if (!timed_wait(lock, xt)) return pred(); } return true; otherwise a 'false' return can consume a signal meant for someone else without notifying the caller to handle the pred() == true case? I probably have the same subtle bug in at least a few places...

Peter Dimov wrote:
Does this mean that timed_wait( lock, xt, pred ) has a bug? Instead of
while (!pred()) { if (!timed_wait(lock, xt)) return false; } return true;
it needs to be
while (!pred()) { if (!timed_wait(lock, xt)) return pred(); } return true;
otherwise a 'false' return can consume a signal meant for someone else without notifying the caller to handle the pred() == true case?
The documentation states: Returns: false if xt is reached, otherwise true. So if it returned pred instead you wouldn't get the correct result. I.e. when using the timed_wait you are deliberately missing the signal when it arrives out of time. It simply does not matter if the signal arrived after timeout. Or did I misunderstand your concerns? Roland

Roland Schwarz wrote:
Peter Dimov wrote:
Does this mean that timed_wait( lock, xt, pred ) has a bug? Instead of
while (!pred()) { if (!timed_wait(lock, xt)) return false; } return true;
it needs to be
while (!pred()) { if (!timed_wait(lock, xt)) return pred(); } return true;
otherwise a 'false' return can consume a signal meant for someone else without notifying the caller to handle the pred() == true case?
The documentation states:
Returns: false if xt is reached, otherwise true.
So if it returned pred instead you wouldn't get the correct result. I.e. when using the timed_wait you are deliberately missing the signal when it arrives out of time. It simply does not matter if the signal arrived after timeout.
It does matter. Consider the case where two threads are awaiting an element to be pushed into a queue, one uses timed_wait, the other uses wait. The "element pushed" signal arrives concurrently with the timeout, the timed_wait thread is awakened, sees timeout, returns false to its caller, who consequently doesn't pop the element. The other thread waits forever, and the element remains in the queue.

Peter Dimov wrote:
It does matter. Consider the case where two threads are awaiting an element to be pushed into a queue, one uses timed_wait, the other uses wait. The "element pushed" signal arrives concurrently with the timeout, the timed_wait thread is awakened, sees timeout, returns false to its caller, who consequently doesn't pop the element. The other thread waits forever, and the element remains in the queue.
I can see the problem, but you simply would get the wrong result from timed_wait, making the caller believe, the operation hasn't timed out. I think the caller should do something like: if (!timed_wait(lk, xt, pred)) { // handle timeout case if (pred()) { // we got the signal after timeout ... } else { // we received a pure timeout } } else { // signal received in time } or the following which is equivalent to your suggestion: if (!timed_wait(lk, xt, pred) && !pred()) { // handle timeout } The current definition makes it possible for the user code to disambiguate for both cases, while your suggestion would drop the timeout information. Does this make sense? However the issue should be cleary documented. Roland

Roland Schwarz wrote:
Peter Dimov wrote:
It does matter. Consider the case where two threads are awaiting an element to be pushed into a queue, one uses timed_wait, the other uses wait. The "element pushed" signal arrives concurrently with the timeout, the timed_wait thread is awakened, sees timeout, returns false to its caller, who consequently doesn't pop the element. The other thread waits forever, and the element remains in the queue.
I can see the problem, but you simply would get the wrong result from timed_wait, making the caller believe, the operation hasn't timed out.
Well... if you use the wrong specification you'd get the wrong result, sure. The caller can't depend on the operation not being timed out when the return value is true, there is a race between the signal and the timeout and it is perfectly possible for timed_wait to return true after the timeout.
I think the caller should do something like:
if (!timed_wait(lk, xt, pred)) { // handle timeout case if (pred()) { // we got the signal after timeout ... } else { // we received a pure timeout } } else { // signal received in time
}
He can, but this kind of defeats the purpose of using the predicate version in the first place. It is supposed to guard against spurious wakeups, so it makes sense to also make it guard against stolen wakeups.
The current definition makes it possible for the user code to disambiguate for both cases, while your suggestion would drop the timeout information.
User code that tries to disambiguate is broken. You can't depend on a timeout not occuring since it's an asynchronous event that you have no way to stop. You can, however, depend on a return value of 'true' to tell you that the predicate is true, with the change.

Peter Dimov wrote:
Well... if you use the wrong specification you'd get the wrong result, sure. Excuse me, I do not understand this sentence.
The caller can't depend on the operation not being timed out when the return value is true, there is a race between the signal and the timeout and it is perfectly possible for timed_wait to return true after the timeout.
Of course it is possible. But what if the signal was sent in response to an external event, which was expected to arrive in time? The program logic then just is wrong because it might be too late to react to the signal. This is the only place where timed versions really make sense IMHO. If you just want be able to cancel the operation I would use the untimed wait and embed a cancel flag into the pred. ... <snip>example showing disambiguating</snip> ...
He can, but this kind of defeats the purpose of using the predicate version in the first place. It is supposed to guard against spurious wakeups, so it makes sense to also make it guard against stolen wakeups.
I agree we could change semantics this way, but I still cannot understand why you think the wakeup is stolen? You have access to it by evaluating pred.
User code that tries to disambiguate is broken.
Excuse me, I do not understand this one. To me code where one thread unconitionally waits while another timed_waits on the same condition looks even more broken. (But this is just taste, not sound argument.)
You can't depend on a timeout not occuring since it's an asynchronous event that you have no way to stop.
I am really not sure if I get your point. Let me try to come up with an example: Say we have a thread that makes a call to start an external device e.g. a motor at time t0. Say we have a second thread that is monitoring the RPM of the motor and signalling a condition if it has reached its nominal value. The first thread meanwhile is waiting on this condition with a timed wait say t0+t_emergency. Now lets assume motor specs. dictate it is essential to immediately stop the device if it hasn't reached nominal RPM within t_emergency. Doesn't your proposed change just hide this fact, and the motor thus might break? I think the question is which information is of greater importance to the program: expired timeout or missed signal. But again, I think I possibly cannot see your point clear enough yet. Can we agree that the current version is not wrong? You are just suggesting a possibly better less error prone semantics? If so please try to elaborate it a little more. Roland

Roland Schwarz wrote:
He can, but this kind of defeats the purpose of using the predicate version in the first place. It is supposed to guard against spurious wakeups, so it makes sense to also make it guard against stolen wakeups.
I agree we could change semantics this way, but I still cannot understand why you think the wakeup is stolen? You have access to it by evaluating pred.
"Spurious wakeup" means that a thread is awakened without reason. "Stolen wakeup" means that a thread consumes a wakeup intended for another thread. When a timeout expires _and_ the thread consumes a signal, this signal in a perfect world (where only one of the two events occur at a given moment) would have been delivered to another waiting thread. Hence, "stolen wakeup". You have two threads waiting on a condition, two events happen at the same time, a signal and the first thread timing out. Delivering both to the first thread is error prone. Most client code that uses timed waits does not handle this situation.
Say we have a thread that makes a call to start an external device e.g. a motor at time t0. Say we have a second thread that is monitoring the RPM of the motor and signalling a condition if it has reached its nominal value. The first thread meanwhile is waiting on this condition with a timed wait say t0+t_emergency. Now lets assume motor specs. dictate it is essential to immediately stop the device if it hasn't reached nominal RPM within t_emergency. Doesn't your proposed change just hide this fact, and the motor thus might break?
Why would it hide it?
But again, I think I possibly cannot see your point clear enough yet. Can we agree that the current version is not wrong?
It does what the specification says it does, so it is not "wrong" in this sense. I'm arguing that the specification needs to be changed.
You are just suggesting a possibly better less error prone semantics? If so please try to elaborate it a little more.
The idea of supplying a predicate version is to guard against common user mistakes. For the ordinary wait, the predicate version ignores spurious wakeups. It is very easy to assume that spurious wakeups don't occur and write incorrect code based on this assumption. So we supply a version where they don't. For timed_wait, there is an additional source of subtle bugs, the fact that a timeout and a signal can both occur and be "delivered" to the same waiting thread. You can't express this in a simple true/false return value, so you have to map the three possible outcomes into two (the fourth outcome is a spurious wakeup and is never returned). I argue that the predicate version should attempt to guard against this additional incorrect user assumption (that only one of a signal occuring or a timeout elapsing happens and not both), too. To get back to you example, if (!timed_wait(lk, xt, pred) && !pred()) { // handle timeout } have you ever seen such code? I haven't. if (!timed_wait(lk, xt, pred) ) { // handle timeout } is the norm (and pred is usually something lambda-ish). The low-level non-predicate version is available for those that are interested in the "raw" return value from the wait.

Peter Dimov wrote:
It does what the specification says it does, so it is not "wrong" in this sense. I'm arguing that the specification needs to be changed. ... <snipped> ... The idea of supplying a predicate version is to guard against common user mistakes. For the ordinary wait, the predicate version ignores spurious wakeups. It is very easy to assume that spurious wakeups don't occur and write incorrect code based on this assumption. So we supply a version where they don't.
For timed_wait, there is an additional source of subtle bugs, the fact that a timeout and a signal can both occur and be "delivered" to the same waiting thread. You can't express this in a simple true/false return value, so you have to map the three possible outcomes into two (the fourth outcome is a spurious wakeup and is never returned). I argue that the predicate version should attempt to guard against this additional incorrect user assumption (that only one of a signal occuring or a timeout elapsing happens and not both), too.
To get back to you example,
if (!timed_wait(lk, xt, pred) && !pred()) { // handle timeout }
have you ever seen such code? I haven't.
It doesn't really matter if one of us has seen such code.
if (!timed_wait(lk, xt, pred) ) { // handle timeout }
is the norm (and pred is usually something lambda-ish).
I have no further objections to your proposed change. And I will put this for the next release, as I am uncomfortably to introduce this change in semantics for RC_1_34_0. Also I would like to see yet other opinions on this matter. Do you agree? If yes, it would be very nice if you could file a feature request into source-forge, so that the issue doesn't get lost.
The low-level non-predicate version is available for those that are interested in the "raw" return value from the wait.
I agree, the wary user still has access to all information by using the lower level primitives. Roland

Peter Dimov wrote:
Alexander Terekhov wrote:
Peter Dimov wrote:
Pavel Syomin wrote:
This is solve my problems, but why there is nothing about spurious wakeup in boost documentation?...
The older documentation
http://web.archive.org/web/20041024220516/boost.org/libs/thread/doc/conditio...
does have a few sentences that mention spurious wakeups; I've no idea how they got lost. I'll CC: the dev list.
The docu might tell something not only about spurious wakeups but also about the fact that return of timedout status doesn't preclude consumption of a signal issued concurrently (note that thread cancel delivery does preclude it).
Hey!
Does this mean that timed_wait( lock, xt, pred ) has a bug? Instead of
while (!pred()) { if (!timed_wait(lock, xt)) return false; } return true;
it needs to be
while (!pred()) { if (!timed_wait(lock, xt)) return pred(); } return true;
otherwise a 'false' return can consume a signal meant for someone else without notifying the caller to handle the pred() == true case?
Yup. Another possible fix is to broadcast() (signal() is not enough). while (!pred()) { if (!timed_wait(lock, xt)) return broadcast(), false; } Roland will like that. ;-) regards, alexander.

Alexander Terekhov wrote:
Another possible fix is to broadcast() (signal() is not enough).
while (!pred()) { if (!timed_wait(lock, xt)) return broadcast(), false; }
Roland will like that. ;-)
You got me. I am an advocate in favour of spurious wakeups, in specific if they serve no other purpose than making the implementation adhere to the documentation ;-) Roland

Alexander Terekhov wrote:
Peter Dimov wrote:
Alexander Terekhov wrote:
Peter Dimov wrote:
Pavel Syomin wrote:
This is solve my problems, but why there is nothing about spurious wakeup in boost documentation?...
The older documentation
http://web.archive.org/web/20041024220516/boost.org/libs/thread/doc/conditio...
does have a few sentences that mention spurious wakeups; I've no idea how they got lost. I'll CC: the dev list.
The docu might tell something not only about spurious wakeups but also about the fact that return of timedout status doesn't preclude consumption of a signal issued concurrently (note that thread cancel delivery does preclude it).
Hey!
Does this mean that timed_wait( lock, xt, pred ) has a bug? Instead of
while (!pred()) { if (!timed_wait(lock, xt)) return false; } return true;
it needs to be
while (!pred()) { if (!timed_wait(lock, xt)) return pred(); } return true;
otherwise a 'false' return can consume a signal meant for someone else without notifying the caller to handle the pred() == true case?
Yup.
Another possible fix is to broadcast() (signal() is not enough).
while (!pred()) { if (!timed_wait(lock, xt)) return broadcast(), false; }
Ummmm.... why is signal() not enough?

Roland Schwarz wrote:
Peter Dimov wrote:
Ummmm.... why is signal() not enough?
I suspect, because the condition variable could be shared by multiple predicates?
Hmmmm. A timed out waiter can only steal a wakeup if the condition variable has been signal()ed. If it were broadcast()ed, there's no problem, as everyone is woken up and there's nothing to steal. Therefore, a signal() ought to be enough to return the potentially stolen wakeup.

Peter Dimov wrote:
A timed out waiter can only steal a wakeup if the condition variable has been signal()ed. If it were broadcast()ed, there's no problem, as everyone is woken up and there's nothing to steal.
Sure, but only if all threads are waiting for the same predicate. The ones, that wait on a different see this wakeup as a spurious wakeup. From Butenhof: "There's nothing wrong with doing either (share one condition with multiple predicates or several conditions for a single predicate), as long as you're careful ... First, when you share a condition variable between multiple predicates, you must always broadcast, never signal." Since in our templated convenience wrapper we cannot know how much predicates indeed are associated, we must play safe, and broadcast. Roland

Peter Dimov wrote: [...]
Another possible fix is to broadcast() (signal() is not enough).
while (!pred()) { if (!timed_wait(lock, xt)) return broadcast(), false; }
Ummmm.... why is signal() not enough?
Threads A and B waiting (A does timed wait), C changes something, signals and enters wait expecting another state change followed by signaling from either A or B (whichever gets the signal and makes processing). If A times out and only re-signal, that signal can be consumed by C as spurious wake. So broadcast() is needed to wake B and let it do the job. regards, alexander.

Alexander Terekhov wrote:
Peter Dimov wrote: [...]
Another possible fix is to broadcast() (signal() is not enough).
while (!pred()) { if (!timed_wait(lock, xt)) return broadcast(), false; }
Ummmm.... why is signal() not enough?
Threads A and B waiting (A does timed wait), C changes something, signals and enters wait expecting another state change followed by signaling from either A or B (whichever gets the signal and makes processing). If A times out and only re-signal, that signal can be consumed by C as spurious wake. So broadcast() is needed to wake B and let it do the job.
Terrific. Thanks Alexander.

Alexander Terekhov wrote:
Threads A and B waiting (A does timed wait), C changes something, signals and enters wait expecting another state change followed by signaling from either A or B (whichever gets the signal and makes processing). If A times out and only re-signal, that signal can be consumed by C as spurious wake. So broadcast() is needed to wake B and let it do the job.
This can only happen if C is waiting on the same condition variable, isn't it? Then I would expect it is waiting on the inverted predicate to be meaningful (e.g. consumer/producer). So this is just a special case of a condition sharing multiple predicates, right? Roland

Roland Schwarz wrote:
Alexander Terekhov wrote:
Threads A and B waiting (A does timed wait), C changes something, signals and enters wait expecting another state change followed by signaling from either A or B (whichever gets the signal and makes processing). If A times out and only re-signal, that signal can be consumed by C as spurious wake. So broadcast() is needed to wake B and let it do the job.
This can only happen if C is waiting on the same condition variable, isn't it? Then I would expect it is waiting on the inverted predicate
Yeah, sort of inverted.
to be meaningful (e.g. consumer/producer). So this is just a special case of a condition sharing multiple predicates, right?
Right. regards, alexander.
participants (3)
-
Alexander Terekhov
-
Peter Dimov
-
Roland Schwarz