Re: [boost] [thread] Request review of new synchronisation object, boost::permit<>

5 May 2014

On 5 May 2014 at 17:26, Peter Dimov wrote:
...
"You will never see spurious wakeups from a permit object -- [...]. This 
means you don't need to do any wait predicate checking with permits..."
I don't think that this is true. Suppose we have two producer threads Tp1 
and Tp2 and two consumer threads Tc1 and Tc2, and a predicate P. I'll treat 
the predicate as a simple boolean variable below, for simplicity.
A producer thread sets P=1 and calls notify_one. A consumer thread calls 
wait and then sets P=0. (It has to set P=0 regardless of how many producers 
have set the predicate to 1, because - I assume - the permit object doesn't 
count the pending notify_one calls, unlike a semaphore and like an event.)
So you have
Initially P is 0.
Tc1, Tc2 call wait.
Tp1 sets P=1 and calls notify_one. Tc1 is unblocked, returns from wait, but 
is immediately suspended by the scheduler for unfathomable scheduling 
reasons, like its code causing a page fault.
Tp2 sets P=1 and calls notify_one. Tc2 is unblocked, returns from wait, sets 
P=0.
Tc1 is resumed, sees P=0, panics.
I suspect that your guarantees, that grant/notify_one calls are blocking and 
mutually exclusive, were designed to prevent this, but they don't and can't, 
in theory. They _can_ make the above extremely improbable, but they aren't 
theoretically sound.

That isn't the reason actually, but I'll get back to that.

Firstly, permits are a notification object, not a serialisation 
object. If you have some predicate P whose state you are changing you 
must protect it with a mutex, just like any other code. This is why 
permit.wait() takes a locked mutex.

Reworking your example thusly:

Initially P is 0 and p_mutex is unlocked.
Tc1, Tc2 lock p_mutex and call wait.
Tp1 locks p_mutex, sets P=1 and calls notify_one. Upon Tp1 releasing 
the mutex, Tc1 is unblocked with mutex locked, returns from wait, but 
is immediately suspended by the scheduler for unfathomable scheduling 
reasons, like its code causing a page fault. Tp2 tries to set P=1, 
but gets blocked on the mutex being held by Tc1. When Tc1 is resumed 
and eventually unlocks the mutex, everything else proceeds as normal.
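
To make the mutex-protected version concrete, here is a minimal sketch. It 
uses std::condition_variable as a stand-in for the permit, so it is not 
the boost::permit<> API; the names shared_state, consume, produce and 
run_once are made up for illustration. It is also cut down to one producer 
and one consumer. The point it shows is only that the consumer returns 
from wait holding p_mutex, so no producer can change P underneath it.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Assumption: std::condition_variable stands in for the permit here.
struct shared_state {
    std::mutex p_mutex;          // protects P
    std::condition_variable cv;  // notification only, no serialisation
    bool P = false;              // the predicate
};

// Consumer: returns the value of P it observed right after waking.
bool consume(shared_state& s) {
    std::unique_lock<std::mutex> lock(s.p_mutex);
    s.cv.wait(lock, [&] { return s.P; });  // returns with p_mutex held
    bool seen = s.P;  // no producer can modify P here: we hold p_mutex
    s.P = false;
    return seen;
}

// Producer: changes P only while holding p_mutex, then notifies.
void produce(shared_state& s) {
    {
        std::lock_guard<std::mutex> lock(s.p_mutex);
        s.P = true;
    }
    s.cv.notify_one();
}

// Runs one consumer and one producer; true if the consumer saw P set
// and left it cleared.
bool run_once() {
    shared_state s;
    bool seen = false;
    std::thread tc([&] { seen = consume(s); });
    std::thread tp([&] { produce(s); });
    tc.join();
    tp.join();
    return seen && !s.P;
}
```

Calling run_once() should return true regardless of which thread the 
scheduler runs first, because the predicate form of wait also covers the 
case where the producer gets there before the consumer.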

Maybe you meant the fact that the permit contains its own predicate, 
and hence no need for predicate checking? If so, then P equals the 
state of the permit, and:

Initially P is 0.
Tc1, Tc2 call wait.
Tp1 sets P=1. Tc1 is unblocked, thus atomically resetting P=0, 
returns from wait, but is immediately suspended by the scheduler for 
unfathomable scheduling reasons, like its code causing a page fault.
Tp2 sets P=1. Tc2 is unblocked, thus atomically resetting P=0, 
returns from wait.
Tc1 is resumed, all is well. Exactly as many waiters were freed as 
there were granters.

Perhaps though in fact you were more concerned about two threads 
granting a permit, but only one thread getting woken? The problem 
here is that in some cases this is exactly what you want, in other 
cases it is a lost wakeup and you should use a semaphore instead. 
Similarly, revoking non-consuming permits is always racy unless you 
have added synchronisation to ensure you're not being stupid.
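
The difference between "two grants, one wakeup" and a semaphore can be 
sketched with two toy classes. Both are hypothetical illustrations built 
on std::mutex, not the proposed permit nor any Boost type: toy_flag 
remembers only whether a grant is pending, while toy_semaphore counts 
them.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical event-like object: remembers only "granted or not".
class toy_flag {
    std::mutex m_;
    bool granted_ = false;
public:
    void grant() {
        std::lock_guard<std::mutex> l(m_);
        granted_ = true;  // a second grant before a wait is absorbed
    }
    // Non-blocking: consumes the pending grant if there is one.
    bool try_wait() {
        std::lock_guard<std::mutex> l(m_);
        if (!granted_) return false;
        granted_ = false;
        return true;
    }
};

// Hypothetical counting object: every post is remembered.
class toy_semaphore {
    std::mutex m_;
    unsigned count_ = 0;
public:
    void post() {
        std::lock_guard<std::mutex> l(m_);
        ++count_;
    }
    // Non-blocking: consumes one pending post if there is one.
    bool try_acquire() {
        std::lock_guard<std::mutex> l(m_);
        if (count_ == 0) return false;
        --count_;
        return true;
    }
};

// How many waits succeed after two grants on the flag?
int wakeups_after_two_grants() {
    toy_flag f;
    f.grant();
    f.grant();
    int n = 0;
    while (f.try_wait()) ++n;
    return n;
}

// How many acquires succeed after two posts on the semaphore?
int wakeups_after_two_posts() {
    toy_semaphore s;
    s.post();
    s.post();
    int n = 0;
    while (s.try_acquire()) ++n;
    return n;
}
```

With the flag, the two grants collapse into one wakeup; with the counter, 
both are delivered. Which behaviour is right depends entirely on what the 
calling code means by a grant.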

All threading primitives have their gotchas, no doubt. The question 
here is if the presented implementation of this permit is a wise 
addition to Boost.
...
I suspect that your guarantees, that grant/notify_one calls are blocking and
mutually exclusive, were designed to prevent this, but they don't and can't,
in theory. They _can_ make the above extremely improbable, but they aren't
theoretically sound.

Grants being blocking for non-consuming permits is actually a time 
complexity guarantee, so you are being guaranteed progress no matter 
what. The Windows kernel can "cheat" here in ways we cannot in 
portable code.
...
The specification of condition variables allows spurious wakeups not
because they can't be prevented by the implementation; it permits spurious
wakeups to make client code assume spurious wakeups, because code that is
written to assume spurious wakeups is more likely to be correct. Or
conversely, code that does not assume spurious wakeups is likely to be
incorrect even if no spurious wakeups occur.

I would have said spurious wakeups come exclusively from kernel bugs 
and the fact POSIX allows signals to escape syscalls, which is 
definitely a pre-threading era legacy design choice. On Windows 
spurious wakeups are trapped for you in user space so you never see 
them and therefore never have to deal with them, unless of course you 
elect to do so (e.g. any of the wait APIs able to return 
WAIT_IO_COMPLETION).

Spurious wakeups absolutely can be prevented by the implementation; 
it is just that POSIX cannot do so, for backwards compatibility.
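
For reference, the defensive pattern under discussion, written against 
std::condition_variable rather than the permit, looks roughly like the 
sketch below. The names gate, wait_ready and demo are made up for 
illustration. The stray notify_one() in demo() stands in for a wakeup 
that arrives while the predicate is still false.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

struct gate {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;  // the wait predicate
};

// Blocks until ready is set; returns the value seen under the lock.
bool wait_ready(gate& g) {
    std::unique_lock<std::mutex> lock(g.m);
    while (!g.ready)      // re-check after every wakeup, spurious or not
        g.cv.wait(lock);
    return g.ready;       // still holding g.m, so this is always true
}

// One waiter, one stray notification, then the real one.
bool demo() {
    gate g;
    bool saw_ready = false;
    std::thread waiter([&] { saw_ready = wait_ready(g); });
    g.cv.notify_one();    // stray notification: predicate still false
    {
        std::lock_guard<std::mutex> lock(g.m);
        g.ready = true;
    }
    g.cv.notify_one();    // the real notification
    waiter.join();
    return saw_ready;
}
```

The loop makes the waiter indifferent to why it woke up, which is the 
property the condition variable specification is trying to force on 
client code.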

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/

Niall Douglas