On 5 May 2014 at 17:26, Peter Dimov wrote:
"You will never see spurious wakeups from a permit object -- [...]. This means you don't need to do any wait predicate checking with permits..."
I don't think that this is true. Suppose we have two producer threads Tp1 and Tp2 and two consumer treads Tc1 and Tc2, and a predicate P. I'll treat the predicate as a simple boolean variable below, for simplicity.
A producer thread sets P=1 and calls notify_one. A consumer thread calls wait and then sets P=0. (It has to set P=0 regardless of how many producers have set the predicate to 1, because - I assume - the permit object doesn't count the pending notify_one calls, unlike a semaphore and like an event.)
So you have
Initially P is 0. Tc1, Tc2 call wait. Tp1 sets P=1 and calls notify_one. Tc1 is unblocked, returns from wait, but is immediately suspended by the scheduler for unfathomable scheduling reasons, like its code causing a page fault. Tp2 sets P=1 and calls notify_one. Tc2 is unblocked, returns from wait, sets P=0. Tc1 is resumed, sees P=0, panics.
I suspect that your guarantees, that grant/notify_one calls are blocking and mutually exclusive, were designed to prevent this, but they don't and can't, in theory. They _can_ make the above extremely improbable, but they aren't theoretically sound.
That isn't the reason actually, but I'll get back to that. Firstly, permits are a notification object, not a serialisation object. If you have some predicate P whose state you are changing you must protect it with a mutex, just like any other code. This is why permit.wait() takes a locked mutex. Reworking your example thusly: Initially P is 0 and p_mutex is unlocked. Tc1, Tc2 lock p_mutex and call wait. Tp1 locks p_mutex, sets P=1 and calls notify_one. Upon Tp1 releasing the mutex, Tc1 is unblocked with mutex locked, returns from wait, but is immediately suspended by the scheduler for unfathomable scheduling reasons, like its code causing a page fault. Tp2 tries to set P=1, but gets blocked on the mutex being held by Tc1. When Tc1 is resumed and eventually unlocks the mutex, everything else proceeds as normal. Maybe you meant the fact that the permit contains its own predicate, and hence no need for predicate checking? If so, then P equals the state of the permit, and: Initially P is 0. Tc1, Tc2 call wait. Tp1 sets P=1. Tc1 is unblocked, thus atomically resetting P=0, returns from wait, but is immediately suspended by the scheduler for unfathomable scheduling reasons, like its code causing a page fault. Tp2 sets P=1. Tc2 is unblocked, thus atomically resetting P=0, returns from wait. Tc1 is resumed, all is well. Exactly the number of waiters were freed as granters. Perhaps though in fact you were more concerned about two threads granting a permit, but only one thread getting woken? The problem here is that in some cases this is exactly what you want, in other cases it is a lost wakeup and you should use a semaphore instead. Similarly, revoking non-consuming permits is always racy unless you have added synchronisation to ensure you're not being stupid. All threading primitives have their gotchas, no doubt. The question here is if the presented implementation of this permit is a wise addition to Boost.
I suspect that your guarantees, that grant/notify_one calls are blocking and mutually exclusive, were designed to prevent this, but they don't and can't, in theory. They _can_ make the above extremely improbable, but they aren't theoretically sound.
Grants being blocking for non-consuming permits is actually a time complexity guarantee, so you are being guaranteed progress no matter what. The Windows kernel can "cheat" here in ways we cannot in portable code.
The specification of condition variables allows spurious wakeups not because they can't be prevented by the implementation; it permits spurious wakeups to make client code assume spurious wakeups, because code that is written to assume spurious wakeups is more likely to be correct. Or conversely, code that does not assume spurious wakeups is likely to be incorrect even if no spurious wakeups occur.
I would have said spurious wakeups come exclusively from kernel bugs and the fact POSIX allows signals to escape syscalls, which is definitely a pre-threading era legacy design choice. On Windows spurious wakeups are trapped for you in user space so you never see them and therefore never have to deal with them, unless of course you elect to do so (e.g. any of the wait APIs able to return WAIT_IO_COMPLETION). Spurious wakeups absolutely can be prevented by the implementation, just for backwards compatibility POSIX cannot do so. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/