[thread] condition::notify with unlocked mutex

Hello.

The Boost CV class requires that a mutex be locked when calling wait(), but not when calling notify(). This seems a bit strange to me. As far as my understanding of CVs goes (and my experience with them), locking when notifying makes just as much sense as locking when waiting. When waiting, we check the predicate, and when notifying, we change that predicate. It seems obvious that all reads and writes to the shared-among-threads predicate should be done under the lock. Shouldn't they?

I might just be barking up the wrong tree here, because I realize that Boost.Thread follows the POSIX standard to a great extent, and POSIX doesn't require locking when notifying. However, even if we want to provide this option, it seems to me that it's definitely the rare case rather than the common one. Providing an interface that ensures the *same* mutex is locked when notifying as when waiting would greatly reduce bugs (and as a side effect, I think it would also greatly improve CV performance on Windows).

What I'm suggesting is that the condition constructor would accept a mutex reference and store it. The Lock& argument to the wait() method would be removed, as there's already a reference to the mutex. It might also be useful to templatize the condition class on the mutex type: template <class Mutex> class condition; although I'm not sure about that.

Does all that make sense?

Yuval

Yuval Ronen wrote:
Hello.
The Boost CV class requires that a mutex be locked when calling wait(), but not when calling notify(). This seems a bit strange to me.
It isn't strange, since notify is typically not called with the mutex locked. Notifying with the mutex held would cause the awakened thread to be put back to sleep immediately, to wait for the mutex.
As far as my understanding of CVs goes (and my experience with them), locking when notifying makes just as much sense as locking when waiting. When waiting, we check the predicate, and when notifying, we change that predicate. It seems obvious that all reads and writes to the shared-among-threads predicate should be done under the lock. Shouldn't they?
It should.

Peter Dimov wrote:
Yuval Ronen wrote:
Hello.
The Boost CV class requires that a mutex be locked when calling wait(), but not when calling notify(). This seems a bit strange to me.
It isn't strange, since notify is typically not called with the mutex locked. Notifying with the mutex held would cause the awakened thread to be put back to sleep immediately, to wait for the mutex.
Yes, but what's the harm in that? The notify simply changes the state of the thread from "waiting on a CV" to "waiting on a mutex". I'm no operating system expert, but I guess it can be done with a flip of a flag (or two) inside the kernel. After that we continue normally. Seems reasonable to me.
As far as my understanding of CVs goes (and my experience with them), locking when notifying makes just as much sense as locking when waiting. When waiting, we check the predicate, and when notifying, we change that predicate. It seems obvious that all reads and writes to the shared-among-threads predicate should be done under the lock. Shouldn't they?
It should.
So I understand it's your opinion that the benefits of an interface that emphasizes locking on both wait and notify - of the same mutex, as it should be - aren't worth it?

Yuval Ronen wrote:
Peter Dimov wrote:
Yuval Ronen wrote:
Hello.
The Boost CV class requires that a mutex be locked when calling wait(), but not when calling notify(). This seems a bit strange to me.
It isn't strange, since notify is typically not called with the mutex locked. Notifying with the mutex held would cause the awakened thread to be put back to sleep immediately, to wait for the mutex.
Yes, but what's the harm in that? The notify simply changes the state of the thread from "waiting on a CV" to "waiting on a mutex". I'm no operating system expert, but I guess it can be done with a flip of a flag (or two) inside the kernel. After that we continue normally.
It can be done in principle, but then you still need to make another kernel call in unlock to wake up the thread that you already tried to wake up in notify. If the unlock is done before the notify, it can take the fast path and not enter the kernel (unless there's someone else waiting on the mutex).

Peter Dimov wrote:
Yuval Ronen wrote:
Peter Dimov wrote:
Yuval Ronen wrote:
Hello.
The Boost CV class requires that a mutex be locked when calling wait(), but not when calling notify(). This seems a bit strange to me.

It isn't strange, since notify is typically not called with the mutex locked. Notifying with the mutex held would cause the awakened thread to be put back to sleep immediately, to wait for the mutex.

Yes, but what's the harm in that? The notify simply changes the state of the thread from "waiting on a CV" to "waiting on a mutex". I'm no operating system expert, but I guess it can be done with a flip of a flag (or two) inside the kernel. After that we continue normally.
It can be done in principle, but then you still need to make another kernel call in unlock to wake up the thread that you already tried to wake up in notify. If the unlock is done before the notify, it can take the fast path and not enter the kernel (unless there's someone else waiting on the mutex).
Ok, while trying to understand this, I made a list of kernel calls in each scenario. Both scenarios contain two threads: Thread1 is waiting on a CV. Thread2 locks the mutex (as it should), changes the CV predicate, notifies the CV, and unlocks the mutex, either before or after the notify. Let's see what Thread2 does:

Scenario #1
-----------
1. calls mutex_lock (kernel)
2. changes predicate
3. calls mutex_unlock (kernel)
4. calls cond_signal (kernel changes thread1's state from "blocked" to "ready")

Scenario #2
-----------
1. calls mutex_lock (kernel)
2. changes predicate
3. calls cond_signal (kernel changes thread1's state from "blocked on cond" to "blocked on mutex")
4. calls mutex_unlock (kernel changes thread1's state from "blocked" to "ready")

Both scenarios contain exactly 3 kernel calls, and thread1 is made "ready" only after the 3rd call. No difference here.

Another thing (in addition to the better chance of avoiding bugs, as I explained in my previous post) is that I believe Scenario #2 better serves the "fairness" ideal, as described by Anthony Williams in http://lists.boost.org/Archives/boost/2006/10/112609.php. I'm not saying that a RW mutex and a CV are the same thing, but it might be that the same principles apply to both. Or maybe you disagree with what Anthony wrote? Ignore this paragraph if so...

Yuval

Yuval Ronen wrote:
Ok, while trying to understand this, I made a list of kernel calls in each scenario. Both scenarios contain two threads: Thread1 is waiting on a CV. Thread2 locks the mutex (as it should), changes the CV predicate, notifies the CV, and unlocks the mutex, either before or after the notify. Let's see what Thread2 does:
Scenario #1
-----------
1. calls mutex_lock (kernel)
mutex_lock only goes to the kernel if the mutex isn't free.
2. changes predicate
3. calls mutex_unlock (kernel)
Likewise, mutex_unlock only goes to the kernel if between the lock and the unlock another thread has tried to lock the mutex and has been blocked by the kernel.
4. calls cond_signal (kernel changes thread1's state from "blocked" to "ready")
Scenario #2
-----------
1. calls mutex_lock (kernel)
No difference here.
2. changes predicate
3. calls cond_signal (kernel changes thread1's state from "blocked on cond" to "blocked on mutex")
4. calls mutex_unlock (kernel changes thread1's state from "blocked" to "ready")
Guaranteed kernel call since at least one thread, thread1, is blocked waiting for the mutex.
Another thing (in addition to the better chance of avoiding bugs, as I explained in previous post) is that I believe that Scenario #2 better serves the "fairness" ideal, as described by Anthony Williams in http://lists.boost.org/Archives/boost/2006/10/112609.php.
You are free to use scenario #2 if you like. I'm just saying that it isn't the only option. :-)

Yuval Ronen <ronen_yuval@yahoo.com> writes:
Scenario #1
-----------
1. calls mutex_lock (kernel)
2. changes predicate
3. calls mutex_unlock (kernel)
4. calls cond_signal (kernel changes thread1's state from "blocked" to "ready")

Scenario #2
-----------
1. calls mutex_lock (kernel)
2. changes predicate
3. calls cond_signal (kernel changes thread1's state from "blocked on cond" to "blocked on mutex")
4. calls mutex_unlock (kernel changes thread1's state from "blocked" to "ready")
Both scenarios contain exactly 3 kernel calls, and thread1 is made "ready" only after the 3rd call. No difference here.
Another thing (in addition to the better chance of avoiding bugs, as I explained in my previous post) is that I believe Scenario #2 better serves the "fairness" ideal, as described by Anthony Williams in http://lists.boost.org/Archives/boost/2006/10/112609.php. I'm not saying that a RW mutex and a CV are the same thing, but it might be that the same principles apply to both.
If you look at the code attached to the message of mine that you've referenced, you will see that:

a) the code that does the notify already holds the mutex, since it has just changed the predicate.

b) the code doesn't necessarily release the mutex immediately upon notifying the condition variable. In some circumstances, two condition variables are notified (one signalled, one broadcast) before the mutex is unlocked.

c) if we release the mutex first, the state might have changed, and we might have additional threads waiting on the condition variables. Holding the mutex whilst we notify the CVs helps keep everything easy to follow.

Point b) is where the "fairness" comes in --- there might be threads waiting on two CVs that we want to compete for the mutex. By holding the mutex whilst we notify both the CVs, we ensure the fairness. If we release the mutex first, then threads waiting on the CV we notify first effectively get priority over those waiting on the second CV.

Anthony
--
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk

Anthony Williams <anthony_w.geo <at> yahoo.com> writes:
[...]
c) if we release the mutex first, the state might have changed, and we might have additional threads waiting on the condition variables. Holding the mutex whilst we notify the CVs helps keep everything easy to follow.
[...]
Anthony
If the mutex is not locked, another thread can lock the mutex before the thread being awakened by the condition variable does. This might delay scheduling of the thread waiting on the condition variable.

--
regards,
Prashant

Yuval Ronen wrote:
... It seems obvious that all reads and writes to the shared-among-threads predicate should be done under the lock. Shouldn't they?
Of course, the predicate access needs to be locked. But there is no requirement for the signaling part, since signaling does not access the predicate.
... also greatly improve CV performance on Windows).
Do you mean: the requirement to hold the lock while signaling will improve speed? If so, I might tell you that the intent of not requiring the mutex to be held while signaling is precisely to improve speed. Butenhof (Programming with POSIX Threads) says:

"Although you must have the associated mutex locked to wait on a condition variable, you can signal (or broadcast) a condition variable with the associated mutex unlocked if that is more convenient. The advantage of doing so is that, on many systems, this may be more efficient. When a waiting thread awakens, it must first lock the mutex. If the thread awakens while a signaling thread holds the mutex, then the awakened thread must immediately block on the mutex - you've gone through two context switches to get back where you started. (There is an optimization, which I've called wait morphing, that moves a thread directly from the condition variable wait queue to the mutex wait queue in this case, without a context switch, when the mutex is locked. This optimization can produce a substantial performance benefit for many applications.) Weighing on the other side is the fact that, if the mutex is not locked, any thread (not only the one being awakened) can lock the mutex prior to the thread being awakened. This race is one source of intercepted wakeups. A lower-priority thread, for example, might lock the mutex while another thread was about to awaken a very high-priority thread, delaying scheduling of the high-priority thread. If the mutex remains locked while signaling, this cannot happen - the high-priority waiter will be placed before the lower-priority waiter on the mutex and will be scheduled first."

I might be wrong, but AFAIK "wait morphing" generally cannot be done in a user-space library; it needs to interact intimately with the scheduler.
Providing an interface that will ensure for notifying, locking the *same* mutex as when waiting, ... ... What I'm suggesting is that the condition constructor would accept a mutex reference and store it. The Lock& argument to the wait() method will be removed, as there's already a reference to the mutex. It might also be useful to templatize the condition class for Mutex type:
What would this help with? Changing a predicate needs the mutex locked; signaling does not. If you mean that the signal function internally should first lock the mutex, signal, and then unlock it, this would make things even worse (and is quite different from the case Butenhof addresses in the above citation, since it would be of no help in the priority case).

Also, putting the mutex into the condition ctor would make construction unnecessarily complicated. E.g. how would you do it in the following (quite common) case:

class foo {
    mutex m;
    condition c;
};

Btw.: I highly recommend reading Butenhof's book; it not only covers pthreads, but will help you get a better general understanding of threading too.

Regards,
Roland

Roland Schwarz wrote:
Yuval Ronen wrote:
... And it is only obvious that all reads and writes to the shared-among-threads predicate should be locked. Shouldn't it?
Of course, the predicate access needs to be locked. But there is no requirement for the signaling part, since this does not access the predicate.
... also greatly improve CV performance on Windows).
Do you mean: the requirement to hold the lock while signaling will improve speed?
If so, I might tell you that the intent of not requiring the mutex to be held while signaling is precisely to improve speed. Butenhof (Programming with POSIX Threads) says:
"Although you must have the associated mutex locked to wait on a condition variable, you can signal (or broadcast) a condition variable with the associated mutex unlocked if that is more convenient. The advantage of doing so is that, on many systems, this may be more efficient. When a waiting thread awakens, it must first lock the mutex. If the thread awakens while a signaling thread holds the mutex, then the awakened thread must immediately block on the mutex - you've gone through two context switches to get back where you started. (There is an optimization, which I've called wait morphing, that moves a thread directly from the condition variable wait queue to the mutex wait queue in this case, without a context switch, when the mutex is locked. This optimization can produce a substantial performance benefit for many applications.) Weighing on the other side is the fact that, if the mutex is not locked, any thread (not only the one being awakened) can lock the mutex prior to the thread being awakened. This race is one source of intercepted wakeups. A lower-priority thread, for example, might lock the mutex while another thread was about to awaken a very high-priority thread, delaying scheduling of the high-priority thread. If the mutex remains locked while signaling, this cannot happen - the high-priority waiter will be placed before the lower-priority waiter on the mutex and will be scheduled first."
I might be wrong, but AFAIK "wait morphing" generally cannot be done in a user-space library, it needs to intimately interact with the scheduler.
Well, I wouldn't dare argue with Butenhof, but:

A. Even he says that with morphing, there is no difference. You mentioned that such morphing would require some OS support, and I agree. No doubt synchronization primitives require OS support, and such support might not be available. Perhaps this point makes this discussion more appropriate for a POSIX forum, for example; I apologize if it has gone OT. Peter, in his post, claims that even with morphing there's a difference, because there's an optimization to avoid a kernel call that can't be used when morphing is applied in this case. Whether this optimization is a good enough reason to influence the CV interface is something that can be argued about.

B. Butenhof is talking about a theoretical, ideal OS, which Windows is known not to be... But seriously, if notify() requires the mutex to be locked - the same mutex locked during wait() - then the CV implementation can use this fact to synchronize its own members. I've been able to implement a CV on Windows with only one semaphore - rather than the current impl, which uses 2 semaphores and an additional mutex. This means there are far fewer sync objects to play with, and simpler, more efficient code. Surely, as I'm not an expert, I might have some bugs in my impl, but I guess you can see the opportunities in requiring stricter locking from the user to make the impl simpler.
Providing an interface that will ensure for notifying, locking the *same* mutex as when waiting, ... ... What I'm suggesting is that the condition constructor would accept a mutex reference and store it. The Lock& argument to the wait() method will be removed, as there's already a reference to the mutex. It might also be useful to templatize the condition class for Mutex type:
What would this help with? Changing a predicate needs the mutex locked; signaling does not.
True, but requiring the mutex to be locked upon entering notify() makes it harder for the user to forget locking. The wait() function does it (even if for a different reason), so notify() might as well. Having the CV contain a reference to the mutex makes it hard to accidentally lock a different mutex for wait() and notify(). And locking while notifying has some objective advantages - it makes our threads respect the priority policy better, as Butenhof explains.
If you mean that the signal function internally should first lock the mutex, signal, and then unlock it, this would make things even worse (and is quite different from the case Butenhof addresses in the above citation, since it would be of no help in the priority case).
Of course I didn't mean that the signal() will lock, signal, and unlock. I meant that the user will lock before calling notify().
Also, putting the mutex into the condition ctor would make construction unnecessarily complicated. E.g. how would you do it in the following (quite common) case:

class foo {
    mutex m;
    condition c;
};
Easy:

class foo {
    foo() : m(), c(m) { }
    mutex m;
    condition c;
};

Yuval Ronen wrote:
Well, I wouldn't dare argue with Butenhof, but:
A. Even he says that with morphing, there is no difference. You mentioned that such morphing would require some OS support, and I agree. No doubt synchronization primitives require OS support, and such support might not be available. Perhaps this point makes this discussion more appropriate for a POSIX forum, for example; I apologize if it has gone OT. Peter, in his post, claims that even with morphing there's a difference, because there's an optimization to avoid a kernel call that can't be used when morphing is applied in this case. Whether this optimization is a good enough reason to influence the CV interface is something that can be argued about.
Sorry, but I do not understand your point.
B. Butenhof is talking about a theoretical, ideal OS, which Windows is known not to be... But seriously, if notify() requires the mutex to be locked - the same mutex locked during wait() - then the CV implementation can use this fact to synchronize its own members. I've been able to implement a CV on Windows with only one semaphore - rather than the current impl, which uses 2 semaphores and an additional mutex. This means there are far fewer sync objects to play with, and simpler, more efficient code. Surely, as I'm not an expert, I might have some bugs in my impl, but I guess you can see the opportunities in requiring stricter locking from the user to make the impl simpler.
There is a very long history of attempts to get condition variables correct on Windows, most of them (especially the seemingly simple and logical ones) failing badly due to subtle bugs. Since there is no formal proof available, I think the most reasonable thing is to use the best known algorithms, which have undergone extensive peer review. If you think you can come up with something better, please go ahead, but be prepared that it will be very hard to a) give a formal proof of correctness for your algorithm, or b) survive review by those more knowledgeable than I am.
True, but requiring the mutex to be locked upon entering notify(), makes it more difficult for the user to forget locking.
This is what I don't quite understand: how does the condition holding a reference to a mutex make it more difficult for the user to forget locking? You need to have the mutex locked _before_ you call the signal!
Having the CV contain a reference to the mutex makes it hard to accidentally lock a different mutex for wait() and notify().
Ok, let's be more explicit:

mutex m;
condition c(m);

waiting:

lock l(m);
c.wait();

Ok, no need for l& in the call to wait. But is this really better for avoiding mistakes? See:

mutex m1;
condition c1(m1);
mutex m2;

waiting:

lock l(m2);
c1.wait(); // ouch! this locks the wrong one, and there is no apparent clue in the source.

Now to signaling:

mutex m;
condition c(m);

lock l(m);
c.notify_one(); // How is this different from the current interface? How do you make use of the referenced mutex?
And locking while notifying has some objective advantages - it makes our threads respect the priority policy better, as Butenhof explains.
Again, I cannot understand how this relates to your suggested change in interface. I suggest being more explicit.
Of course I didn't mean that the signal() will lock, signal, and unlock. I meant that the user will lock before calling notify().
Again: How does the referenced mutex help to achieve this goal? About initialisation ...
Easy:

class foo {
    foo() : m(), c(m) { }
    mutex m;
    condition c;
};
Agreed. But you are giving away the possibility of sharing a condition variable among several predicates. Although this is not a recommended practice, since you can easily get it wrong, it is nevertheless a valid usage scenario, which would be ruled out by your suggested change. I do not think we should forbid the alert user from using the interface in uncommon (but not wrong) ways.

Roland

Roland Schwarz wrote:
Yuval Ronen wrote:
Well, I wouldn't dare argue with Butenhof, but:
A. Even he says that with morphing, there is no difference. You mentioned that such morphing would require some OS support, and I agree. No doubt synchronization primitives require OS support, and such support might not be available. Perhaps this point makes this discussion more appropriate for a POSIX forum, for example; I apologize if it has gone OT. Peter, in his post, claims that even with morphing there's a difference, because there's an optimization to avoid a kernel call that can't be used when morphing is applied in this case. Whether this optimization is a good enough reason to influence the CV interface is something that can be argued about.
Sorry, but I do not understand your point.
I was referring to your claim that not having a requirement to hold the lock while signaling was aimed at improving speed, and that therefore it's impossible for it to cause the opposite. Point A quoted above was to show that it doesn't necessarily improve speed. Point B below was specifically about Windows - where I believe speed can be improved by adding such a requirement.
B. Butenhof is talking about a theoretical, ideal OS, which Windows is known not to be... But seriously, if notify() requires the mutex to be locked - the same mutex locked during wait() - then the CV implementation can use this fact to synchronize its own members. I've been able to implement a CV on Windows with only one semaphore - rather than the current impl, which uses 2 semaphores and an additional mutex. This means there are far fewer sync objects to play with, and simpler, more efficient code. Surely, as I'm not an expert, I might have some bugs in my impl, but I guess you can see the opportunities in requiring stricter locking from the user to make the impl simpler.
There is a very long history of attempts to get condition variables correct on Windows, most of them (especially the seemingly simple and logical ones) failing badly due to subtle bugs. Since there is no formal proof available, I think the most reasonable thing is to use the best known algorithms, which have undergone extensive peer review. If you think you can come up with something better, please go ahead, but be prepared that it will be very hard to a) give a formal proof of correctness for your algorithm, or b) survive review by those more knowledgeable than I am.
I will gladly put my code here to be scrutinized by the experts on this forum (after the weekend, as the code is at work, and I'm at home now :-) ), but I fear that they will have little incentive to do it. My code is based on an interface they claim is invalid, and they might have no motivation to do a thorough check of it. One point I raised was that, in the general case, requiring more from the user makes the library writer's life easier, which might explain why such a thing is even possible.
True, but requiring the mutex to be locked upon entering notify(), makes it more difficult for the user to forget locking.
This is what I don't quite understand: how does the condition holding a reference to a mutex make it more difficult for the user to forget locking? You need to have the mutex locked _before_ you call the signal!
Having the CV contain a reference to the mutex makes it hard to accidentally lock a different mutex for wait() and notify().
Ok, let's be more explicit:

mutex m;
condition c(m);

waiting:

lock l(m);
c.wait();

Ok, no need for l& in the call to wait. But is this really better for avoiding mistakes? See:

mutex m1;
condition c1(m1);
mutex m2;

waiting:

lock l(m2);
c1.wait(); // ouch! this locks the wrong one, and there is no apparent clue in the source.

Now to signaling:

mutex m;
condition c(m);

lock l(m);
c.notify_one(); // How is this different from the current interface? How do you make use of the referenced mutex?
Ok, I'm afraid I didn't explain myself properly; apologies for that. What I mentioned only briefly, and should have emphasized more, is that the call to notify() will check whether the mutex is locked - the mutex that was passed in the ctor - and throw an exception if not. The wait() function does this already, except that it should be changed to check the stored mutex&. I hope this explanation helps answer your questions above.
And locking while notifying has some objective advantages - it makes our threads respect the priority policy better, as Butenhof explains.
This again I cannot understand how it relates to your suggested change in interface. I suggest being more explicit.
If the mutex is always locked while notifying, then <copy from your Butenhof quote> the high-priority waiter will be placed before the lower-priority waiter on the mutex and will be scheduled first. </quote> which is a good thing.
Of course I didn't mean that the signal() will lock, signal, and unlock. I meant that the user will lock before calling notify().
Again: How does the referenced mutex help to achieve this goal?
I hope this was answered.
About initialisation ...
Easy:

class foo {
    foo() : m(), c(m) { }
    mutex m;
    condition c;
};
Agreed.
But you are giving away the possibility of sharing a condition variable among several predicates. Although this is not a recommended practice, since you can easily get it wrong, it is nevertheless a valid usage scenario, which would be ruled out by your suggested change. I do not think we should forbid the alert user from using the interface in uncommon (but not wrong) ways.
I agree that my proposed interface is more limiting - for example, as you said, sharing a condition variable among several predicates (but only if they are protected by different mutexes). Whether or not it forbids *valid* use cases is the real question. I never encountered such a scenario, but hey, I'm just one simple programmer, and I can definitely be wrong. If all you experts think these use cases are valid, then I probably am.

Yuval

Yuval Ronen wrote:
I will gladly put my code here to be scrutinized by the experts on this forum (after the weekend, as the code is at work, and I'm at home now :-) ), but I fear that they will have little incentive to do it.
Can't tell. Possibly yes. You will need to try.
Ok, I'm afraid I didn't explain myself properly; apologies for that. What I mentioned only briefly, and should have emphasized more, is that the call to notify() will check whether the mutex is locked - the mutex that was passed in the ctor - and throw an exception if not. The wait() function does this already, except that it should be changed to check the stored mutex&.
I hope this explanation helps answer your questions above.
If it is only about protecting the user from, as you say, "bad usage", where is the problem? If you need this behavior, you can easily write a little convenience wrapper for your projects that achieves this goal. The threading library is designed to allow you the broadest valid usage, only protecting against agreed misuses. Why should we unnecessarily limit the library's API when you can specialize it to your needs? And even if you find a way on Windows where your proposed change allows for a faster implementation, you would still need to show that this also holds for other operating systems.

Roland

Roland Schwarz wrote:
Yuval Ronen wrote:
I will gladly put my code here to be scrutinized by the experts on this forum (after the weekend, as the code is at work, and I'm at home now :-) ), but I fear that they will have little incentive to do it.
Can't tell. Possibly yes. You will need to try.
I'll try.
Ok, I'm afraid I didn't explain myself properly; apologies for that. What I mentioned only briefly, and should have emphasized more, is that the call to notify() will check whether the mutex is locked - the mutex that was passed in the ctor - and throw an exception if not. The wait() function does this already, except that it should be changed to check the stored mutex&.
I hope this explanation helps answer your questions above.
If it is only about protecting the user from, as you say, "bad usage", where is the problem? If you need this behavior, you can easily write a little convenience wrapper for your projects that achieves this goal. The threading library is designed to allow you the broadest valid usage, only protecting against agreed misuses. Why should we unnecessarily limit the library's API when you can specialize it to your needs?
It's true I can write a wrapper to do what I want (I'll need to write a lock class which doesn't do any locking/unlocking, but only returns true to an is_locked() question - not very pretty, but still feasible). The thing is that I thought I wasn't limiting any valid usages, and that all users of the library could benefit from this change. We obviously disagree about that crucial point.
And even if you find a way on Windows where your proposed change allows for a faster implementation, you would still need to show that this also holds for other operating systems.
I intended to show that it's faster on Windows, and not any slower on other platforms. That should be enough, performance-wise. Peter showed that I'm giving up some possible optimizations, which makes it slower on some platforms. As I said, whether this optimization issue should affect the interface might just be another point we disagree about. In the face of all this disagreement, well, what can a guy do...

Yuval Ronen wrote:
In face of all this disagreement, well, what can a guy do...
What everyone else will do: either convince the others, i.e. prove your points, or respect that you have learned something. At least this is what I am trying to do. This forum is highly technically focused and will respect your arguments. I am in no different position than you are. The fact that I am currently co-maintainer of Boost.Thread does not give me the right to introduce arbitrary changes which I think are good. I also have to ask for consensus first. Given that there currently is one pro and one contra vote, this is not enough to change anything. Agreed?

Perhaps you should restart from fresh. Perhaps some "meta" hints: from rereading your original post I got the following impression: you found a way to speed up condition variables on Windows. Unfortunately you found out that this is not possible with the current interface. Instead of pointing out this fact, you have been proposing an interface change, and stating that the speed up is a mere "side effect". So this gave me the impression you did not try to carefully research the effects of the proposed interface change. How much present code will break? What valid usage scenarios will become invalid?

Of course I might be totally wrong, because I am kind of interpreting you. This is solely meant to give you feedback on how your message arrived at me. (Can't speak for others.) Also this kind of arguing is by no means a technical one. But perhaps it re-encourages you to try again. I personally would like to see a reworked statement that also considers the arguments you have heard so far, ideally accompanied by whatever is necessary to prove your points.

Roland

Roland Schwarz wrote:
Yuval Ronen wrote:
In face of all this disagreement, well, what can a guy do...
What everyone else will do:
Either convince the others, i.e. prove your points
or
respect that you have learned something.
Of course I have learned something. Learned a lot. I always do from the discussions here.
Given the fact that there currently is one pro and one contra vote, is not enough to change anything. Agreed?
Of course. Especially given the fact there is more than one contra...
From rereading your original post I got the following impression: You found a way to speed up condition variables on Windows. Unfortunately you found out, that this is not possible with the current interface. Instead of pointing out this fact, you have been proposing an interface change, and stating that the speed up is a mere "side effect".
Now that's not true. The Windows speed up is really just a side effect of a suggestion focused on interface. I didn't just "make it look like it".
I personally would like to see a reworked statement, that also considers the arguments you have heard so far, ideally accompanied by whatever is necessary to prove your points.
I have a feeling you misinterpreted my "what can a guy do" comment. It was made with humor, not frustration or anger. I'm completely ok with not being able to convince. Completely.

Yuval Ronen wrote:
I have a feeling you misinterpreted my "what can a guy do" comment.
This certainly is due to the fact that I am not native English. So I too have learned something :-)
It was made with humor, not frustration or anger. I'm completely ok with not being able to convince. Completely.
Fine. Roland

Roland Schwarz wrote:
Yuval Ronen wrote:
I will gladly put my code here to be scrutinized by the experts on this forum (after the weekend, as the code is at work, and I'm at home now :-) ), but I fear that they will have little incentive to do it.
Can't tell. Possibly yes. You will need to try.
Ok, if anyone is still interested, here's my implementation of CV for Windows. It's very simple, maybe too simple, and I guess it's far less interesting than the code just posted by Chris Thomasson, but here it is anyway.

#ifndef WinCountingSem_h
#define WinCountingSem_h

#include <Windows.h>
#include <cassert>

namespace PlatformSpecific {

class WinCountingSem
{
public:
    typedef HANDLE SemphoreImpl;
    static const size_t maxValue = 0xFFFFFFL;

public:
    WinCountingSem(size_t a_initialValue, size_t a_maxValue = maxValue)
    {
        m_impl = ::CreateSemaphore(NULL, a_initialValue, a_maxValue, NULL);
        assert(m_impl != NULL);
    }

    ~WinCountingSem()
    {
        BOOL res = ::CloseHandle(m_impl);
        assert(res == true);
    }

    void lock()
    {
        DWORD res = ::WaitForSingleObject(m_impl, INFINITE);
        assert(res == WAIT_OBJECT_0);
    }

    void unlock(size_t a_count = 1)
    {
        BOOL res = ::ReleaseSemaphore(m_impl, a_count, NULL);
        assert(res == true);
    }

private:
    SemphoreImpl m_impl;

    WinCountingSem(const WinCountingSem &);
    WinCountingSem & operator=(const WinCountingSem &);
};

} // namespace PlatformSpecific

#endif /* ! defined WinCountingSem_h */

#ifndef WinConditionVar_h
#define WinConditionVar_h

#include <cassert>

#include "Mutex.h"
#include "WinCountingSem.h"

namespace PlatformSpecific {

class WinConditionVar
{
public:
    WinConditionVar(Mutex &a_mutex)
        : m_waitersCount(0), m_sem(0), m_mutex(a_mutex)
    {
    }

    void wait()
    {
        assert(m_mutex.isLocked());
        m_waitersCount++;
        m_mutex.unlock();
        m_sem.lock();
        m_mutex.lock();
    }

    void signal()
    {
        assert(m_mutex.isLocked());
        if (m_waitersCount > 0)
        {
            m_sem.unlock();
            m_waitersCount--;
        }
    }

    void broadcast()
    {
        assert(m_mutex.isLocked());
        if (m_waitersCount > 0)
        {
            m_sem.unlock(m_waitersCount);
            m_waitersCount = 0;
        }
    }

private:
    size_t m_waitersCount;
    WinCountingSem m_sem;
    Mutex &m_mutex;

    WinConditionVar(const WinConditionVar &);
    WinConditionVar & operator=(const WinConditionVar &);
};

} // namespace PlatformSpecific

#endif /* ! defined WinConditionVar_h */

Yuval Ronen <ronen_yuval@yahoo.com> writes:
Roland Schwarz wrote:
Yuval Ronen wrote:
I will gladly put my code here to be scrutinized by the experts on this forum (after the weekend, as the code is at work, and I'm at home now :-) ), but I fear that they will have little incentive to do it.
Can't tell. Possibly yes. You will need to try.
Ok, if anyone is still interested, here's my implementation of CV for Windows. It's very simple, maybe too simple, and I guess it's far less interesting than the code just posted by Chris Thomasson, but here it is anyway.
void wait()
{
    assert(m_mutex.isLocked());
    m_waitersCount++;    // line A
    m_mutex.unlock();
    m_sem.lock();        // line B
    m_mutex.lock();      // line C
}

void broadcast()
{
    assert(m_mutex.isLocked());
    if (m_waitersCount > 0)
    {
        m_sem.unlock(m_waitersCount);    // line D
        m_waitersCount = 0;
    }
}
Unfortunately, it is susceptible to the "stolen wakeup" problem. If m_waitersCount is non-zero, the semaphore is incremented on line D. This will wake the appropriate number of waiting threads, but not immediately. Threads waiting on a semaphore will not be woken until they are next scheduled. Thus, a new thread might call wait, increment the waitersCount (line A) and acquire the semaphore (line B) before all the threads currently waiting have been woken.

There are two solutions to this that I can think of. The first is to have some kind of lock which prevents additional threads from waiting until all the notified threads have been woken. The second is what Chris Thomasson talks about --- swapping out the "wait set" for a new one on notify, so that no new threads can be added to that wait set, even if the notified threads haven't all woken up. This is the technique used in the win32 condvar implementation currently on the thread_rewrite branch.

Anthony
--
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
Yuval Ronen <ronen_yuval@yahoo.com> writes:
Roland Schwarz wrote:
Yuval Ronen wrote:
I will gladly put my code here to be scrutinized by the experts on this forum (after the weekend, as the code is at work, and I'm at home now :-) ), but I fear that they will have little incentive to do it.
Can't tell. Possibly yes. You will need to try.
Ok, if anyone is still interested, here's my implementation of CV for Windows. It's very simple, maybe too simple, and I guess it's far less interesting than the code just posted by Chris Thomasson, but here it is anyway.
void wait()
{
    assert(m_mutex.isLocked());
    m_waitersCount++;    // line A
    m_mutex.unlock();
    m_sem.lock();        // line B
    m_mutex.lock();      // line C
}

void broadcast()
{
    assert(m_mutex.isLocked());
    if (m_waitersCount > 0)
    {
        m_sem.unlock(m_waitersCount);    // line D
        m_waitersCount = 0;
    }
}
Unfortunately, it is susceptible to the "stolen wakeup" problem. If m_waitersCount is non-zero, the semaphore is incremented on line D. This will wake the appropriate number of waiting threads, but not immediately. Threads waiting on a semaphore will not be woken until they are next scheduled. Thus, a new thread might call wait, increment the waitersCount (line A) and acquire the (line B) before all the threads currently waiting have been woken.
I have to admit that I didn't understand the problem. Your description sounds exactly like what should've happened. At the end, all threads that were waiting before the call to broadcast() are woken, and the new thread is waiting. Just as it should. What have I missed?

Yuval Ronen <ronen_yuval@yahoo.com> writes:
Anthony Williams wrote:
Yuval Ronen <ronen_yuval@yahoo.com> writes:
Roland Schwarz wrote:
Yuval Ronen wrote:
I will gladly put my code here to be scrutinized by the experts on this forum (after the weekend, as the code is at work, and I'm at home now :-) ), but I fear that they will have little incentive to do it.
Can't tell. Possibly yes. You will need to try.
Ok, if anyone is still interested, here's my implementation of CV for Windows. It's very simple, maybe too simple, and I guess it's far less interesting than the code just posted by Chris Thomasson, but here it is anyway.
void wait()
{
    assert(m_mutex.isLocked());
    m_waitersCount++;    // line A
    m_mutex.unlock();
    m_sem.lock();        // line B
    m_mutex.lock();      // line C
}

void broadcast()
{
    assert(m_mutex.isLocked());
    if (m_waitersCount > 0)
    {
        m_sem.unlock(m_waitersCount);    // line D
        m_waitersCount = 0;
    }
}
Unfortunately, it is susceptible to the "stolen wakeup" problem. If m_waitersCount is non-zero, the semaphore is incremented on line D. This will wake the appropriate number of waiting threads, but not immediately. Threads waiting on a semaphore will not be woken until they are next scheduled. Thus, a new thread might call wait, increment the waitersCount (line A) and acquire the (line B) before all the threads currently waiting have been woken.
I have to admit that I didn't understand the problem. Your description sounds exactly like what should've happened. At the end, all threads that were waiting before the call to broadcast() are woken, and the new thread is waiting. Just as it should. What have I missed?
Thread A:
  locks mutex
  Enters wait()
  Runs line A, waitersCount=1
  Unlocks mutex
  Runs line B, blocks.

Thread B:
  locks mutex
  Enters wait()
  Runs line A, waitersCount=2
  unlocks mutex
  Runs line B, blocks.

Thread C:
  locks mutex
  Enters broadcast()
  Runs line D: Unlock semaphore(2) // 2 threads waiting
  waitersCount=0
  unlocks mutex

Thread A:
  wakes on line B, sem count is now 1
  locks mutex on line C
  unlocks mutex

Thread D:
  locks mutex
  Enters wait()
  Runs line A, waitersCount=1
  unlocks mutex
  Runs line B, sem count is 1, so acquires it and sets to 0
  locks mutex on line C

Thread B:
  remains blocked on line B

Thread D has stolen the wakeup intended for thread B, even though it entered "wait" after the broadcast.

Anthony

Anthony Williams wrote:
Yuval Ronen <ronen_yuval@yahoo.com> writes:
Yuval Ronen <ronen_yuval@yahoo.com> writes:
Roland Schwarz wrote:
Yuval Ronen wrote:
I will gladly put my code here to be scrutinized by the experts on this forum (after the weekend, as the code is at work, and I'm at home now :-) ), but I fear that they will have little incentive to do it.
Can't tell. Possibly yes. You will need to try.
Ok, if anyone is still interested, here's my implementation of CV for Windows. It's very simple, maybe too simple, and I guess it's far less interesting than the code just posted by Chris Thomasson, but here it is anyway.
void wait()
{
    assert(m_mutex.isLocked());
    m_waitersCount++;    // line A
    m_mutex.unlock();
    m_sem.lock();        // line B
    m_mutex.lock();      // line C
}
void broadcast()
{
    assert(m_mutex.isLocked());
    if (m_waitersCount > 0)
    {
        m_sem.unlock(m_waitersCount);    // line D
        m_waitersCount = 0;
    }
}

Unfortunately, it is susceptible to the "stolen wakeup" problem. If m_waitersCount is non-zero, the semaphore is incremented on line D. This will wake the appropriate number of waiting threads, but not immediately. Threads waiting on a semaphore will not be woken until they are next scheduled. Thus, a new thread might call wait, increment the waitersCount (line A) and acquire the semaphore (line B) before all the threads currently waiting have been woken.

I have to admit that I didn't understand the problem. Your description sounds exactly like what should've happened. At the end, all threads that were waiting before the call to broadcast() are woken, and the new thread is waiting. Just as it should. What have I missed?
Thread A:
  locks mutex
  Enters wait()
  Runs line A, waitersCount=1
  Unlocks mutex
  Runs line B, blocks.

Thread B:
  locks mutex
  Enters wait()
  Runs line A, waitersCount=2
  unlocks mutex
  Runs line B, blocks.

Thread C:
  locks mutex
  Enters broadcast()
  Runs line D: Unlock semaphore(2) // 2 threads waiting
  waitersCount=0
  unlocks mutex

Thread A:
  wakes on line B, sem count is now 1
  locks mutex on line C
  unlocks mutex

Thread D:
  locks mutex
  Enters wait()
  Runs line A, waitersCount=1
  unlocks mutex
  Runs line B, sem count is 1, so acquires it and sets to 0
  locks mutex on line C

Thread B:
  remains blocked on line B

Thread D has stolen the wakeup intended for thread B, even though it entered "wait" after the broadcast.
Ah, I see. What you're saying, IIUC, is that after Win32's ReleaseSemaphore is called to release a waiting thread, another thread calling WaitForSingleObject can take the wakeup instead, so either of the two threads may end up blocked, and we don't know which. I've looked at MSDN to see if this is true, but couldn't find any explicit reference to this situation. I guess that if you say it can happen, then you know what you're talking about. The question is whether this is a bad thing. If the threads all wait on the same predicate (which from now on is a pre-condition of this CV implementation, and is not a huge disadvantage, IMO), then maybe it doesn't really matter.

Anthony Williams wrote:
Unfortunately, it is susceptible to the "stolen wakeup" problem. If m_waitersCount is non-zero, the semaphore is incremented on line D. This will wake the appropriate number of waiting threads, but not immediately. Threads waiting on a semaphore will not be woken until they are next scheduled. Thus, a new thread might call wait, increment the waitersCount (line A) and acquire the (line B) before all the threads currently waiting have been woken.
You seem to be implying that ReleaseSemaphore with a count > 1 is not an atomic operation but behaves as a series of ReleaseSemaphore with count == 1 calls? Is that correct? Or did I misunderstand something? I'd expect it to atomically wake count threads and decrement the semaphore counter with count.

"Peter Dimov" <pdimov@mmltd.net> writes:
Anthony Williams wrote:
Unfortunately, it is susceptible to the "stolen wakeup" problem. If m_waitersCount is non-zero, the semaphore is incremented on line D. This will wake the appropriate number of waiting threads, but not immediately. Threads waiting on a semaphore will not be woken until they are next scheduled. Thus, a new thread might call wait, increment the waitersCount (line A) and acquire the (line B) before all the threads currently waiting have been woken.
You seem to be implying that ReleaseSemaphore with a count > 1 is not an atomic operation but behaves as a series of ReleaseSemaphore with count == 1 calls? Is that correct? Or did I misunderstand something? I'd expect it to atomically wake count threads and decrement the semaphore counter with count.
If there are N threads waiting on the semaphore, and you do ReleaseSemaphore with a count >= N, I don't believe that the win32 api guarantees that the waiting threads will necessarily be moved to "ready" state atomically as part of the ReleaseSemaphore call.

In particular, even though the user-level code is blocked within WaitForSingleObject (or similar) in the waiting threads, there might be a kernel-level APC which takes that thread out of the wait state, does something, and then returns it to the wait state. If the call to ReleaseSemaphore occurs concurrently with the processing of the APC, then the thread will not see the notify until it returns, and the notify may be taken by another thread that calls WaitForSingleObject on the semaphore whilst the first thread is processing the kernel APC.

I believe the following is a valid scenario:

Thread A: WaitForSingleObject(hSemaphore) // blocks
Thread B: ReleaseSemaphore(hSemaphore, 1)
Thread C: WaitForSingleObject(hSemaphore) // returns immediately
Thread A still blocked

I can't remember where I got this impression, and I may be wrong.

Anthony

Anthony Williams wrote:
I believe the following is a valid scenario:
Thread A:
WaitForSingleObject(hSemaphore) // blocks
Thread B:
ReleaseSemaphore(hSemaphore,1)
Thread C:
WaitForSingleObject(hSemaphore) // returns immediately
Thread A still blocked
I can't remember where I got this impression, and I may be wrong.
No, you're absolutely right. This is something that, for example, The Old New Thing had an article about. Sebastian Redl

Sebastian Redl wrote:
Anthony Williams wrote:
I believe the following is a valid scenario:
Thread A:
WaitForSingleObject(hSemaphore) // blocks
Thread B:
ReleaseSemaphore(hSemaphore,1)
Thread C:
WaitForSingleObject(hSemaphore) // returns immediately
Thread A still blocked
I can't remember where I got this impression, and I may be wrong.
No, you're absolutely right. This is something that, for example, The Old New Thing had an article about.
Yep, here it is: http://blogs.msdn.com/oldnewthing/archive/2006/03/13/550402.aspx

Ultimate condvar solution: http://groups.google.com/group/comp.programming.threads/browse_frm/thread/de... I created one of the fastest condvar implementations for Windows out there. It beats Vista's stuff, and devastates pthread-win32. Everything is fast-pathed to the max. It has atomic-operation-free fast-pathing on all fronts.
participants (7)
-
Anthony Williams
-
Chris Thomasson
-
Peter Dimov
-
Prashant Thakre
-
Roland Schwarz
-
Sebastian Redl
-
Yuval Ronen