Hi there,
at the end of this mail is a small program that starts three threads out of
a fourth one ("thread4"). thread1, 2 and 3 are not allowed to start
execution before a certain condition is met (here: go_ is set to true and
the threads are notified of the changed condition).
The threads increment a variable. Once this variable has reached a threshold
(here: 10), execution is supposed to stop. This is done by calling the
interrupt() function on each thread, which in turn catch the corresponding
exception and act on it.
All this seems to work nicely, except for the fact that the "master thread"
(thread4 in this example) does not seem to get any processing time over a
long period. Hence thread 1,2 and 3 count to a very high number before the
stop condition is met.
Here is the output of the program:
Starting threads
Going to sleep for 2 seconds in startAndStopThreads()
Hello world Nr. 0 from thread 1
Hello world Nr. 1 from thread 2
[...]
Hello world Nr. 6167 from thread 2
Hello world Nr. 6168 from thread 2
Sending interrupt
Received interrupt in thread 3
Received interrupt in thread 1
Received interrupt in thread 2
Done ...
Occasionally the counting can go on up to a few hundred thousand. This is on
a single-processor machine with OpenSUSE 11/64 bit, g++ 4.3.1 and a recent
trunk version of Boost 1.36 .
The situation is much better on a four-processor system with the same
operating system - the counter typically reaches only a few hundred there,
presumably as thread4 gets access to a time slice more quickly (but why
isn't it running permanently on a four processor machine - there are only 4
active threads ?).
Is there anything that can be done to improve the time slices allocated to
thread4 ? I am already calling the yield() function in thread1, 2 and 3.
Thanks and Best Regards,
Ruediger
/*********************************************************************/
#include <iostream>
#include
On Mon, Aug 11, 2008 at 02:01:46PM +0200, Ruediger Berlich wrote:
The threads increment a variable. Once this variable has reached a threshold (here: 10), execution is supposed to stop. This is done by calling the interrupt() function on each thread, which in turn catch the corresponding exception and act on it.
All this seems to work nicely, except for the fact that the "master thread" (thread4 in this example) does not seem to get any processing time over a long period. Hence thread 1,2 and 3 count to a very high number before the stop condition is met.
Threads are by their very nature non-deterministic. If you want deterministic behavior, i.e., behavior that is independent of scheduling order and relative speeds of threads, you have to code it yourself by using appropriate mechanisms.
Is there anything that can be done to improve the time slices allocated to thread4 ? I am already calling the yield() function in thread1, 2 and 3.
Friendly advice: get a good book on multi-threaded programming before proceeding with your project. Answer to your question: look up the sched_setscheduler function. Assign SCHED_RR policy and equal priorities to all threads. You need to be root to do be able to do that. (Or use some fine-grained capability mechanism.. Solaris has one, dunno about linux... found it; give your user CAP_SYS_NICE privilege.)
boost::unique_lockboost::mutex lock(helloMutex_); go_=true; lock.unlock(); readyToGo_.notify_all();
The condition variable should be signaled _before_ the lock is unlocked, otherwise you have a race condition upon which causes the signal to get lost.
Friendly advice: get a good book on multi-threaded programming before proceeding with your project.
Any suggestions?
The condition variable should be signaled _before_ the lock is unlocked, otherwise you have a race condition upon which causes the signal to get lost.
Is this definitely the case and please can you describe the error in more detail (or provide a link... or tell me to read the book...) ****************************************************************************** "This message and any attachments are solely for the intended recipient and may contain confidential and privileged information. If you are not the intended recipient, any disclosure, copying, use, or distribution of the information included in this message and any attachments is prohibited. If you have received this communication in error, please notify us by reply e-mail and immediately and permanently delete this message and any attachments. Thank you." Interactive Transaction Solutions Ltd (2473364 England) Registered Office: Systems House, Station Approach Emsworth PO10 7PW ********************************************************************** Ce message �lectronique contient des informations confidentielles � l'usage unique des destinataires indiqu�s, personnes physiques ou morales. Si vous n'�tes pas le destinataire voulu, toute divulgation, copie, ou diffusion ou toute autre utilisation de ces informations, est interdite. Si vous avez re�u ce message �lectronique par erreur, nous vous remercions d'en avertir son exp�diteur imm�diatement par email et de d�truire ce message ainsi que les �l�ments attach�s. Interactive transaction Solutions SAS- France (RCS Pontoise : 489 397 877) Si�ge social : Parc Saint Christophe, 10, Avenue de l�Entreprise 95865 Cergy-Pontoise Cedex ______________________________________________________________________ This email has been scanned by the MessageLabs Email Security System. For more information please visit http://www.messagelabs.com/email ______________________________________________________________________
Friendly advice: get a good book on multi-threaded programming before
proceeding with your project.
Any suggestions?
The condition variable should be signaled _before_ the lock is unlocked, otherwise you have a race condition upon which causes the signal to get lost.
Is this definitely the case and please can you describe the error in more detail (or provide a link... or tell me to read the book...)
Got it, the condition can be lost in the code to handle spurious wake ups... still like any suggestions about books though. ****************************************************************************** "This message and any attachments are solely for the intended recipient and may contain confidential and privileged information. If you are not the intended recipient, any disclosure, copying, use, or distribution of the information included in this message and any attachments is prohibited. If you have received this communication in error, please notify us by reply e-mail and immediately and permanently delete this message and any attachments. Thank you." Interactive Transaction Solutions Ltd (2473364 England) Registered Office: Systems House, Station Approach Emsworth PO10 7PW ********************************************************************** Ce message �lectronique contient des informations confidentielles � l'usage unique des destinataires indiqu�s, personnes physiques ou morales. Si vous n'�tes pas le destinataire voulu, toute divulgation, copie, ou diffusion ou toute autre utilisation de ces informations, est interdite. Si vous avez re�u ce message �lectronique par erreur, nous vous remercions d'en avertir son exp�diteur imm�diatement par email et de d�truire ce message ainsi que les �l�ments attach�s. Interactive transaction Solutions SAS- France (RCS Pontoise : 489 397 877) Si�ge social : Parc Saint Christophe, 10, Avenue de l�Entreprise 95865 Cergy-Pontoise Cedex ______________________________________________________________________ This email has been scanned by the MessageLabs Email Security System. For more information please visit http://www.messagelabs.com/email ______________________________________________________________________
Dear Zeljko, first of all, thanks for your answer! Regarding your suggestion that "The condition variable should be signaled before the lock is unlocked", I have followed the Boost.thread documentation, as well as the "Thread-Safe Queue using Condition Variables" tutorial in my implementation. See e.g. here: http://www.boost.org/doc/libs/1_35_0/doc/html/thread/synchronization.html#th... and here: http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-... In the second code snipped of the synopsis part of the condition_variable documentation, the notification is clearly done after the lock has been destroyed. The lock only exists within the scope. And Anthony writes "By rewriting the function so that the notification comes after the mutex is unlocked, the waiting thread will be able to acquire the mutex without blocking" See also the code in the Boost.circular_buffer examples: <Boost-Root>/libs/circular_buffer/test/bounded_buffer_comparison.cpp : In the "push_front" function: [...] lock.unlock(); m_not_empty.notify_one(); [...] So all these examples unlock the mutex before doing the notification. And I still don't understand why this should not be correct: In order to reach the part where the notification happens, the function must have locked the mutex first (and unlocked it right before the notification). This means that the other threads are either waiting to be notified through the condition variable, or haven't reached that part yet. If they are waiting, I can signal them. If they aren't and have been lucky enough to acquire the lock after my explicit call to unlock(), they need to check the condition itself ("bool go_" in my case). As it has already been set to "true" within the protected/synchronized area, everything is fine and they need no notification. And those threads that have been woken up by the notification need to acquire the lock first. "bool go_" itself is protected through the mutex. Will think about it. Best, Ruediger
On Mon, Aug 11, 2008 at 06:27:05PM +0200, Ruediger Berlich wrote:
first of all, thanks for your answer!
Thanks to Patrick and you for questioning my judgement :-) First, the shorter part, about the book: "PThreads Primer: A Guide To Multithreaded Programming". If you search google for "pthreads primer" (without the quotes), you might get lucky :-) I haven't read the whole of it, but the table of contents indicates that it's quite comprehensive; among other stuff it has a section on scheduling from which I quote: "There are times when you will want to deal with scheduling directly, but those times are few and far between for any of the libraries. If you find yourself thinking about this a lot, you're probably doing something wrong." Other suggested reading is an article by Per Brinch Hansen: "Concurrent Programming Concepts"
http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-...
And Anthony writes "By rewriting the function so that the notification comes after the mutex is unlocked, the waiting thread will be able to acquire the mutex without blocking"
I feel uneasy about this, because if you have three threads operating on the same queue, the following might happen: A: wait_and_pop; sleeps and unlocks the mutex B: push; unlocks the mutex, gets preempted by C C: wait_and_pop: mutex is unlocked; manages to pop the item, making the stack empty B: gets scheduled; signals the condvar; A gets a spurious wakeup Turns out that this "optimization" is not an optimization after all -- short mutex contention need not lead to sleep, while spurious wakeup most certainly will induce an unnecessary context switch. Anyway, this is premature optimization. Even worse: A: push; unlock; gets preempted B: wait_and_pop: succeeds, making the queue empty; unlocks the mutex A: gets scheduled again, notifies the condvar which has no waiters C: wait_and_pop: acquires mutex, sleeps on the condvar Tadam! A signal got lost. In this example it might not seem as a big deal, but I could probably think of a slightly more complex example where it might matter.. And signalling on was_empty is actually bad design, IMHO. That boolean should be substituted by an integer "nwaiters" which would be incremented before wait(), and decremented just before wait_and_pop returns. Similarly, the signal should be sent when nwaiters > 0 (Having the above boolean is hardly better than unconditional signaling; even though the stack was empty, there's no guarantee that there are any waiters there.)
In order to reach the part where the notification happens, the function must have locked the mutex first (and unlocked it right before the
Actually, I think that your reasoning is correct in this particular case.
Zeljko Vrba:
Even worse:
A: push; unlock; gets preempted B: wait_and_pop: succeeds, making the queue empty; unlocks the mutex A: gets scheduled again, notifies the condvar which has no waiters C: wait_and_pop: acquires mutex, sleeps on the condvar
No. C doesn't sleep because the wait is guarded by the predicate the_queue.empty().
On Mon, Aug 11, 2008 at 10:46:43PM +0300, Peter Dimov wrote:
Zeljko Vrba:
Even worse:
A: push; unlock; gets preempted B: wait_and_pop: succeeds, making the queue empty; unlocks the mutex A: gets scheduled again, notifies the condvar which has no waiters C: wait_and_pop: acquires mutex, sleeps on the condvar
No. C doesn't sleep because the wait is guarded by the predicate the_queue.empty().
But the predicate is true: B suceeded in popping the item from the stack, and made the stack empty [we assume that the stack was empty to begin with]. the_queue.empty() will return true after C has acquired mutex.
I wrote:
Zeljko Vrba:
Even worse:
A: push; unlock; gets preempted B: wait_and_pop: succeeds, making the queue empty; unlocks the mutex A: gets scheduled again, notifies the condvar which has no waiters C: wait_and_pop: acquires mutex, sleeps on the condvar
No. C doesn't sleep because the wait is guarded by the predicate the_queue.empty().
Oops. The queue is empty. So C does sleep, awaiting a push. Why is this a problem?
On Mon, Aug 11, 2008 at 10:57:23PM +0300, Peter Dimov wrote:
Oops. The queue is empty. So C does sleep, awaiting a push. Why is this a problem?
Not a problem in this particular case. A more elaborate example of what can easily go wrong with signaling outside of critical section: http://groups.google.com/group/comp.programming.threads/browse_thread/thread...
Zeljko Vrba wrote:
On Mon, Aug 11, 2008 at 10:57:23PM +0300, Peter Dimov wrote:
Oops. The queue is empty. So C does sleep, awaiting a push. Why is this a problem?
Not a problem in this particular case.
Signaling outside the mutex lock is a textbook CV use which POSIX condition variables explicitly support (their implementation can be much simpler if they do not have to). You're welcome to not use it if you like, but there's nothing inherently wrong with it and it doesn't have anything to do with the original question.
A more elaborate example of what can easily go wrong with signaling outside of critical section: http://groups.google.com/group/comp.programming.threads/browse_thread/thread...
We had a similar thread a while ago. It's true that moving the signal inside the lock can solve such a CV destruction race. It might be interesting to discuss an example (if you have one) which is not easily rewritten to not have this problem.
participants (4)
-
Patrick Loney
-
Peter Dimov
-
Ruediger Berlich
-
Zeljko Vrba