wait/notify problem

Hi, I am trying to implement MANAGER:WORKER model using boost thread_group. I am using wait() and notify() for that. In my application, when thread1 wants to listen for some event from thread2, it executes wait(). when thread2 wants to inform about the event to thread1, it executes notify(). The problem I am facing is : thread1 first executes wait(). then thread2 executes notify(). but notify() completes its execution even before thread1 actually starts waiting for the event and notify() is goes in vain and then thread1 keeps on waiting infinitely. To avoid this, right now I am using sleep(1) before thread2 calls notify() to make sure that thread1 really reaches the actual wait state. But this is not a permanent solution; as it is possible that even within 1sec thread1 might not actually start listening for the event. (Also 1sec makes my applcation very slow). So, I just want to know if there is any such mechanism which will enable thread2 to keep notify()ing until the event is accepted by thread1? i.e. blocked notify(). Regards, Girish

On Fri, Jan 15, 2010 at 4:09 AM, girish hilage
Hi,
when thread1 wants to listen for some event from thread2, it executes wait(). when thread2 wants to inform about the event to thread1, it executes notify().
The problem I am facing is : thread1 first executes wait(). then thread2 executes notify().
but notify() completes its execution even before thread1 actually starts waiting for the event and notify() is goes in vain and then thread1 keeps on waiting infinitely.
Regards, Girish
You need state external to the event. A variable like static int things_to_do = 0; // shared void thread1() { scoped_lock lock(mutex); while (!exit) { while (!things_to_do) cond.wait(mutex); do_things(); } } void thread2() { { scoped_lock lock(mutex); add_thing_to_do(); things_to_do++; } cond.notify(); } Tony

Hi Gottlob,
Thanks for your reply.
I had written below code for this.
void enable_flag (boost::mutex &mut, bool &flag)
{
mut.lock ();
flag = 1;
mut.unlock ();
}
void mywait (boost::condition_variable &cond, boost::mutex &mut, bool &flag)
{
boost::mutex tmp_mutex;
boost::unique_lockboost::mutex lock(tmp_mutex);
enable_flag (mut, flag);
cond.wait (lock);
}
void notify_until_success (boost::condition_variable &cond, boost::mutex &mut, bool &flag)
{
for (;;)
{
usleep (500);
mut.lock ();
if (flag == 1)
{
cond.notify_one();
flag = 0;
mut.unlock ();
break;
}
mut.unlock ();
}
}
But, the mistake I had done was, creation of 'tmp_mutex' inside mywait() and also I had not carefully read the documentation on wait() which says "Atomically call lock.unlock() and blocks the current thread."
I read this while browsing for scoped_lock().
The code I have now written is as follows (which seems to work fine) :
void enable_flag (boost::mutex &mut, bool &flag)
{
mut.lock ();
flag = 1;
mut.unlock ();
}
void mywait (boost::condition_variable &cond, boost::mutex &mut, boost::mutex &flag_mut, bool &flag)
{
boost::unique_lockboost::mutex lock(mut);
enable_flag (flag_mut, flag);
cond.wait (lock);
}
void notify_until_success (boost::condition_variable &cond, boost::mutex &mut, boost::mutex &flag_mut, bool &flag)
{
for (;;)
{
mut.lock ();
flag_mut.lock ();
if (flag == 1)
{
cond.notify_one();
flag = 0;
flag_mut.unlock ();
mut.unlock ();
break;
}
flag_mut.unlock ();
mut.unlock ();
}
}
void worker_function (void)
{
for (;;)
{
/* wait for the new job */
mywait (new_job_condition, new_job_mutex, new_job_notification_reached_flag_mutex, new_job_notification_reached_flag);
..... read new job .....
..... do the work .....
/* notify job done */
notify_until_success (job_done_condition, job_done_mutex, job_done_notification_reached_flag_mutex, job_done_notification_reached_flag);
}
}
int main (void)
{
..... create worker threads .....
while (/*there is work to be done */)
{
/* notify any one of the free threads about new_job */
notify_until_success (new_job_condition, new_job_mutex, new_job_notification_reached_flag_mutex, new_job_notification_reached_flag);
/* wait for job_done notification from any of the busy thread */
mywait (job_done_condition, job_done_mutex, job_done_notification_reached_flag_mutex, job_done_notification_reached_flag);
}
}
I hope this will not create any problems.
(I have also edited the problem statement in the mail below to make it more clear. Shown by __TEXT_EDITED__.)
Regards,
Girish
--- On Fri, 1/15/10, Gottlob Frege

On Mon, Jan 18, 2010 at 8:00 AM, girish hilage
The code I have now written is as follows (which seems to work fine) :
This code is broken. I find it hard to read and understand, and it contains 4 different mutexes, which is asking for trouble. But besides those 'aesthetic' complaints, it really is broken. (IIUC) It appears that you are trying very hard to make sure someone is waiting on the condition before notifying. I was tempted to say this can't be done, but you have done it nonetheless. Although I am not sure it helps. Please research 'spurious wakeups' of condition variables:
void mywait (boost::condition_variable &cond, boost::mutex &mut, boost::mutex &flag_mut, bool &flag) { boost::unique_lockboost::mutex lock(mut);
enable_flag (flag_mut, flag);
cond.wait (lock);
This cond.wait(lock) may wake up even though cond.notify_one() was NOT called. This may seem strange, but it is allowed because otherwise implementing conditions becomes very hard. And by allowing it, it encourages developers to code with conditions properly. To use a condition you MUST do so in a loop which checks the global *state* that the condition is communicating. ie in your case 'new job available'. The standard could have stored state inside the condition variable, but it was harder, and it was also assumed that you can determine the state better on your own, so it is outside the condvar. It also makes it easier for multiple threads to agree - they can all look at the external state (available jobs) and decide if something needs to be done, or if another thread has already done it. Another way to look at things: spurious wakeups mean that, in effect, notify_one might wake up more than one thread (they still battle for the mutex of course), but notify_one is essentially (worse case) the same as notify_all - ie all the other threads could 'spuriously' wake up. ie notify_one could be implemented by just calling notify_all (it would be a poor implementation, but valid). So notify_one is just an optimization (*probably* only one thread will awaken), not a guarantee. So what happens in your code if cond.wait() wakens spuriously? It appears a job is 'done' (or attempted) even though no job exists. If you can tell (from the worker thread) that there is no job to do, then make THAT your external state, and wrap the cond.wait() inside a loop: }
void worker_function (void) {
for (;;) {
while ( ! /* there is work to be done */) // note the '!' at the
front cond.wait(mutex); ..... read new job .....
// probably need to hold lock while reading job (from some shared queue or whatever) // then unlock so you can do the work *in parallel* with other threads mutex.unlock(); // unlock before doing work
..... do the work .....
mutex.lock(); // lock again before checking /* there is work to be done */, unlocked inside cond.wait()
} }
int main (void) { ..... create worker threads .....
while (/*there is work to be done */) {
cond.notify_all(); // there may be more than one job to do, so notify all
/* wait for job_done notification from any of the busy thread */ mywait (job_done_condition, job_done_mutex, job_done_notification_reached_flag_mutex, job_done_notification_reached_flag);
Why do you wait for the job to be done? If you send a job to another thread, then do nothing but wait for it to be done, why not just do it yourself? ie it appears that your code will run like this: Thread 1 Thread 2 wait: check for job ..... notify wake wait for done: do job ..... keep doing that job ..... keep doing that job ..... job done! wake! wait for new job (ie repeat) do the 2 threads ever do any real work at the same time, or is one always waiting for the other?
} }
I hope this will not create any problems. (I have also edited the problem statement in the mail below to make it more clear. Shown by __TEXT_EDITED__.)
Regards, Girish
I honestly had (still have?) trouble understanding the code, but I don't think it works, and I think it could be much simpler. Basically, just use the condvars to tell threads 'hey *CHECK* if there is something to do'. Do NOT try to use condvars to actually communicate state. Just 'hey check the state'. Then let the threads decide what to do based on that state. condvars also nicely handle the lock/unlock/relock that you probably needed to do around the check and wait anyhow. So the primary purpose of the lock is to protect your state, not for the condvar. The condvar is just nice enough to know how to work with that. In your case your state is the state of the job list. Put a mutex around that, and have the condvar use that mutex as well. Check the state of the job list to decide what (if anything) to do. There is only one state (job list) and one mutex and one condvar. Tony Tony

Hi Gottlob,
Thanks for your detailed reply.
I will go through it and see if I can implement what you have suggested.
In the mean time, please find below a complete program.
I did not send it in the previous mail just to save on the space.
Let me know if you have any more comments.
-----------------------------------------------------------------------
# include <iostream>
# include <vector>
# include <string>
# include

Hi Gottlob,
Please find below code I have now written after your suggestions.
Let me know if this looks ok.
I also got help from http://unix.derkeiler.com/Newsgroups/comp.unix.programmer/2005-02/0208.html
# include <iostream>
# include <vector>
# include <string>
# include
participants (2)
-
girish hilage
-
Gottlob Frege