thread_group::interrupt_all is not reliable

I've discovered that under circumstances apparently related to timing and load, sending interrupt_all to a thread_group when all the threads are waiting on a boost::condition_variable leaves one thread waiting about 1/3 of the time. This is with boost 1_40_0 running on Mac OS X 10.6.2, with 32-bit boost libraries. Boost uses the posix thread system here. I boiled my app down to some test code that runs as a command-line app. It's a bit longer than I'd like, but this configuration seems to be necessary to invoke the problem. The test uses a queue to pass "tasks" from the main thread to worker threads, and another queue to pass "results" back to the main thread. The problem is most apparent when all the tasks are finished and the queue empties, so that all the worker threads are waiting on the input queue when the main thread sends interrupt_all. I've looked at the waiting thread in a debugger when this happens, and found that it has been interrupted, but is still waiting on the condition. It looks like it just got missed by the interrupt_all. This is more likely to happen when there are a lot of worker threads (16, or one per core in my testing). The test code is parked at http://sb.org/ThreadTest.zip, 20KB. It's an XCode 3.2 project, but the five source files could be readily compiled and run in any Unix environment. I don't see any errors in the code that could cause these failures. There is a work-around, which is to interrupt the waiting thread again. This required a modified version of thread_group so I could do a timed_join_all on it. I welcome any suggestions about what could be wrong here, or ways to simplify the test to make it more suitable for a bug report. - Stoney -- Stonewall Ballard stoney@sb.org http://stoney.sb.org/

Stonewall Ballard wrote:
The test code is parked at http://sb.org/ThreadTest.zip, 20KB. It's an XCode 3.2 project, but the five source files could be readily compiled and run in any Unix environment.
I admit it took several times of reading this thread until I finally thought I got it (which might have something to do with several rather long days of PHP & JavaScript hacking). Apart from the original problem, I wonder if you could shed some light on a comment in your WorkQueue code: CONTROL: void increment() { boost::mutex::scoped_lock lock(the_mutex); ++tasks_available; lock.unlock(); // notify_one may fail to wake a worker if there // are multiple items in the queue, so it's better // to waste a bit of CPU and notify_all the_condition_variable.notify_all(); } WORKER: void wait_and_decrement() { boost::mutex::scoped_lock lock(the_mutex); while ( tasks_available == 0 ) { the_condition_variable.wait(lock); } --tasks_available; } What would be a scenario in which notify_one would fail? I would assume that a problem with notify_one would occur only if you would do something like tasks_available += 42; instead of ++tasks_available; In such a case, notify_one would wake less threads than required. But how can that happen with notify_one being called for each and every task? Regards, Roland PS: Thank you for starting this thread (and Peter and Anthony for participating, of course). Very interesting, because I have written a different kind of threadpool a few days ago (no queues, but you can call a method addTask() which will hand the task to the next thread available (blocking for as long as all threads are working on previously added tasks)). The destructor calls interupt_all() with potentially many threads in wait(). I probably would have run into the same problems as you did when using it in production (8-core).

On Dec 1, 2009, at 4:07 PM, Roland Bock wrote:
Stonewall Ballard wrote:
The test code is parked at http://sb.org/ThreadTest.zip, 20KB. It's an XCode 3.2 project, but the five source files could be readily compiled and run in any Unix environment.
I admit it took several times of reading this thread until I finally thought I got it (which might have something to do with several rather long days of PHP & JavaScript hacking).
Apart from the original problem, I wonder if you could shed some light on a comment in your WorkQueue code:
CONTROL: void increment() { boost::mutex::scoped_lock lock(the_mutex); ++tasks_available; lock.unlock(); // notify_one may fail to wake a worker if there // are multiple items in the queue, so it's better // to waste a bit of CPU and notify_all the_condition_variable.notify_all(); }
WORKER: void wait_and_decrement() { boost::mutex::scoped_lock lock(the_mutex); while ( tasks_available == 0 ) { the_condition_variable.wait(lock); } --tasks_available; }
What would be a scenario in which notify_one would fail? I would assume that a problem with notify_one would occur only if you would do something like
tasks_available += 42;
instead of
++tasks_available;
In such a case, notify_one would wake less threads than required. But how can that happen with notify_one being called for each and every task?
Take a look at this: http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-... which is where I got original design for the WorkQueue I'm using in my app. There is a comment and reply on that page:
Okay, I think I have found an issue when multiple consumers use the queue. Assume all consumers are waiting for new data to be pushed onto the queue. When the producer then pushes multiple items in short succession, i.e. so quick that the first consumer to wake up cannot empty the queue again, then the_condition_variable.notify_one() is only called once (because it is blocked by the 'was_empty if' later). It seems to work for me if I replace notify_one() by notify_all().
Btw, I hope that all consumer threads waking up at the same is not a problem, but as far as I understand the notification mechanism, only one of them will acquire control over the mutex...
Let me know what you think... by David at 23:14:37 on Sunday, 07 September 2008 Hi David,
You're right. Thanks for spotting that. I guess my testing was not exhaustive enough :-(
The only impact of waking all the consumers is that they consume CPU time: if there's nothing in the queue they just treat it as a spurious wake and go back to sleep.
by Anthony Williams at 08:00:20 on Monday, 08 September 2008
I added the comment to the code in case someone came along later and decided that it was wasteful to notify_all().
PS: Thank you for starting this thread (and Peter and Anthony for participating, of course). Very interesting, because I have written a different kind of threadpool a few days ago (no queues, but you can call a method addTask() which will hand the task to the next thread available (blocking for as long as all threads are working on previously added tasks)). The destructor calls interupt_all() with potentially many threads in wait(). I probably would have run into the same problems as you did when using it in production (8-core).
This is the first time I've used boost::thread, so I guess that code is fairly new and not sufficiently beaten-upon. This discussion was very enlightening to me too. I greatly appreciate having experienced people available to look at the problem. My code is running in the idle loop of a dialog window, so I had to avoid blocking the producer for more than a moment. Since there could be many more available tasks than threads, a queue was the obvious choice. - Stoney -- Stonewall Ballard stoney@sb.org http://stoney.sb.org/

Stonewall Ballard wrote: [...]
In such a case, notify_one would wake less threads than required. But how can that happen with notify_one being called for each and every task?
Take a look at this: http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-... [...] I added the comment to the code in case someone came along later and decided that it was wasteful to notify_all().
Hmm. Have you read the section "UPDATE" on the page? To me that says that the requirement for the notify_all has already been taken care of in the presented code (by calling it each time a task is pushed).
PS: Thank you for starting this thread (and Peter and Anthony for participating, of course). Very interesting, because I have written a different kind of threadpool a few days ago (no queues, but you can call a method addTask() which will hand the task to the next thread available (blocking for as long as all threads are working on previously added tasks)). The destructor calls interupt_all() with potentially many threads in wait(). I probably would have run into the same problems as you did when using it in production (8-core).
This is the first time I've used boost::thread, so I guess that code is fairly new and not sufficiently beaten-upon. This discussion was very enlightening to me too. I greatly appreciate having experienced people available to look at the problem.
My code is running in the idle loop of a dialog window, so I had to avoid blocking the producer for more than a moment. Since there could be many more available tasks than threads, a queue was the obvious choice.
Yes, a blocking dialog would be bad. I am using the threadpool for a merge process which can delegate the processing of chunks to its workers. At the time I wrote it, I thought that blocking would not be problem (and in the intended scenario it isn't), because I can rarely use all available threads due to the sheer size of the chunks. The merger blocks that anyway. But after today, I think I will change my pool. Limiting the number of tasks is part of the merge logic, the pool should not do something like that. Regards, Roland
participants (2)
-
Roland Bock
-
Stonewall Ballard