thread_group::interrupt_all is not reliable

30 Nov 2009

I've discovered that under circumstances apparently related to timing
and load, calling interrupt_all on a thread_group whose threads are all
waiting on a boost::condition_variable leaves one thread waiting about
1/3 of the time. This is with Boost 1.40.0 on Mac OS X 10.6.2, using
32-bit Boost libraries. Boost uses POSIX threads (pthreads) here.

I boiled my app down to some test code that runs as a command-line
app. It's a bit longer than I'd like, but this configuration seems to
be necessary to invoke the problem. The test uses a queue to pass
"tasks" from the main thread to worker threads, and another queue to
pass "results" back to the main thread. The problem is most apparent
when all the tasks are finished and the queue empties, so that all the
worker threads are waiting on the input queue when the main thread
sends interrupt_all.

I've looked at the waiting thread in a debugger when this happens, and
found that it has been interrupted, but is still waiting on the
condition. It looks like it just got missed by the interrupt_all. This
is more likely to happen when there are a lot of worker threads (16,
or one per core in my testing).

The test code is parked at <http://sb.org/ThreadTest.zip>, 20KB. It's
an Xcode 3.2 project, but the five source files could be readily
compiled and run in any Unix environment.

I don't see any errors in the code that could cause these failures.
There is a work-around, which is to interrupt the waiting thread
again. This required a modified version of thread_group so I could do
a timed_join_all on it.

I welcome any suggestions about what could be wrong here, or ways to
simplify the test to make it more suitable for a bug report.

- Stoney

-- 
Stonewall Ballard 
stoney@sb.org           http://stoney.sb.org/
