condition notify_all problem
I've just started using the boost libraries, threads so far -- great libraries. I have a test app that uses thread groups of producers and consumers working with a queue. It seems simple enough, but I'm encountering a deadlock problem using notify_all() vs looping with notify_one() when ending the consumer threads. I'm not sure my implementation is correct, though (link to source below).

I can reliably cause deadlocks on all Windows XP systems I've tried (single, dual and quad machines) within about a minute of execution. However, I only had one deadlock on Windows 2000 (single or quad) in over 6 hours of execution and still going.

Attaching or running the executable in the VS7 debugger shows all the threads and the main thread stopped in ntdll.dll via kernel32.dll. The stack shows the last user code executed by wmain was boost::thread_group::join_all from ctg.join_all(). The threads last ran boost::condition::do_wait before entering the kernel.

Swapping lines 217-218 for 222-227 appears to fix the deadlock by looping with notify_one(). I would guess that something fundamental is wrong with my implementation, and my fix just changes the timing.

Any help is appreciated. The source is posted at the link below. I can provide further information if necessary.

Thanks,
Joshua Boelter

www.boelter.org/data/t_threadgroup.cpp

Using Visual Studio 7.0.9466, stlport 4.5.3 (dll), boost libraries 1.29.0 on various WinXP Pro and Win2k systems.
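For readers who cannot reach the linked source, the kind of shutdown being described can be sketched roughly as below. This is not the code from t_threadgroup.cpp, and it is not Boost.Threads 1.29.0: it assumes the C++11 standard library (std::thread, std::mutex, std::condition_variable), and the names WorkQueue, shutdown and run_demo are made up for illustration. It only shows the usual rule of thumb: set the "done" flag while holding the mutex, have each consumer re-check a predicate after every wakeup, and then wake all waiters with a single notify_all().

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical work queue, for illustration only.
class WorkQueue {
public:
    void push(int item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(item);
        }
        ready_.notify_one();  // one new item, so one consumer is enough
    }

    // Blocks until an item is available or shutdown was requested.
    // Returns false once the queue is empty and shut down.
    bool pop(int& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate is re-checked after every wakeup, so spurious
        // wakeups and notifications sent before the wait are harmless.
        ready_.wait(lock, [this] { return done_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop();
        return true;
    }

    void shutdown() {
        {
            // The flag is changed under the same mutex the waiters use.
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        ready_.notify_all();  // wake every waiting consumer at once
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> items_;
    bool done_ = false;
};

// Starts consumer_count consumers, pushes item_count items, shuts the
// queue down, joins all consumers and returns how many items were consumed.
std::size_t run_demo(int consumer_count, int item_count) {
    WorkQueue queue;
    std::mutex total_mutex;
    std::size_t total = 0;
    std::vector<std::thread> consumers;
    for (int i = 0; i < consumer_count; ++i) {
        consumers.emplace_back([&] {
            int item = 0;
            std::size_t local = 0;
            while (queue.pop(item)) ++local;
            std::lock_guard<std::mutex> lock(total_mutex);
            total += local;
        });
    }
    for (int i = 0; i < item_count; ++i) queue.push(i);
    queue.shutdown();
    for (auto& t : consumers) t.join();
    return total;
}
```

Calling run_demo(4, 1000) should return once every consumer has drained the queue and seen the flag. With a correct condition variable, this form should not need a loop of notify_one() calls; if an equivalent Boost version still hangs, that would point away from the test code.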
Mike Wilson provided me with a concise example that reproduces the problem on a wider variety of systems. I uploaded a slightly modified version of that source to the group files. I'm ready to blame it on the OS :) If anyone else has any insight, it's appreciated.

Thanks,
Joshua Boelter

http://groups.yahoo.com/group/Boost-Users/files/condition/notifyall.cpp
joshuaboelter said:
Mike Wilson provided me with a concise example that reproduces the problem on a wider variety of systems. I uploaded a slightly modified version of that source to the group files. I'm ready to blame it on the OS :) If anyone else has any insight, it's appreciated.
Sorry I've been so long in responding. I'm a tad swamped right now.

I did look at your original code, and though there were a few things I thought might be tweaked, nothing was glaringly wrong. You may well have found an error in the library, but I need a lot more time to evaluate this. Of all the Boost.Threads code, boost::condition has the most complex implementation on Win32. It's a very tricky thing to emulate this concept correctly, unfortunately.

I don't think I'd be too quick to blame the problem on the OS, since the OS doesn't have conditions. It's much more likely that the problem is either in the test code or in the Boost.Threads library.

Give me a few days to evaluate this one thoroughly. Oh, and thanks for the other example as well. I haven't looked at it yet, but having two examples is likely to make it easier to track down the issue.

William E. Kempf
participants (2)
- joshuaboelter
- William E. Kempf