On 09/03/12 04:32, John Rocha wrote:
Hello,
I'm working on a multi-threaded application that uses Boost threads. The threads are created in a cascade:
Main thread creates thread1
thread 1 creates threads 2, 3 and 4
thread 2 creates threads 5-14
To shut down, each parent sends a thread interrupt to a child and then does a timed join to wait for that child thread to finish its own shutdown.
So main sends a thread interrupt to thread1 and then waits up to X seconds.
thread 1 sends a thread interrupt to thread 2 and does a timed wait, then does the same for thread 3, and then for thread 4.
thread 2 does the same for each of its children: it sends a thread interrupt and then waits.
How and when does thread1 start its shutdown processing?
The problem is that on rare occasions (1 out of 474 attempts in my last test cycle), thread 1 returns early from timed_join with true, indicating that the child thread has exited, but it hasn't.
I have timestamped logging that shows when each thread's shutdown activities start and stop. It shows that thread 1 isn't waiting the 10 seconds for the child to exit, and that the child is still running.
Any suggestions? The only explanation I can think of right now is that thread1 may be receiving an interrupt of some kind that causes it to leave its timed_join early. I haven't looked into the Boost code for this yet; I'm hoping this is something others have already encountered and solved.
Or could anyone offer other debugging tips?
Could you post a simple example? It would help in analyzing the issue.
Best, Vicente