Crash in boost::thread::sleep(xt) (boost 1.42 on Windows 32 bit)

Hi all, we noticed in our tests that one of our threads seems to crash in boost::thread::sleep(xt). Is there a known issue with the sleep() function under MS Visual Studio 2008 with Boost 1.42? It was tested on Windows 7 and Windows XP, both 32-bit. The thread also crashes under the debugger, with all exceptions enabled, without any notice. Only this one thread goes away; the debugger stops at a breakpoint that we set at the place where we detect that this thread is no longer running. The error occurs in a multithreaded environment with at least around 30 active threads. The sleep is outside any critical section (no mutex or semaphore is held). Any hints are welcome. regards Arno

Is there nobody who has some experience in this area? Our thread, which should run in an endless loop, crashes without any notice in the call: boost::this_thread::sleep(boost::posix_time::milliseconds(millisec_)); The Visual Studio 2008 debugger didn't stop on any exception, so what can cause this crash and how can I find out the reason? The problem is reproducible in our complex scenario. regards Arno

On Fri, Mar 23, 2012 at 8:43 AM, Arno wrote:
Is there nobody who has some experience in this area? Our thread, which should run in an endless loop, crashes without any notice in the call: boost::this_thread::sleep(boost::posix_time::milliseconds(millisec_));
The Visual Studio 2008 debugger didn't stop on any exception, so what can cause this crash and how can I find out the reason? The problem is reproducible in our complex scenario.
regards Arno
Is this really a correct way of calling sleep? There are two overloads of the sleep function: one takes a duration (how long to sleep) and the other takes a time point until which the thread should at least sleep. I assume you are going to use the second one, therefore the call should be: boost::this_thread::sleep(boost::get_system_time()+boost::posix_time::milliseconds(millisec_)); Hope this is going to resolve your issue... I wonder whether an implicit conversion from milliseconds to a system_time or duration object is really intended, and whether it is a good idea. With Kind Regards, Ovanes
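
To make the distinction between the two forms concrete, here is a minimal sketch. It does not use Boost: as an assumption for illustration it uses the C++11 standard library calls std::this_thread::sleep_for (relative duration) and std::this_thread::sleep_until (absolute time point) as stand-ins for the two overloads described above, so it needs a newer compiler than Visual Studio 2008. The function names sleep_relative and sleep_absolute are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Relative form: sleep for a duration ("how long").
// Returns the elapsed time in milliseconds, measured on a steady clock.
long long sleep_relative(int millisec) {
    assert(millisec >= 0);
    const auto start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(millisec));
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}

// Absolute form: sleep until a time point ("until when").
// The deadline is computed once, as "now + duration".
long long sleep_absolute(int millisec) {
    assert(millisec >= 0);
    const auto start = std::chrono::steady_clock::now();
    const auto deadline = start + std::chrono::milliseconds(millisec);
    std::this_thread::sleep_until(deadline);
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}
```

Both functions block the calling thread for roughly the requested time; the absolute form is mainly useful in periodic loops, where the next deadline can be computed from the previous one so that the period does not drift.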

boost::this_thread::sleep(boost::get_system_time()+boost::posix_time::milliseconds(millisec_));
Hi Ovanes, I took my call from examples that I found. I use it in many places in our code and until now it seems to work correctly. Our problem is not related to the use of sleep; I am looking for reasons why a thread can stop without any known cause. Do you have some hints in this direction? regards Arno

On Mon, Mar 26, 2012 at 9:17 AM, Arno wrote:
boost::this_thread::sleep(boost::get_system_time()+boost::posix_time::milliseconds(millisec_));
Hi Ovanes, I took my call from examples that I found. I use it in many places in our code and until now it seems to work correctly. Our problem is not related to the use of sleep; I am looking for reasons why a thread can stop without any known cause.
Do you have some hints in this direction?
regards Arno
Arno, if that is not the reason, I would run it in the debugger and enable the debugger to stop whenever an exception is thrown or a signal is raised. Try to see if that happens when this one thread runs... Logs might help as well. Finally, does the program wait for the thread to exit, or does it detach the thread from the thread object? Is it possible that some dangling references or pointers are accessed in the thread (but then the debugger should receive an access violation signal)? Can you reproduce a minimal example to be posted here? I know it might be hard to do when dealing with MT contexts. Best Regards, Ovanes

if that is not the reason, I would run it in the debugger and enable the debugger to stop whenever an exception is thrown or a signal is raised.
That's exactly what I have done, but the problem is that the thread disappears without any notice, even when all exceptions are enabled.
Can you reproduce a minimal example to be posted here? I know it might be hard to do when dealing with MT contexts.
As I wrote, it happens only in the whole context and is not reproducible in a smaller setup. regards Arno

Sorry that I did not re-read your post; I forget such things too fast ;)
On Mon, Mar 26, 2012 at 3:41 PM, Arno wrote:
if that is not the reason, I would run it in the debugger and enable the debugger to stop whenever an exception is thrown or a signal is raised.
That's exactly what I have done, but the problem is that the thread disappears without any notice, even when all exceptions are enabled.
OK, my suggestion would be: create a dummy class with a destructor and instantiate it in thread-local storage. Hopefully, the destructor is going to be called when the thread is terminated. Then either do some logging from the destructor or put a breakpoint in the destructor and look at the context in which the storage is destroyed.
Can you reproduce a minimal example to be posted here? I know it might be hard to do when dealing with MT contexts.
As I wrote, it happens only in the whole context and is not reproducible in a smaller setup.
regards Arno
Regards, Ovanes
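
A minimal sketch of the thread-local sentinel idea suggested above. As an assumption for illustration it uses the C++11 thread_local keyword and std::thread instead of Boost's thread-specific storage, so it is self-contained but needs a newer compiler than Visual Studio 2008. The names ThreadExitSentinel, worker, g_exits and run_worker_and_count_exits are hypothetical. Note that the destructor runs on a normal thread exit; whether it runs when a thread is killed from the outside is not guaranteed.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

// Counts how many sentinels have been destroyed (for logging or tests).
std::atomic<int> g_exits{0};

// Dummy class whose only job is to report when its owning thread ends.
struct ThreadExitSentinel {
    const char* name;
    explicit ThreadExitSentinel(const char* n) : name(n) {}
    ~ThreadExitSentinel() {
        // Put a breakpoint here to inspect the context at thread exit.
        std::fprintf(stderr, "thread '%s' is exiting\n", name);
        ++g_exits;
    }
};

// Thread function: the sentinel lives in thread-local storage and is
// destroyed when this thread finishes.
void worker() {
    thread_local ThreadExitSentinel sentinel("worker");
    (void)sentinel;  // only the destructor matters
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}

// Runs one worker to completion and returns the number of sentinel
// destructions observed so far.
int run_worker_and_count_exits() {
    std::thread t(worker);
    t.join();
    return g_exits.load();
}
```

If the log line or the breakpoint is never hit although the thread is gone, that in itself is a hint that the thread did not leave through its normal exit path.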

On Mon, Mar 26, 2012 at 8:37 PM, Ovanes Markarian
Additionally, I would try to use another compiler, e.g. VC 10. It is possible that there is a compiler bug, but more likely you will run into the threading error in a different way, which will give you more hints about where the error comes from. Regards, Ovanes

Hi Ovanes, many thanks for that hint about thread-local storage; until now we haven't used it. I believe this is the thread_specific_ptr stuff, isn't it? Changing the compiler isn't so easy, because of some third-party components that we have not licensed for VC10 yet. Did you read the comments from Neil? He hasn't answered yet; do you know something about the fixed 'ODR violation' bug? best regards Arno

Arno,
my comments are below.
On Tue, Mar 27, 2012 at 8:59 AM, Arno wrote:
Hi Ovanes,
many thanks for that hint about thread-local storage; until now we haven't used it. I believe this is the thread_specific_ptr stuff, isn't it?
Yes it is.
Changing the compiler isn't so easy, because of some third-party components that we have not licensed for VC10 yet.
MS offers a free license: Visual C++ Express Edition.
Did you read the comments from Neil? He hasn't answered yet; do you know something about the fixed 'ODR violation' bug?
I don't know anything about this, but open your executable/DLL with Depends (http://www.dependencywalker.com/) and see if some symbols appear twice. Sure, there will be a lot of symbols and you will need to narrow your search ;) Please also ensure that all your libs were properly recompiled from a clean state.
best regards Arno
Best Regards, Ovanes

On Tue, Mar 27, 2012 at 10:56:18AM +0200, Ovanes Markarian wrote:
On Tue, Mar 27, 2012 at 8:59 AM, Arno wrote:
Changing the compiler isn't so easy, because of some third-party components that we have not licensed for VC10 yet.
MS offers a free license: Visual C++ Express Edition.
As I understand it, it's not an issue of compiler licensing, but closed-source third party libraries.
Please also ensure that all your libs were properly recompiled from a clean state.
Not all libraries come with source. -- Lars Viklund | zao@acc.umu.se

On Tue, Mar 27, 2012 at 11:02 PM, Lars Viklund wrote:
On Tue, Mar 27, 2012 at 10:56:18AM +0200, Ovanes Markarian wrote:
On Tue, Mar 27, 2012 at 8:59 AM, Arno wrote:
Changing the compiler isn't so easy, because of some third-party components that we have not licensed for VC10 yet.
MS offers a free license: Visual C++ Express Edition.
As I understand it, it's not an issue of compiler licensing, but closed-source third party libraries.
Please also ensure that all your libs were properly recompiled from a clean state.
Not all libraries come with source.
Sorry, you are right... I misread it.

Hi Ovanes, we have fixed this problem now. In the end it was a problem of differently implemented concepts for handling faults in threads (with and without exceptions). Many thanks for the discussion and your help. If you are interested, you can read more explanations in my answer to Neil in this thread. best regards Arno

On Fri, Mar 23, 2012 at 7:43 AM, Arno wrote:
Is there nobody who has some experience in this area?
I have experience of odd crashes related to Boost.Thread on this version and earlier of Boost.
Our thread, which should run in an endless loop, crashes without any notice in the call: boost::this_thread::sleep(boost::posix_time::milliseconds(millisec_));
The Visual Studio 2008 debugger didn't stop on any exception, so what can cause this crash and how can I find out the reason? The problem is reproducible in our complex scenario.
There were some defects that I reported a little while ago relating to ODR violations. The issue was very difficult, at least for me, to track down. It turned out that one of the third-party shared libraries I was using had used Boost too. When a thread exited in the third-party library it caused my application to die because the wrong function instance was executed. Anthony was very quick to put the corrections for this defect into Boost, but 1.42 is definitely too old to have this fix in.
regards Arno
Of course it could be something completely different. Neil Groves

There were some defects that I reported a little while ago relating to ODR violations. The issue was very difficult, at least for me, to track down. It turned out that one of the third-party shared libraries I was using had used Boost too. When a thread exited in the third-party library it caused my application to die because the wrong function instance was executed.
Many thanks for that hint. We are also using some third-party libs, and they all work with threads but don't use Boost. Can you give me a clue which ODR violations can be relevant for this behaviour? It was very hard for us to reproduce our error in debug mode as well, and the debugger doesn't stop at a breakpoint when it happens, even when all exceptions are switched on. At the moment we don't know the details of what kind of threads the third-party libs use.
Anthony was very quick to put the corrections for this defect into Boost, but 1.42 is definitely too old to have this fix in.
In which version was this fix made, and is there a bug number available? Regards Arno

On Mon, Mar 26, 2012 at 7:53 AM, Arno wrote:
boost too. When a thread exited in the third party library it caused my application to die because the wrong function instance was executed.
Many thanks for that hint. We are also using some third-party libs, and they all work with threads but don't use Boost. Can you give me a clue which ODR violations can be relevant for this behaviour? It was very hard for us to reproduce our error in debug mode as well, and the debugger doesn't stop at a breakpoint when it happens, even when all exceptions are switched on. At the moment we don't know the details of what kind of threads the third-party libs use.
In that case you might be in the same situation I was in. Our third-party libs used Boost.Thread statically linked. It took me quite a while before I realized this was a possible cause of the problem. I could see a completely messed-up context when catching the exceptions with the debugger.
Anthony was very quick to put the corrections for this defect into Boost, but 1.42 is definitely too old to have this fix in.
In which version was this fix made, and is there a bug number available?
Sorry for missing this information out previously, I am having issues with Trac and could not find the thread of discussion with Anthony. I have finally found the threads of discussion and the changes were merged into Boost 1.47. Can you try Boost 1.47? If this does not work, then we can explore alternative approaches.
Regards Arno
I'm sorry for taking a long time to reply. I have been extremely busy with work. Regards, Neil Groves

In that case you might be in the same situation I was in. Our third-party libs used Boost.Thread statically linked. It took me quite a while before I realized this was a possible cause of the problem. I could see a completely messed-up context when catching the exceptions with the debugger.
Hi Neil, many thanks for your hints, they put me on the right track. Meanwhile I found out that the third-party libs didn't use Boost, but heavily used Windows threads with thread pooling mechanisms. In our huge code base I also found some code that implements a kind of watchdog on a given thread id. It checked for a specified timeout, and if the timeout expired the thread was terminated directly with the WINAPI function 'TerminateThread(...)'. Boost.Thread cannot detect this, so we had the situation that the thread sometimes crashed in the sleep function or left mutexes in an inconsistent state. By the way, do you have an example of the right way to terminate a Boost thread? I haven't understood the 'interruptible' things in the documentation.
Sorry for missing this information out previously, I am having issues with Trac and could not find the thread of discussion with Anthony. I have finally found the threads of discussion and the changes were merged into Boost 1.47.
So I believe our problem has nothing to do with the fix. But if you can find this communication with Anthony (perhaps he reads this ;-), I am very interested in that discussion, because we have some more complicated problems in the multithreaded area that might be explained similarly to the ODR violation, if we knew the circumstances of this fix.
Can you try Boost 1.47? If this does not work, then we can explore alternative approaches.
I have done a test with 1.49, but I cannot switch to it completely at the moment because of other issues with the filesystem library v3; that is another topic. best regards Arno
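
On the question of how to stop a thread without TerminateThread: Boost.Thread documents its own mechanism for this, boost::thread::interrupt(), which makes the target thread throw boost::thread_interrupted at its next interruption point (boost::this_thread::sleep is one). The sketch below shows the same cooperative idea, but as an assumption for illustration it uses only the C++11 standard library (a stop flag plus a condition variable that replaces the plain sleep), so it is self-contained and is not the Boost API. The names StoppableWorker and demo_stop are hypothetical.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// A worker loop that can be asked to stop. The thread is never killed
// from the outside; it leaves its loop by itself and is then joined,
// so mutexes and other resources stay consistent.
class StoppableWorker {
public:
    StoppableWorker()
        : stop_(false), iterations_(0), thread_(&StoppableWorker::run, this) {}

    ~StoppableWorker() { stop(); }

    // Request the stop, wake the worker if it is "sleeping", wait for it.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        if (thread_.joinable()) thread_.join();
    }

    int iterations() {
        std::lock_guard<std::mutex> lock(mutex_);
        return iterations_;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stop_) {
            ++iterations_;  // the periodic work would go here
            // Interruptible sleep: returns after 50 ms, or earlier
            // as soon as stop_ has been set.
            cv_.wait_for(lock, std::chrono::milliseconds(50),
                         [this] { return stop_; });
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    bool stop_;
    int iterations_;
    std::thread thread_;  // declared last: starts after the other members exist
};

// Usage sketch: start the worker, let it loop at least once, stop it cleanly.
int demo_stop() {
    StoppableWorker w;
    while (w.iterations() < 1) std::this_thread::yield();
    w.stop();
    return w.iterations();
}
```

A watchdog built on this pattern would call stop() (or, with Boost, interrupt() followed by join()) when its timeout expires, instead of terminating the thread through the operating system.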
participants (4)
- Arno
- Lars Viklund
- Neil Groves
- Ovanes Markarian