In iOS, thread_info object is being destructed before the thread finishes executing
Our project uses a few Boost 1.48 libraries on several platforms, including Windows, Mac, Android, and iOS. We can consistently get the iOS version of the project to crash (nontrivially but reliably), and from our investigation we see that ~thread_data_base is being called on the thread's thread_info while its thread is still running.

This seems to happen as a result of the smart pointer reaching a zero count, even though it is obviously still in scope in the thread_proxy function which creates it and runs the requested function in the thread.

This seems to happen in various cases: the call stack is not identical between crashes, though there are a few variations which are common. Just to be clear, this often requires running code which creates hundreds of threads, though there are never more than about 30 running simultaneously. I have "been lucky" and hit it very early in the run as well, but that's rare.

I created a version of the destructor which actually catches the code red-handed, in libs/thread/src/pthread/thread.cpp:

    thread_data_base::~thread_data_base()
    {
        boost::detail::thread_data_base* const thread_info=detail::get_current_thread_data();
        void *void_thread_info = (void *) thread_info;
        void *void_this = (void *) this;
        // is somebody destructing the thread_data other than its own thread?
        // (remember that its own thread should no longer point to it anyway,
        // because of the call to detail::set_current_thread_data(0) in thread_proxy)
        if (void_thread_info) { // == void_this) {
            __builtin_trap();
        }
    }

I should note that (as seen from the commented-out code) I had previously checked that void_thread_info == void_this, because I was only checking for the case where the thread's current thread_info was killing itself. I have also seen cases where the value returned by get_current_thread_data is non-zero and different from "this", which is really weird.
Also, when I first wrote that version of the code, I wrote:

    if (((void*)thread_info) == ((void*)this))

and at run time I got a very weird exception that said something about a virtual function table or something like that; I don't remember exactly. I decided that it was trying to call "==" for this object type and was unhappy with that, so I rewrote it as above, putting the conversions to void * on separate lines of code. That in itself is quite suspicious to me. I am not one to rush to blame compilers, but...

I should also note that when we did catch this happening in the trap, we saw the destructor for ~shared_count appear twice consecutively on the stack in Xcode source. Very doubleweird. We tried to look at the disassembly, but couldn't make much out of it.

Again, it looks like this is always a result of the shared_count, which seems to be owned by the shared_ptr which owns the thread_info, reaching zero too early.

Help!

Thanks,
Andy Weinstein

PS I cannot say for sure yet whether this problem occurs on the other platforms, though it looks like Windows is more solid than Mac. Android is pretty solid also, though maybe in between Windows and Mac.
On Mon, Feb 4, 2013 at 7:38 AM, Andy Weinstein wrote:
> This seems to happen as a result of the smart pointer reaching a zero count, even though it is obviously still in scope in the thread_proxy function which creates it and runs the requested function in the thread.
I feel obliged to mention a likely scenario in which a shared_ptr's count can reach zero prematurely, namely when multiple shared_ptrs are created from the same raw (native) pointer, e.g.

    Object* myobj = new Object();
    shared_ptr<Object> sp1(myobj);
    shared_ptr<Object> sp2(myobj);

sp1's count will be 1, as will sp2's. When either shared_ptr's count goes to zero, myobj will be destroyed, even if the other shared_ptr is still "live."
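To make the scenario above concrete, here is a minimal sketch of it. It is an illustration only, with several assumptions: it uses std::shared_ptr rather than boost::shared_ptr so that it compiles without Boost, the names Object, CountingDeleter and the two functions are hypothetical, and the deleter only counts its invocations instead of freeing anything, so the duplicate destruction can be observed without actually triggering undefined behaviour.

```cpp
#include <cassert>
#include <memory>

struct Object {};

// Hypothetical deleter: records each invocation instead of freeing,
// so a duplicate "destroy" is observable and harmless in this sketch.
struct CountingDeleter {
    int* calls;
    void operator()(Object*) const { ++*calls; }
};

// Two shared_ptrs built independently from the same raw pointer.
// Each gets its own control block, so each believes it is the sole owner.
int deleter_calls_with_independent_owners() {
    int calls = 0;
    Object obj;  // stack object; the counting deleter never frees it
    {
        std::shared_ptr<Object> sp1(&obj, CountingDeleter{&calls});
        std::shared_ptr<Object> sp2(&obj, CountingDeleter{&calls});
        assert(sp1.use_count() == 1);
        assert(sp2.use_count() == 1);
    }  // both counts reach zero here, so the deleter runs twice
    return calls;
}

// The second shared_ptr is copied from the first, so both share
// one control block and the object is released exactly once.
int deleter_calls_with_shared_ownership() {
    int calls = 0;
    Object obj;
    {
        std::shared_ptr<Object> sp1(&obj, CountingDeleter{&calls});
        std::shared_ptr<Object> sp2(sp1);
        assert(sp1.use_count() == 2);
        assert(sp2.use_count() == 2);
    }  // the shared count reaches zero once, so the deleter runs once
    return calls;
}
```

With a real heap object and the default deleter, the first pattern would destroy the object while another owner still refers to it, which is the kind of premature destruction described above. Whether that is what is happening inside the thread library here is not something this sketch can establish.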
> I should also note that when we did catch this happening in the trap, we saw the destructor for ~shared_count appear twice consecutively on the stack in Xcode source. Very doubleweird.
Multiple shared_count instances make me suspicious of a scenario like the above.
participants (2)
- Andy Weinstein
- Nat Linden