[smart_ptr] Getting segfaults with weak_ptr<T>::lock()

I've run into a problem where a weak_ptr<> is segfaulting when I try to do a lock() call on it. It doesn't happen all the time, and I'm at a loss to see what's causing the problem. I have an implementation of the Observer pattern in my code using shared_ptr/weak_ptr: class Observer { public: virtual void update(shared_ptr<Observed> d) = 0; ... } class Observed : public enable_shared_from_this<Observed> { vector<weak_ptr<Observer> > _observers; public: void register(shared_ptr<Observer> o) { _observers.push_back(weak_ptr<Observer>(o)); } void notifyObservers() { vector<weak_ptr<Observer> >::iterator i; for (i = _observers.begin(); i != _observers.end(); observer++) { shared_ptr<Observer> o = i->lock(); if (o) o->update(shared_from_this()); } } ... } I'm trying to write a wxWidget that participates in the above Observer scheme, but quickly ran into a problem: wxWidgets passes around raw pointers and handles its own garbage-collection. If I created a shared_ptr<MyWidget>, wx would feel perfectly free to delete the object out from under the shared_ptr<>, which would be bad. My solution was: template <typename T> class wxObserver : public Observer { private: T* p; wxObserver(); public: wxObserver(T* p) : p(p); void update(shared_ptr<Observed> o) { if (p) p->update(o); } } class MyWidget : public wxWidget { private: shared_ptr<wxObserver<MyWidget> > observer; public: MyWidget(...) : wxWidget(...), observer(this) { ... observed->register(observer); ... } ... } The idea is that when a MyWidget is created, it creates a shared pointer to a wxObserver and can register that wxObserver with observed objects without a problem. When wxWidgets chooses to delete the MyWidget, the shared pointer the MyWidget goes away, the wxObserver is deleted, and the weak pointers held by the observed objects expire. The implementation of Observed::notifyObservers() checks to make sure that the observers still exist before update()ing them, and only the live observers are updated. The problem is that sometimes, after wxWidgets destroys a MyWidget, calls to Observed::notifyObservers() crashes with a segfault within the call to i->lock(). I can't see what's causing this. Worse, while it crashes consistantly and repeatedly at the same line, it doesn't appear to always crash after the same amount of work. Putting various {cerr << "got here";}-type lines within the code above doesn't always yield exactly the same results from run to run, even without recompiling. My thoughts are now that it might be a threading issue. Is there any good solutions to this problem? thanks, Buddha

Buddha Buck wrote: ...
My solution was:
template <typename T> class wxObserver : public Observer { private: T* p; wxObserver(); public: wxObserver(T* p) : p(p); void update(shared_ptr<Observed> o) { if (p) p->update(o); } }
class MyWidget : public wxWidget { private: shared_ptr<wxObserver<MyWidget> > observer; public: MyWidget(...) : wxWidget(...), observer(this) { ... observed->register(observer); ... } ... }
The idea is that when a MyWidget is created, it creates a shared pointer to a wxObserver and can register that wxObserver with observed objects without a problem. When wxWidgets chooses to delete the MyWidget, the shared pointer the MyWidget goes away, the wxObserver is deleted, and the weak pointers held by the observed objects expire. The implementation of Observed::notifyObservers() checks to make sure that the observers still exist before update()ing them, and only the live observers are updated.
The problem is that sometimes, after wxWidgets destroys a MyWidget, calls to Observed::notifyObservers() crashes with a segfault within the call to i->lock().
I think I see one problem in your code, but it cannot explain a segfault within i->lock(). When MyWidget is destroyed, if at the exact same time an Observed locks its weak_ptr to the wxObserver, the T* p pointer will dangle. Can you post a stack trace? The problem you describe sounds like a race between notifyObservers and ~Observed or register, is that possible?

On 7/10/07, Peter Dimov <pdimov@mmltd.net> wrote:
Buddha Buck wrote: ...
My solution was:
template <typename T> class wxObserver : public Observer { private: T* p; wxObserver(); public: wxObserver(T* p) : p(p); void update(shared_ptr<Observed> o) { if (p) p->update(o); } }
class MyWidget : public wxWidget { private: shared_ptr<wxObserver<MyWidget> > observer; public: MyWidget(...) : wxWidget(...), observer(this) { ... observed->register(observer); ... } ... }
The idea is that when a MyWidget is created, it creates a shared pointer to a wxObserver and can register that wxObserver with observed objects without a problem. When wxWidgets chooses to delete the MyWidget, the shared pointer the MyWidget goes away, the wxObserver is deleted, and the weak pointers held by the observed objects expire. The implementation of Observed::notifyObservers() checks to make sure that the observers still exist before update()ing them, and only the live observers are updated.
The problem is that sometimes, after wxWidgets destroys a MyWidget, calls to Observed::notifyObservers() crashes with a segfault within the call to i->lock().
I think I see one problem in your code, but it cannot explain a segfault within i->lock(). When MyWidget is destroyed, if at the exact same time an Observed locks its weak_ptr to the wxObserver, the T* p pointer will dangle.
I can see that, and I don't see an easy solution. I tried modifying ~MyWidget so that it explicitly set the T* p pointer to NULL, but that didn't change the behavior.
Can you post a stack trace? The problem you describe sounds like a race between notifyObservers and ~Observed or register, is that possible?
A stack trace is below this paragraph. It seems impossible for there to be a race between anything and ~Observed, since the Observed in this case (an instance of the ObjectDB class) is a long-lasting object that shouldn't be deleted until the end of the application. gdb shows only one thread in the application, which makes race conditions seem even more unlikely to me. (One note: my Observed class is really called "Subject", which is why the trace below includes Subject::notifyObservers() and not Observed::notifyObservers(). #0 0x000000000041c711 in boost::detail::atomic_conditional_increment (pw=0x2b3ba59cdae8) at vendor/include/boost/detail/sp_counted_base_gcc_x86.hpp:92 #1 0x000000000041c743 in boost::detail::sp_counted_base::add_ref_lock (this=0x2b3ba59cdae0) at vendor/include/boost/detail/sp_counted_base_gcc_x86.hpp:138 #2 0x000000000041d7a1 in shared_count (this=0x7fff083b7678, r=@0x70f4a78) at vendor/include/boost/detail/shared_count.hpp:362 #3 0x00002b3ba3bcb6ff in shared_ptr<Observer> (this=0x7fff083b7670, r=@0x70f4a70) at vendor/include/boost/shared_ptr.hpp:187 #4 0x00002b3ba3bcb745 in boost::weak_ptr<Observer>::lock (this=0x70f4a70) at vendor/include/boost/weak_ptr.hpp:106 #5 0x00002b3ba3bcac9b in Subject::notifyObservers (this=0x5a91e0) at src/core/Observer.cpp:40 #6 0x00002b3ba3bc320d in ObjectDB::createObject (this=0x5a91e0, parent=@0x7fff083b7900, type=@0x7fff083b78f0, label=@0x7fff083b78e0, _guid=@0x7fff083b78d0) at src/core/ObjectDB.cpp:101
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

Buddha Buck wrote:
#0 0x000000000041c711 in boost::detail::atomic_conditional_increment (pw=0x2b3ba59cdae8) at vendor/include/boost/detail/sp_counted_base_gcc_x86.hpp:92 #1 0x000000000041c743 in boost::detail::sp_counted_base::add_ref_lock (this=0x2b3ba59cdae0) at vendor/include/boost/detail/sp_counted_base_gcc_x86.hpp:138 #2 0x000000000041d7a1 in shared_count (this=0x7fff083b7678, r=@0x70f4a78) at vendor/include/boost/detail/shared_count.hpp:362 #3 0x00002b3ba3bcb6ff in shared_ptr<Observer> (this=0x7fff083b7670, r=@0x70f4a70) at vendor/include/boost/shared_ptr.hpp:187 #4 0x00002b3ba3bcb745 in boost::weak_ptr<Observer>::lock (this=0x70f4a70) at vendor/include/boost/weak_ptr.hpp:106 #5 0x00002b3ba3bcac9b in Subject::notifyObservers (this=0x5a91e0) at src/core/Observer.cpp:40 #6 0x00002b3ba3bc320d in ObjectDB::createObject (this=0x5a91e0, parent=@0x7fff083b7900, type=@0x7fff083b78f0, label=@0x7fff083b78e0, _guid=@0x7fff083b78d0) at src/core/ObjectDB.cpp:101
Very odd address space layout; the shared_ptr code at 0x00002b3ba3bcb6ff and the shared_count/sp_counted_base code at 0x000000000041c711. The 'this' pointer for the sp_counted_base object - 0x2b3ba59cdae0 - seems suspicious, too close to code, too far from the this pointer for the weak_ptr (0x70f4a70) which should also be on the heap. Hard to say where things go wrong. Can you post the assembly around the faulting location? Might it be possible for the two code segments, 2b3ba3.. and 41c.., to be coming from two separate versions of Boost?

On 7/11/07, Peter Dimov <pdimov@mmltd.net> wrote:
Very odd address space layout; the shared_ptr code at 0x00002b3ba3bcb6ff and the shared_count/sp_counted_base code at 0x000000000041c711. The 'this' pointer for the sp_counted_base object - 0x2b3ba59cdae0 - seems suspicious, too close to code, too far from the this pointer for the weak_ptr (0x70f4a70) which should also be on the heap. Hard to say where things go wrong. Can you post the assembly around the faulting location? Might it be possible for the two code segments, 2b3ba3.. and 41c.., to be coming from two separate versions of Boost?
I found the problem, and it is such a stupid cause I hesitate to mention it here for fear of looking foolish. If you recall, my notifyObservers() function looked like: void Observed::notifyObservers() { vector<weak_ptr<Observer> >::iterator i; for (i = _observers.begin(); i != _observers.end(); observer++) { shared_ptr<Observer> o = i->lock(); if (o) o->update(shared_from_this()); } } What I didn't realize at the time was that one of the actions an observer could (and indeed, does) do is to create new observers for the observed object. The practical result being that while in the loop, the vector _observers could (and indeed was) modified, yielding the possibility of an invalid iterator. It's not an issue with boost at all, but a problem where I didn't respect the limitations of std::vector<>. I'm sorry to have bothered you with a simple case of PEBKAC.
participants (2)
-
Buddha Buck
-
Peter Dimov