[multi-index] Container not safe against reentrant clear() / erase() ?
Hi,
I have encounted a scenario that triggers an access violation when
removing reference counted objects from a container.
Assume I have a (for simplicity's sake, global) multi_index_container
g_multiindex containing reference counted objects of type CObject:
boost::multi_index_container< ordered_unique< shared_ptr<CObject> > >
g_multiindex;
In general, an object of type CObject may be destroyed at any time and
the destructor makes sure to remove any reference held in g_multiindex:
class CObject {
~CObject() {
// if object is explicitly destroyed, remove from container
g_multiindex.erase( this );
}
};
Since ownership of CObject is shared, g_multiindex may hold the last
reference to a CObject instance.
g_multiindex.clear then calls
- delete_all_nodes()
- delete_all_nodes(root())
- this->final_delete_node_(static_cast
Sebastian Theophil escribió:
Hi,
I have encounted a scenario that triggers an access violation when removing reference counted objects from a container.
Assume I have a (for simplicity's sake, global) multi_index_container g_multiindex containing reference counted objects of type CObject:
boost::multi_index_container< ordered_unique< shared_ptr<CObject> > > g_multiindex;
In general, an object of type CObject may be destroyed at any time and the destructor makes sure to remove any reference held in g_multiindex:
class CObject {
~CObject() {
// if object is explicitly destroyed, remove from container
g_multiindex.erase( this );
}
};
If I'm understanding this right, I think your design is flawed. You say that CObjects detach themselves from g_multiindex when their refcount drops to 0, but: how can this count be zero if the CObject is still referenced from g_multiindex? The only situation where this can happen is when destroying g_multiindex, precisely where the crash occurs. Simply omitting the g_multiindex.erase invocation inside ~CObject should fix things. Am I wrong somewhere?
[...]
This would remove all elements from the container before actually deleting them. I'm unsure though how the same behavior could be achieved when erasing a single element.
The C++ standard does not seem to define how clear() and erase() should behave in this regard, but I cannot image that my application scenario is so unusual that this problem has not been solved a million times. So I'm curious if the behavior of multi_index_containers is by design and maybe someone could be so kind to enlighten me why this is so. Is there any better way to achieve what I'm trying to do?
This is simply an uncovered situation, and, IMHO, a very unusual one --in support of my view I can say I never got a report on this, and the lib's been around for 4.5 years. On the other hand, erasing inside erase() does work; presumably this is much more frequent, for instance when working with composite and tree-like data types. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo
Hi Joaquin, thanks for your prompt reply.
If I'm understanding this right, I think your design is flawed. You
say
that CObjects detach themselves from g_multiindex when their refcount drops to 0, but: how can this count be zero if the CObject is still referenced from g_multiindex?
It seems I haven't expressed myself very clearly. Of course, the refcount only drops to zero when g_multiindex is cleared. But assume that a CObject can also be explicitly destroyed at any time. Obviously, before destroying the CObject, the caller could explicitly remove the CObject from the multi_index_container. This would need to be done whenever a CObject is destroyed and sooner or later someone will forget to do it. It seems much simpler to let the CObject dtor do the work of removing all references to itself, as long as the container is safe against "double deletion".
This is simply an uncovered situation, and, IMHO, a very unusual one --in support of my view I can say I never got a report on this, and the lib's been around for 4.5 years.
I can't argue against that :-)
On the other hand, erasing inside erase() does work;
What I do is erase() -> ~node() -> erase() and I would guess that doesn't work for the same reason as before: If the container has n elements, node deletion goes through the following states delete_node( node* x ) { // container has n nodes destroy(x) // container still has n nodes but x has been destroyed reset left / right / parent pointers // container has n-1 nodes } Or is that wrong? What I'd like is something like delete_node( node* x ) { reset left / right / parent pointers // container has n-1 nodes, x has not been destructed yet destroy(x) } In this implementation, the multi_index_container is never in an invalid state. In the current implementation, elements contained in a container cannot reliably make operations on the same container inside their destructor. The dtor may e.g. destroy the internal data structures used to order the nodes. What if one node's dtor would like to remove a set of dependent elements from the container as well? Regards, Sebastian -- Sebastian Theophil . stheophil@think-cell.com Software Engineer think-cell Software GmbH . Invalidenstr. 34 . 10115 Berlin, Germany http://www.think-cell.com . phone +49-30-666473-10 . toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl . Amtsgericht Berlin-Charlottenburg, HRB 85229
On Wed, Jun 3, 2009 at 6:08 PM, Sebastian Theophil wrote: This is simply an uncovered situation, and, IMHO, a very unusual one
--in support
of my view I can say I never got a report on this, and the lib's been
around for
4.5 years. I can't argue against that :-) On the other hand, erasing inside erase() does work; What I do is erase() -> ~node() -> erase() and I would guess that
doesn't work for the same reason as before:
If the container has n elements, node deletion goes through the
following states Sebastian, IMO you should reconsider the design, since you introduced a
cyclic dependency in your code.
Even if you fix that, this code will remain fragile and will likely break
soon on some other place.
May be my post is not very productive, but to complex logical constructs
lead to fragile code. Try to choose one
path on how it might work and implement this contract only. Don't try to
decide to much for your users especially
if your code is a low level abstraction which does not have the information
on how it might be used. You will not
be able to cover all the use cases by writing intelligent code. You can only
restrict your code to work in proposed manner
and no more. Try to apply SRP to the CObject class. What happens if someone
put's your CObject to a local vector instead
of global MI-container, this will still try to delete itself from the global
MI-Container?
I definitely know very little about the whole app, to make such critics and
apologize for that. It is also not the usual in this mailing
list and might even be against the policy of that list.
With Kind Regards,
Ovanes
Sebastian Theophil wrote:
If I'm understanding this right, I think your design is flawed. You say that CObjects detach themselves from g_multiindex when their refcount drops to 0, but: how can this count be zero if the CObject is still referenced from g_multiindex?
It seems I haven't expressed myself very clearly. Of course, the refcount only drops to zero when g_multiindex is cleared. But assume that a CObject can also be explicitly destroyed at any time.
That's what I find hard to assume: how is it permissible to destroy a CObject at any time when its refcount is not zero? If you have some object whose lifetime is managed by boost::shared_ptr, I think it is a bad idea to mess with the object lifetime directly. Please note this is not a rhetoric question: how do you know when you explicitly destroy a CObject that you leave no dangling boost::shared_ptrs to it?
On the other hand, erasing inside erase() does work;
What I do is erase() -> ~node() -> erase() and I would guess that doesn't work for the same reason as before[...] If the container has n elements, node deletion goes through the following states
delete_node( node* x ) { // container has n nodes destroy(x) // container still has n nodes but x has been destroyed reset left / right / parent pointers // container has n-1 nodes }
Or is that wrong?
What I'd like is something like
delete_node( node* x ) { reset left / right / parent pointers // container has n-1 nodes, x has not been destructed yet destroy(x) }
The latter is the way the lib currently works. Are you experiencing otherwise? Joaquín M López Muñoz Telefónica, Investigación y Desarrollo -- View this message in context: http://www.nabble.com/-multi-index--Container-not-safe-against-reentrant-cle... Sent from the Boost - Users mailing list archive at Nabble.com.
That's what I find hard to assume: how is it permissible to destroy a CObject at any time when its refcount is not zero? If you have some object whose lifetime is managed by boost::shared_ptr, I think it is a bad idea to mess with the object lifetime directly.
Please note this is not a rhetoric question: how do you know when you explicitly destroy a CObject that you leave no dangling boost::shared_ptrs to it?
We do not use the boost::shared_ptr class and its relatives but our own reference counting implementation. There are two kinds of references, owned_ref<T> represents an "owning" relationship between a parent and a child. If such a reference is destroyed, it deletes the object it points to. ref<T> is a simple reference that increases the refcount of T. If the last ref of either type is destroyed, the object is always destroyed. Some objects are kept alive only by refs, other objects are owned and their lifetime is bound to the lifetime of their parent. We reference count all objects but this doesn't necessarily imply that every reference should keep an object alive. It only implies that every such reference must be removed when the object is destroyed. After all dtors have run, the refcount must be zero and this is asserted. It doesn't have to be zero when the dtor is called. The object itself manages its lifetime, not the references to it. Imagine you have a hierarchical application data model e.g. CPresentation |_ CSlide |_ CChart |_ CDataLabel This application data model doesn't change very often. Changing the data model causes changes to the file format, the forward and backward conversion, possibly the user interface etc. Every object's lifetime and the collections in which it is held are very clearly defined and are known to the object itself. The question what if somebody uses CDataLabel in a totally unforeseen way doesn't really apply. Each parent owns his children. - The parent constructor constructs the child objects, the child objects may register in some collection higher up the chain. All elements with a user interface may register with some top level object that implements the mouse handling for example. - The parent destructor causes the destructions of its child objects. The child dtor deregisters the child object at all places where it registers in the ctor. In my opinion this is a simple structure. Object construction and destruction are inverse operations of each other, written side by side. Object registration and deregistration only happen at one place, so nobody can forget to do this in the right order, or at the right place etc. Now there is one additional use case: An object like CDataLabel may be destroyed at any time because the user hit delete or pressed a button, or called a macro function. This is forwarded to the corresponding user interface object which simply deletes its owner, ie the data label itself. The user interface class could ask the parent to delete a label of course, but this becomes impractical if there are different labels with different parents but the same user interaction. Again, if each (specialized) CDataLabel dtor can remove all references to itself, no caller can forget to do it, the code remains pretty simple, and the object tree is only managed at one place. The CDataLabel dtor removes the object from the class hierarchy described above, i.e., the dtor also deregisters the object from its parent. When the parent itself is destroyed, this deregistration is a noop because the data label reference is already destroyed.
The latter is the way the lib currently works. Are you experiencing otherwise?
If the CChart has a multi index container of owning references to data labels, the problem I've described could result. If the multi_index_container is cleared and each reference is released, the CDataLabel is destroyed. During clear() you would have a stack trace like this: multi_index_container::clear() - delete_node(root()) - destroy(node*) - ~owned_ref< CDataLabel > - ~CDataLabel -> and at this point the half-destructed reference is still part of the multi_index_container. Because it is half-destructed, the ordering criterion of the index may be violated and the CDataLabel cannot make a lookup on the container, neither to make sure it is removed itself, nor to remove e.g. dependent labels. Lookups may crash in fact because the index assumes an ordering relation between the left/right/parent nodes and runs off into uninitialized memory. Regards, Sebastian -- Sebastian Theophil · stheophil@think-cell.com Software Engineer think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Berlin-Charlottenburg, HRB 85229
Sebastian Theophil wrote:
Please note this is not a rhetoric question: how do you know when you explicitly destroy a CObject that you leave no dangling boost::shared_ptrs to it?
[long explanation snipped] OK, thank you very much for the explanation. Sorry about having kept challenging your design, but I really couldn't see how you were in the situation you are. Now it's clear to me.
If the CChart has a multi index container of owning references to data labels, the problem I've described could result. If the multi_index_container is cleared and each reference is released, the CDataLabel is destroyed. During clear() you would have a stack trace like this:
multi_index_container::clear() - delete_node(root()) - destroy(node*) - ~owned_ref< CDataLabel > - ~CDataLabel -> and at this point the half-destructed reference is still part of the multi_index_container. Because it is half-destructed, the ordering criterion of the index may be violated and the CDataLabel cannot make a lookup on the container[...]
Yes, as you said and I confirmed erase() inside clear() does not work. It is not easy to make it work without greatly affecting the performance of clear(): the implementation of the latter takes advantage of the fact that all nodes must be deleted at once to do it in the most straightforward way, without bothering to keep the internal data structures consistent during the process. But you can circumvent this simply by using erase(begin(),end()) instead of clear(), because erase() is reentrant. To automate this, you can have your custom container deriving from multi_index_container defined as this: class my_container:public multi_index_container<...> { ~my_container() { erase(begin(),end()); } } Does this help? Joaquín M López Muñoz Telefónica, Investigación y Desarrollo -- View this message in context: http://www.nabble.com/-multi-index--Container-not-safe-against-reentrant-cle... Sent from the Boost - Users mailing list archive at Nabble.com.
[long explanation snipped]
OK, thank you very much for the explanation. Sorry about having kept challenging your design, but I really couldn't see how you were in the situation you are. Now it's clear to me.
I'm glad I convinced you :-)
But you can circumvent this simply by using
erase(begin(),end())
Thanks for the information, I can live with that. How big is the performance impact when using erase(begin(), end()) vs clear()? Now I guess I'm challenging your design ;-) My only point is that in the few cases where the C++ standard defines the behavior of clear() for the STL containers, it simply says clear() should be equivalent to erase(begin(), end()). It is a bit surprising if the multi_index_container behaves differently in that regard. Thanks for your patience Sebastian -- Sebastian Theophil · stheophil@think-cell.com Software Engineer think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Berlin-Charlottenburg, HRB 85229
Sebastian Theophil escribió:
But you can circumvent this simply by using
erase(begin(),end())
Thanks for the information, I can live with that. How big is the performance impact when using erase(begin(), end()) vs clear()? Now I guess I'm challenging your design ;-)
Well, you can just profile it. I can't give you exact figures right off the top of my head, but the improvement was huge. Not only the speedup is big in practical terms, it also affects to big-O complexity (linear with the smart clear() implementation as opposed to O(nlogn) for erase(begin(),end())).
My only point is that in the few cases where the C++ standard defines the behavior of clear() for the STL containers, it simply says clear() should be equivalent to erase(begin(), end()). It is a bit surprising if the multi_index_container behaves differently in that regard.
The equivalence stated by the standard refers only to external behavior and big-O complexity, and implementations are free to improve on that if they can. In fact, any serious stdlib implementation already does: open your favorite environment's <set> header, for instance, and if you dig a little you'll see that clear() is not implemented as erase(begin(),end()), but usually the removal is done recursively from the root node. This is the same as ordered indices in Boost.MultiIndex do. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo
participants (4)
-
Joaquin M Lopez Munoz
-
joaquin@tid.es
-
Ovanes Markarian
-
Sebastian Theophil