Re: [boost] [review] Heaps: size and stability

On 6/2/2011 at 19:27, Tim Blechmann wrote:
The stability issue relates to a previous discussion from a couple of months ago (brought up by Cliff Green, http://article.gmane.org/gmane.comp.lib.boost.devel/219330). Cliff was concerned that the implementation of stability added overhead (an extra size_t, I think) to the normally unstable heaps. I worry that the use of a policy parameter to specify stability will result in "accidental overhead" in cases where users don't fully understand the impact of their configuration choices.
it is not possible to write every heap algorithm as a `stable' heap. i do not recall the exact point that will break stability if one just orders the nodes in the tree, but there was a non-trivial issue (possibly related to mutable heaps).
It should be possible to preserve the correctness of most operations (certainly find, insert, delete, merge, and rebalancing) for ordinary heaps by storing items with the same key value in leftist heap order, i.e. preserving the invariants that:
(1) if a subtree contains multiple items with the same key as the subtree root, the item at the root is logically first in the order (i.e., standard heap ordering);
(2) if both the left and right subtrees contain items with the same key as the subtree root, all items with that key in the left subtree are logically ordered before any item having the same key in the right subtree (leftist property); and
(3) if a subtree contains multiple items with a particular key value K, every item with key value K has as its parent another item with key value K, with the possible exception of the subtree root (locality/compactness property).
These properties are preserved by right- or left-rotation around the subtree root. To keep the heap at minimum height, you will also occasionally need a prune-and-splice operation (items with key value K in the right subtree become children of the rightmost node with key value K in the left subtree), followed by further rotations. The compactness property should allow find, insert, and delete to continue running in O(lg n) time when rebalancing is not required. The prune-and-splice operations add O(|K|), where |K| is the total number of nodes having key value K, to the worst-case rebalancing cost, but this |K| is quite reasonably ~O(lg n) when the key values are sufficiently distributed. Some merge algorithms would clearly violate the invariants above, but stable merge is still possible in O(n) time.
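Purely for illustration, here is a rough sketch of what checking invariants (1) and (2) could look like on a hypothetical binary node type that records logical insertion order; the node layout and field names are my own assumptions (not Boost.Heap internals), and invariant (3) could be verified along the same lines by confirming that equal-key nodes form a single chain:

    // Hypothetical node type for illustration; not Boost.Heap's representation.
    #include <algorithm>
    #include <cstddef>
    #include <limits>

    struct node {
        int key;
        std::size_t order;          // logical insertion order
        node* left  = nullptr;
        node* right = nullptr;
    };

    // Smallest/largest `order' among nodes with key k in the subtree at n.
    struct order_range { std::size_t min, max; bool found; };

    inline order_range find_orders(const node* n, int k)
    {
        if (!n) return {0, 0, false};
        order_range out = {std::numeric_limits<std::size_t>::max(), 0, false};
        auto merge = [&](order_range x) {
            if (!x.found) return;
            out.found = true;
            out.min = std::min(out.min, x.min);
            out.max = std::max(out.max, x.max);
        };
        merge(find_orders(n->left, k));
        merge(find_orders(n->right, k));
        if (n->key == k) merge({n->order, n->order, true});
        return out;
    }

    // Invariant (1): the subtree root is logically first among equal keys.
    // Invariant (2): equal keys in the left subtree all precede those in the right.
    inline bool stable_order_ok(const node* n)
    {
        if (!n) return true;
        if (find_orders(n, n->key).min != n->order) return false;    // (1)
        order_range l = find_orders(n->left,  n->key);
        order_range r = find_orders(n->right, n->key);
        if (l.found && r.found && l.max > r.min) return false;       // (2)
        return stable_order_ok(n->left) && stable_order_ok(n->right);
    }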
iac the node versioning seems to be the usual approach for implementing stable heaps. the type of the version should be exposed to the interface, though, so one can specify arbitrary-sized integer types to avoid overflows ...
Versioning is the simplest way to adapt a non-stable heap implementation for stability, but the space overhead (and concomitant loss of flexibility in using a preexisting array to store the heap) and the risk of overflow make it unsuitable for some applications, including the quite common event queue. With a 32-bit counter value, a queue that receives an average of 100 events/second will overflow in about 16 months--- i.e. 32 bits is just enough space for your library to pass a reasonable test battery and still fail in the real world. (Of course, you need not increment the counter on every single insertion, as I just noted in the Code Collaborator review.)

Another option would be chaining, where each heap node contains either a single item or a linked list of items with the same key value. This would minimize the required changes to existing algorithms, support fancier heap types and heaplike structures, and only impose some storage overhead (one or two pointers into the list, perhaps a length counter, and a means of synchronizing list access, which could be an atomic update protocol for the head/tail pointers and/or length counter) when you actually need it, at the cost of some type erasure within the generic heap structure and possibly extra copying at runtime (when a singleton node first becomes a linked list).

On the subject of counters:

On 6/2/2011 at 10:19, Tim Blechmann wrote:
will have a look ... in general boost.heap does not throw any exceptions by itself, but of course the held type, the container or the allocator can throw
Lines 231 & 232 of the review version of stable_heap.cpp are: if (counter_ == std::numeric_limits<std::size_t>::max()) BOOST_THROW_EXCEPTION(std::runtime_error("boost::heap counter overflow")); (to be fair, this does seem to be the only case in the library... but Thorsten is right that it should be clearly documented.)

it is not possible to write every heap algorithm as a `stable' heap. i do not recall the exact point that will break stability if one just orders the nodes in the tree, but there was a non-trivial issue (possibly related to mutable heaps).
It should be possible to preserve the correctness of most operations (certainly find, insert, delete, merge, and rebalancing) for ordinary heaps by storing items with the same key value in leftist heap order, i.e. preserving the invariants that:
(1) if a subtree contains multiple items with the same key as the subtree root, the item at the root is logically first in the order (i.e., standard heap ordering);
(2) if both the left and right subtrees contain items with the same key as the subtree root, all items with that key in the left subtree are logically ordered before any item having the same key in the right subtree (leftist property); and
(3) if a subtree contains multiple items with a particular key value K, every item with key value K has as its parent another item with key value K, with the possible exception of the subtree root (locality/compactness property).
how does this approach deal with mutability? if we have multiple items with key a, and change one key from a to b and later back to a, we lose any information about the original position. we could simply define this sequence as undefined or interpret `update' as `insert' ...
iac the node versioning seems to be the usual approach for implementing stable heaps. the type of the version should be exposed to the interface, though, so one can specify arbitrary-sized integer types to avoid overflows ...
Versioning is the simplest way to adapt a non-stable heap implementation for stability, but the space overhead (and concomitant loss of flexibility in using a preexisting array to store the heap) and the risk of overflow make it unsuitable for some applications, including the quite common event queue. With a 32-bit counter value, a queue that receives an average of 100 events/second will overflow in about 16 months--- i.e. 32 bits is just enough space for your library to pass a reasonable test battery and still fail in the real world. (Of course, you need not increment the counter on every single insertion, as I just noted in the Code Collaborator review.)
32bit won't be enough for operations which run for months ... 64bit or 128bit counters should be sufficient for almost any real-world application ...
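For what it's worth, here is a minimal sketch of the counter/versioning idea with a 64-bit sequence number along the lines Tim suggests; it is a generic illustration built on std::priority_queue, not Boost.Heap's stable_heap implementation, and the class and member names are invented for the example:

    #include <cstdint>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    template <class T, class Compare = std::less<T>>
    class stable_max_queue {
        struct entry { T value; std::uint64_t seq; };
        struct entry_compare {
            Compare cmp;
            bool operator()(const entry& a, const entry& b) const {
                if (cmp(a.value, b.value)) return true;   // a has lower priority
                if (cmp(b.value, a.value)) return false;
                return a.seq > b.seq;                     // equal keys: older element wins
            }
        };
        std::priority_queue<entry, std::vector<entry>, entry_compare> q_;
        std::uint64_t next_seq_ = 0;                      // the per-element "version"

    public:
        void push(T v)       { q_.push(entry{std::move(v), next_seq_++}); }
        const T& top() const { return q_.top().value; }
        void pop()           { q_.pop(); }
        bool empty() const   { return q_.empty(); }
    };

With equal keys, top() returns the element that was pushed first; widening the sequence number (or exposing its type, as Tim proposes) trades space for a later overflow horizon.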
Another option would be chaining, where each heap node contains either a single item or a linked list of items with the same key value. This would minimize the required changes to existing algorithms, support fancier heap types and heaplike structures, and only impose some storage overhead (one or two pointers into the list, perhaps a length counter, and a means of synchronizing list access, which could be an atomic update protocol for the head/tail pointers and/or length counter) when you actually need it, at the cost of some type erasure within the generic heap structure and possibly extra copying at runtime (when a singleton node first becomes a linked list).
chaining would be a reasonable way for node-based implementations, but not for container adaptors ...
On 6/2/2011 at 10:19, Tim Blechmann wrote:
will have a look ... in general boost.heap does not throw any exceptions by itself, but of course the held type, the container or the allocator can throw
Lines 231 & 232 of the review version of stable_heap.cpp are: if (counter_ == std::numeric_limits<std::size_t>::max()) BOOST_THROW_EXCEPTION(std::runtime_error("boost::heap counter overflow"));
ah yes ... sry ... forgot about this ...

tim

--
tim@klingt.org
http://tim.klingt.org

You can play a shoestring if you're sincere
    John Coltrane

how does this approach deal with mutability? if we have multiple items with key a, and change one key from a to b and later back to a, we lose any information about the original position. we could simply define this sequence as undefined or interpret `update' as `insert' ...
I think that updating an object would have the same meaning as erasing the object and then re-pushing it. The heap shouldn't be expected to retain a history of previous values.
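Assuming a mutable heap interface in which push() returns a handle and erase(handle) is available (an assumption for illustration, not a claim about the library's final API), that interpretation of `update' amounts to something like:

    // Hedged sketch: `update' interpreted as erase + re-push, so the element
    // loses its old position and re-enters as the newest among equal keys.
    // MutableHeap, Handle, and the member names are illustrative assumptions.
    template <class MutableHeap, class Handle, class Value>
    Handle update_as_reinsert(MutableHeap& h, Handle pos, const Value& v)
    {
        h.erase(pos);        // forget the original (stability) position
        return h.push(v);    // re-insert; a counter-based heap assigns a fresh version
    }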
Another option would be chaining, where each heap node contains either a single item or a linked list of items with the same key value.
chaining would be a reasonable way for node-based implementations, but not for container adaptors ...
I'm not sure that's entirely true. Obviously, it won't work for any container of the form container<T> where T is the value type of the heap. If you can rebind it to something like container<node<T>>, you might be able to make it work. Chaining can change the properties of the data structure, however. For example, a binary heap using array<T, N> does not require an allocator; a stable binary heap (using chaining) would require allocations. For node-based heaps, this is, of course, a non-issue.

Andrew
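To make the container<node<T>> rebinding concrete, here is a rough sketch of a chained node that a container-adaptor heap might store in place of T; the type and its layout are invented for illustration (not Boost.Heap's design), and it shows where the extra allocation appears: only when a key actually repeats, i.e. when a singleton node grows a chain:

    #include <forward_list>
    #include <utility>

    template <class T>
    struct chained_node {
        T first;                       // oldest item with this key; the heap compares on this
        std::forward_list<T> rest;     // later items with the same key, in FIFO order

        explicit chained_node(T v) : first(std::move(v)) {}

        // Append another item with an equal key. This is the only place that
        // allocates, so an array-backed heap pays for chaining only on duplicates.
        void append(T v) {
            auto last = rest.before_begin();
            for (auto it = rest.begin(); it != rest.end(); ++it)
                last = it;             // O(|K|) walk; keeping a tail pointer would avoid it
            rest.insert_after(last, std::move(v));
        }
    };

One way the adaptor could use this: compare nodes on first, and on pop hand out first, promoting rest.front() into its place if the chain is non-empty; that is exactly the runtime copying and allocation cost (and, for array<T, N>, the newly required allocator) noted above.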
participants (3):
- Andrew Sutton
- Brent Spillner
- Tim Blechmann