[review] [heap] Summary

The formal review of Tim Blechmann's Heap library has concluded. Only four votes were cast, although more people commented on issues either on the mailing list or though the Code Collaborator tool. However, from the comments received, the review identified no fundamental problems in the design or implementation of the library that would entail rejection. A such, the Boost.Heap library is ACCEPTED into Boost Votes for the library were cast by: Phil Endecott Daniel Russel Votes against: Thomas Klimpel (provisionally) Andrew Sutton Other comments received during the course of the review were generally positive, with a number of technical issues identified that need to to be addressed. The final version of the library can be fully accepted once Tim has addressed the technical issues identified during the review. Below is a summary of technical concerns. - The pairing heap has a known issue, regarding stack overflows, making it potentially unusable for production code. If the recursion issue can't be adequately solved, then the data structure should carry a much stronger warning label (i.e., put it in an experimental directory). - The performance of b_heap vs. d_ary_heap vs. priority_queue needs to be better characterized. Under what circumstances does one outperform the other. If no convincing scenario can be found where b_heap outperforms the others, then it should also be placed in an experimental directory. - The heap concepts need to be revisited as per comments on the review tool. The concepts should also describe semantics of the operations, not just the interfaces. - There were several comments on the implementation of heap stability. Three alternatives were identified: “naturally stable” heaps (of which no examples were given), the current counter based approach, and a chaining approach. No consensus on the best implementation strategy. Although the counter-based approach is probably adequate for most applications, it is not universally desirable. A more thorough investigation of alternative strategies is encouraged. It would be acceptable to document the investigation in a “Future Work” section of the library documentations (with any experiments also housed in the library). - The notion of Mergeable heaps needs to be better addressed, at least to differentiate between heaps that merge in O(n) or slower time and those that can merge in O(lg n) time. The proposed solution is to write merge as a general heap algorithm and specialize it for Mergeable heaps by calling the member function merge. Also, merge should be destructive, so merge and clear should not be needed. - The documentation must be substantially improved with respect to the overall design of heap data structures (i.e., concepts), especially mutable heaps and best practices. It also needs to provide both theoretical and performance comparisons for the different heaps. - Include documentation of the benchmarking tool in the documentation. It could be instructive for users to evaluate different heap implementations on their host or target architectures. - Document the complexity of == and < for heaps. - Address the comments and issues in the Review Tool. There were other recommendations that should be considered in the long term for the library. - Thorsten Ottosen suggest making b_heap and d_ary_heap container adaptors along the lines of std::stack, std::queue, and std::priority_queue, allowing parameterization over (e.g., stack-based container). Subsequent discussion indicates that there are some impracticalities in doing so, namely that it could invalidate other policies (e.g., stability). - Phil Endecott suggested rewriting some heaps (b_heap and d_ary_heap) in terms of more general algorithms operating on a (Random Access) Range in order to support the formulation of heap structures on other memory structures. Doing this would also help support the definition of heaps that satisfied Thorsten's request. I strongly recommend investigating Phil's suggestion, but this is not a barrier to acceptance. This feature could be added in future revisions of the library. I'd like to thank the community members who took the time to review Tim's work and especially those that submitted formal reviews. I'd also like to thank Tim (again) for his contributions and patiently answering our questions and addressing our comments. Andrew

hi andrew and list,
i will try to address the issues in the next few weeks and post an updated version when it is ready ...
i have been thinking about a `best practices' section, discussing the performance aspects in more detail.
ok
in order to implement stability without a version count, we need to be able to find out, if a value is already inside the heap. since a heap is not a search tree the possible solutions are (a) iterate the complete heap (O(n)), (b) use a treap (mixture of heap and binary search tree, O(log(n))) or (c) maintain a separate hash table O(1). then nodes can be ordered in `natural stable' order or they could be chained. even compared with a hash table, the version count will probably be more time- efficient (no hashing required) and most likely more memory efficient ...
more or less done
already started to work on this
will give some more details about my benchmarking procedure. but like all synthetic benchmarks, they should be taken with a grain of salt ...
- Document the complexity of == and < for heaps.
ok
- Address the comments and issues in the Review Tool.
ok
i will have a look when i addressed the other issues. i would like to thank the reviewers for their feedback and the valuable discussions about the boost.heap library, andrew for running the review and google for supporting this project as part of the summer of code program. cheers, tim

Den 18-06-2011 16:49, Andrew Sutton skrev:
This is of course not a very large foundation on which to make such a decision. Nor is the vote particular clear. I didn't cast a vote, but I will be /extremely/ unhappy with having a library where this is not addressed:
I fail to to see the problem; I don't recall any problem with stability or other problem. If there is one, then make the guarantees conditional on the properties of the supplied container. If drop-in support for other types of containers is not added, the library is a lot less useful for me (and many others). There is plenty of motivation for having this feature, and it is dead-easy to implement as well. So /please/ add this to the list of required fixes that must be done before inclusion is possible. -Thorsten

Like I said in the summary, comments received during the review did not indicate any fundamental problems with the design or implementation of the library, and many people seemed to be satisfied with it. Although there weren't a lot of formal votes, I felt that there was enough discussion for me to make an informed decision. My vote was based on a difference in design philosophy, but in the end, I didn't consider that to be a sufficient reason to reject the library. The lack of participation, especially when it came to voting, was discouraging.
No, I will not add this to the list of requirements. I considered this suggestion very carefully when writing my evaluation and again when preparing the summary and ultimately decided that trying to shoehorn this requirement into the existing framework was not the best approach for providing the feature. Phil's suggestion of implementing heap algorithms on ranges lends itself to a more natural solution to this requirement. I strongly encourage Tim to consider this strategy in future revisions of the library. Andrew

hi thorsten,
concerning make_heap functions: using the b_heap as container adaptor is pretty fragile, since it requires padding of the first element. this cannot really be done in the std::make_heap family as this padding element should be invisible to the user. with free functions stability: in order to implement stability, we need to either maintain a state for the version count or to find a way to lookup elements by a key. maintaining a state will require extra memory which cannot be stored in the container. looking up the key of an inserted element will be inefficient (O(n) because one cannot efficiently find elements) or requires an indirection via a dictionary (and therefore cannot be implemented as container adaptor). mutability: mutability for container adaptors requires an indirection, which cannot be achieved via container adaptors. ... for immutable and non-stable heaps, the std::make_heap family will perform pretty well (as they are usually d-ary heaps with an arity of 4) ... about a container<> policy argument to the heap: * one template parameter conflicting with another is not really clean (although it could be caught via static assertions) * containers cannot be rebound (unlike allocators). so one will have to specify the exact type, that is used internally ... which is not really a clean interface, either * for mutable heaps, the allocator template parameter will be used for both the heap container and for the list of values. if we specify the container, the allocator argument will have to be used for the value list, so we cannot statically assert the parameter conflict. * for mutable heaps, the container argument will have to be specified with the exact type of the internal handles, which would have to be exposed directly. this would be WAY easier, if we had a standardized way to rebind containers. But since this is not the case, the interface would be pretty ugly. my concern is not really the implementation ... it surely is dead-easy ... but the consistency of the interface is not necessarily trivial ... tim

make_heap/push_heap/pop_heap [1,2,3]
for immutable heaps, this could be statically asserted ... still it somehow smells ...
to implement mutability, you need to efficiently find the elements in the heap. so the actual heap is not a heap of value_type of pair<value_type, stability_count_type>, but of iterators to a std list, which contains a pair<internal_value, index_in_heap>. the allocator argument is therefore used twice: once for the std::vector storing the heap-structed tree and once for the list containing the value and index to the heap.
maybe it is not reasonable for mutable heaps ... in general, i prefer robustness and consistency over features. and i do not really see a reason to duplicate functionality from the stl ... if you really need to specify the container directly, you loose mutability and the nice interface for stability (you'd have to implement the version counting manually, but that is dead-easy). the only features that you will miss are ordered iterators and comparison/equivalence checks, however only few applications will actually use those features... tim [1] http://www.cplusplus.com/reference/algorithm/make_heap/ [2] http://www.cplusplus.com/reference/algorithm/push_heap/ [3] http://www.cplusplus.com/reference/algorithm/pop_heap/
participants (3)
-
Andrew Sutton
-
Thorsten Ottosen
-
Tim Blechmann