
Hi, this is my actual review of Boost.Heap. I took part in this review because I was interested in the results of Boost.Heap for my old heap test/benchmark (related to the fast marching algorithm). So I decided that trying to use Boost.Heap with the old test code was a good experiment for a review, allowing me to see how helpful the documentation is, how easy it is to use the heaps, and how well they perform.

- The documentation is easy to read and helpful with respect to the questions it tries to address. The main question it doesn't try to address is how to make effective use of Boost.Heap, i.e. which heap to use for which application and how to use it optimally.

- Boost.Heap is easy to use, at least in principle.

-- One inconvenience for me was the use of max-heaps instead of min-heaps. Well, one could argue that a priority_queue should return the elements with the highest priority first (but as Wikipedia says: "some conventions consider lower priorities to be higher"), and heap-sort uses a max-heap. Since these are the two applications where the STL uses a heap, it's clear why the STL implements a max-heap. But the applications where it is beneficial to call increase or decrease directly normally use a min-heap. Now I have to call increase even though the key value has actually decreased...

-- I was left a bit with the feeling that I hadn't made the most efficient use of Boost.Heap. For example, I quickly found out that calling update with just a handle (as I had implemented it first) was suboptimal for fibonacci_heap.

-- The segmentation fault in pairing_heap was unexpected for me. Even more unexpected was Tim's response that it's actually expected. So I tried to inform myself a bit.
I found statements like "A pairing heap is a type of heap data structure with relatively simple implementation and excellent practical amortized performance" and "Experimental studies indicate that pairing heaps actually outperform Fibonacci heaps", which are incompatible with pairing_heap segfaulting on a simple test case (IMHO). My conclusion is that the statement "due to the recursive nature of this data structure, ... beware of stack overflows" is actually misleading. This implementation of pairing_heap has a known issue. As long as this issue is not fixed, the structure may be interesting for tests, but "production" use should be strongly discouraged.

- The performance numbers for Boost.Heap were neither bad nor outstanding.

-- The numbers for b_heap are not convincing.

-- The numbers for fibonacci_heap seem to be the best.

-- The pairing_heap has a smaller compare count than fibonacci_heap, but actually performs worse than fibonacci_heap in this test. Considering statements like "Experimental studies indicate that pairing heaps actually outperform Fibonacci heaps", I wonder whether the implementation of pairing_heap could be improved so that it actually outperforms fibonacci_heap.

-- The comparison with TStaticHeap and TSequentialHeap should be taken with a grain of salt. I had at first just added fibonacci_heap to the test, which is a node-based heap. The test case isn't very favorable for node-based heaps. And for the other heaps, I used the same class template, so that .reserve isn't called (it doesn't exist for node-based heaps), while the analogous function is called for TStaticHeap. And the tuned TSequentialHeap uses "debatable"/"unusual" techniques.
What is your evaluation of the design?
I guess it's OK. What makes me a bit nervous is the statement "The library is not targeted in any specific domain nor directly related to any specific libraries". A good interface is often the result of an abstraction over different applications. The other way around (starting from the data structure/algorithm) risks bloating the interface with unnecessary functionality. For example, somebody asked during the review for "iteration over the heap in priority order". Is this really something that many applications requiring a heap will benefit from? I strongly doubt it.
What is your evaluation of the implementation?
I haven't looked too deeply into it, but I think it's OK. I'm not sure whether there isn't a bit too much code reuse and "one size fits all". Was it really possible to implement all the heaps in their optimal form with this approach? But I'm not expert enough to really judge this. I was a bit surprised by how pairing_heap::merge_children was implemented (was it just a mistake?), but I don't believe that this is caused by the code reuse or the "one size fits all". It's possible that similar problems also exist in other functions, but that doesn't change the good overall impression. It's only natural that such problems only surface when the code is used by more people than just the initial author.
What is your evaluation of the documentation?
The documentation is easy to read. It doesn't address the question of how to make effective use of Boost.Heap, i.e. which heap to use for which application and how to use it optimally. One benefit of the library would be to educate the user about the different possibilities. Important topics here are worst-case vs. amortized performance, why different heaps are needed at all, and so on.
What is your evaluation of the potential usefulness of the library?
There are different use cases I can see: 1) You already have a heap in your software, but you want to see how well it performs with respect to what else is out there. 2) You need a heap as part of a more complex algorithm. 3) You need a priority queue for scheduling something.

The library in its current form is really well suited for 1, because all the heaps have a very similar interface, which makes it easy to test many different heaps without much effort. The library is also quite useful for 2, even if it would be nice if the documentation saved me some of the required experimentation. I also fear that the uniform interface of the heaps sacrifices some optimality for some algorithms.

- Take the fast marching algorithm as an example. I typically have one array with the arrival times and one array with the speeds. I'm not especially keen on having another array with "heap_t::handle_type", as that may double my memory consumption on a 64-bit system if arrival times and speeds were of type float.

- However, the alternative is giving up the uniform interface of the heaps, just for some very questionable savings in some special applications. For Dijkstra's algorithm, the memory consumption is dominated by the edges of the graph, so I don't care about an additional array of "heap_t::handle_type" of the same size as the number of vertices of the graph.

I have the impression that std::priority_queue is already sufficient for 3, so Boost.Heap wouldn't be required for that use case. Overall, I think it is quite useful.
Did you try to use the library? With what compiler? Did you have any problems?
Yes, I used it. gcc-4.3.4 (cygwin), gcc-4.3.3 (win32), gcc-4.4.3 (ubuntu), gcc-4.5.2 (win32), MSVC-9, MSVC-10. There were some minor problems, but all the problems I actually reported have been fixed (except the unexpected segfault for pairing_heap). So my main problem was probably that I felt a bit left alone with respect to the question of which heap to use and how to use it optimally.
How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
More effort than I had actually planned to spend, but I already knew from previous reviews that this would happen. I think I did a reasonably deep study from a user's point of view.
Are you knowledgeable about the problem domain?
Yes, at least from the user's point of view. I had heard of binomial heaps, d_ary heaps and fibonacci heaps before. I didn't know about b_heap, pairing_heap and skew_heap, but I knew that there are many different heap variants out there. For example, I knew about the sequential heap described in "Fast Priority Queues for Cached Memory" by P. Sanders (which doesn't mean that TSequentialHeap is identical to the sequential heap described in that paper, but it isn't too different either).
Do you think the library should be accepted as a Boost library?
No, the library should not be accepted unless:

- The current review either magically gets back on track and at least 3 other persons also submit a proper review, or another (mini?) review with proper participation is held later.

- Either pairing_heap is fixed, or non-experimental (= "production") use is strongly discouraged in the documentation.

- Either a convincing example where b_heap outperforms the other heaps is presented, or its use is discouraged in the documentation.

- The documentation says a bit more about the different heaps and their trade-offs (amortized time <-> worst case), and generally tries to give recommendations to the user about which heap will likely best fit their application, and which settings they should try (for example, that arity<4> is a good compromise for a d_ary heap between cache locality and compare count).

I used the negative form of "Yes, the library should be accepted under the following conditions" because I'm unhappy with how the review went. The announcement was sent out too late, without stating the dates of the review; much of the initial discussion was about the Code Collaborator tool, but I haven't seen any "real" review, despite the fact that there seems to be interest in the library. There were some discussions on Code Collaborator, and the tool was actually not bad at pointing to concrete problems in the actual code. Tim quickly addressed these, but a good review would have some discussions and actual opinions and suggestions.

It is also unclear to me how well Boost.Heap performs with respect to boost/graph/detail/d_ary_heap.hpp. I could have tried to also include this heap in my heap test/benchmark, but as the review seemed to have silently died already, I wasn't motivated enough to do so.

Regards,
Thomas

hi thomas, thanks for your formal review!
-- One inconvenience for me was the use of max-heaps instead of min-heaps. Well, one could argue that a priority_queue should return the elements with the highest priority first (but as Wikipedia says: "some conventions consider lower priorities to be higher"), and heap-sort uses a max-heap. Since these are the two applications where the STL uses a heap, it's clear why the STL implements a max-heap. But the applications where it is beneficial to call increase or decrease directly normally use a min-heap. Now I have to call increase even though the key value has actually decreased...
there are two conventions to consider: min-heaps (which are typically used in textbooks) and max-heaps (which are used in the stl heap functions). i preferred to be consistent with the stl. likewise i am using `top()' to extract the value with the highest priority, not `find_min()' as is often used in the literature ...
-- I was left a bit with the feeling that I hadn't made the most efficient use of Boost.Heap.
the only way to find the best data structure is to test with a specific workload. boost.heap does not provide a `one-size-fits-all' solution, but a toolbox of different configurable data structures.
For example, I quickly found out that calling update with just a handle (as I had implemented it first) was suboptimal for fibonacci_heap.
suboptimal in what way?
- The performance numbers for Boost.Heap were neither bad nor outstanding.
-- The numbers for b_heap are not convincing.
-- The numbers for fibonacci_heap seem to be the best.
-- The pairing_heap has a smaller compare count than fibonacci_heap, but actually performs worse than fibonacci_heap in this test. Considering statements like "Experimental studies indicate that pairing heaps actually outperform Fibonacci heaps", I wonder whether the implementation of pairing_heap could be improved so that it actually outperforms fibonacci_heap.
for a performance comparison, one will have to consider:
* expenses of comparison
* expenses of swapping data
* size of the managed data
* workload (mixture/order of API calls)
i suppose for every data structure one can come up with a synthetic benchmark such that this data structure outperforms every other implementation.
What is your evaluation of the design?
I guess it's OK. What makes me a bit nervous is the statement "The library is not targeted in any specific domain nor directly related to any specific libraries". A good interface is often the result of an abstraction over different applications. The other way around (starting from the data structure/algorithm) risks bloating the interface with unnecessary functionality. For example, somebody asked during the review for "iteration over the heap in priority order". Is this really something that many applications requiring a heap will benefit from? I strongly doubt it.
testing two heaps for equivalence will benefit from it. of course you can ask why you would want to test that ...
Do you think the library should be accepted as a Boost library?
No, the library should not be accepted unless - The current review either magically gets back on track and at least 3 other persons also submit a proper review, or another (mini?) review with proper participation is held later.
... i am a bit disappointed by the lack of reviews myself. given the fact that this is the only review that actually contains a vote and the review stopped yesterday, i would appreciate further feedback.
- Either a convincing example where b_heap outperforms the other heaps is presented, or its use should be discouraged in the documentation.
http://article.gmane.org/gmane.comp.lib.boost.devel/220245
- The documentation says a bit more about the different heaps and their trade-offs (amortized time <-> worst case), and generally tries to give recommendations to the user about which heap will likely best fit their application, and which settings they should try (for example, that arity<4> is a good compromise for a d_ary heap between cache locality and compare count).
a section on `best practices' may be helpful?
I used the negative form of "Yes, the library should be accepted under the following conditions" because I'm unhappy with how the review went. The announcement was sent out too late, without stating the dates of the review; much of the initial discussion was about the Code Collaborator tool, but I haven't seen any "real" review, despite the fact that there seems to be interest in the library. There were some discussions on Code Collaborator, and the tool was actually not bad at pointing to concrete problems in the actual code. Tim quickly addressed these, but a good review would have some discussions and actual opinions and suggestions.
well, i would appreciate more reviews myself ... however, if you are the only person who includes a vote and this vote is no, boost.heap should be rejected, no matter whether the reason is `documentation', `code', `performance' or `review process'.
It is also unclear to me how well Boost.Heap performs with respect to boost/graph/detail/d_ary_heap.hpp. I could have tried to also include this heap in my heap test/benchmark, but as the review seemed to have silently died already, I wasn't motivated enough to do so.
iirc the bgl heap data structures are pretty specific (optimized for integers). thanks for taking the time to review boost.heap.
tim

My review also...

- What is your evaluation of the design?

The design of the library included a number of features that did not seem to be adequately motivated by requirements, especially the ability to iterate over the elements of a heap. It's not obvious that any applications require or would use this feature, nor what specific requirements they may have of iteration (heap-order vs. tree-order iteration). I feel that the design (of individual data structures) focuses too heavily on policy-based design. I think that there are a number of parameters that should be "positional parameters" rather than optionally or arbitrarily specified, especially the Compare functions, since they define a fundamental property of the heaps. Also, the b_heap and d_ary heap should be parameterized over (dynamic random access?). The concept of Mergeable heaps (those supporting efficient merge operations) was, at the time of submission, inadequately addressed.

- What is your evaluation of the implementation?

The implementation is adequate and appears correct (based on inspection and performance comparisons). I think that the definition of heap equality and order may be strict to the point of inefficiency. I think it should still be sufficient to define these operations in terms of object structure rather than the sequence of values produced.

- What is your evaluation of the documentation?

I think that the documentation could be substantially improved. It should start with an introduction of a basic Heap concept and describe the operations push, pop, top, size, and empty. This should be followed by a table of implementations and their runtime complexity for different operations. I feel that the documentation needs to explicitly present the concepts (as in heap_concepts.hpp) defined in the library. Also, it would be a good idea to tie the documentation into previously specified concepts in boost/concept_check.hpp.
That would simplify the description of heaps as EqualityComparable and LessThanComparable, and also Container (for iterators, size and empty). The discussion of Mutability needs to be greatly enhanced. It isn't sufficient to simply show the interfaces for acquiring and updating handles. A Mutable Heap is really part of a larger data structure that incorporates a mapping between heaped objects and their handles. You can't really talk about real applications of Mutable Heaps without this mapping. This should be given much more attention in the description. I also would have liked to see a little more historical perspective in the introduction. Why was the library written? What libraries motivated the current design?

- What is your evaluation of the potential usefulness of the library?

As a collection of basic data structures, the library is potentially very useful. Not all data structures in the library are equally useful (the binomial heap is largely obsolete), but they are still useful for others to understand their design and implementation.

- Did you try to use the library? With what compiler? Did you have any problems?

Earlier versions of the library with GCC 4.5. I did not have any problems that I recall. I did not try to build a performance comparison as other reviewers did.

- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

I spent a good amount of time inspecting the source code and documentation.

- Are you knowledgeable about the problem domain?

Yes.

- Do you think the library should be accepted as a Boost library?

Not in its current state. I feel that the library is over-designed and needs to be simplified. The problem is not that the author did not produce a viable design and implementation of a heap library. I think he did just that, and with some cleanup of the documentation and minor reworking of some implementation, this is a fine library.
The problem is that the library comprises a collection of fundamental data structures, some of which have been known for 30 or 40 years (if not more). This fact makes acceptance of such a library much harder than for a lesser-known or new domain. We all know about data structures and algorithms (and know them pretty well), and we all know what these data structures are and how they work (or can figure it out). An ideal library of these data structures would perfectly match our expectations, whatever those may be. Any deviation from the ideal would be questionable, and that seems to be how the review played out.

Conceptually, the notions of Mergeability and Mutability are not as crisply defined as I would have hoped. While it's possible to merge all heaps, only some are efficiently (i.e., O(lg n)) mergeable. I think that the approach to Mutability is well-conceived, but not particularly well communicated. It took me some time before I figured out its usage and implications. I'm still not convinced that Stability is an essential requirement, or that the library's approach to the solution is adequate. I would have preferred that the library not attempt to provide such a feature. The same could be said for iterators.

Another issue is the absence of comparison with the BGL heaps. What benefits do we get from using your library? What benefits might we get from the BGL heaps? What are the real performance comparisons?

Despite the fact that I am voting for not including the library, I see a great deal of value in its continued development and refinement and would encourage Tim to continue working towards an accepted Heap library for Boost.

Andrew

hi andrew,
- What is your evaluation of the design?
The design of the library included a number of features that did not seem to be adequately motivated by requirements, especially the ability to iterate over the elements of a heap. It's not obvious that any applications require or would use this feature,
well, if you use the priority queue inside a scheduler, you may want to modify the payload ...
nor what specific requirements they may have of iteration (heap-order vs. tree-order iteration).
the current iterators do not guarantee any order (this is specified in the documentation). i am currently working on ordered iterators.
I feel that the design (of individual data structures) focuses too heavily on policy-based design. I think that there are a number of parameters that should be "positional parameters" rather than optionally or arbitrarily specified, especially the Compare functions since they define a fundamental property of the heaps.
i do not agree with this: compare functions are optional, since std::less is used by default. this makes the code more compact:

  some_heap<T, some_policy<> >
vs.
  some_heap<T, std::less<T>, some_policy<> >

in the second version, std::less<T> has to be stated explicitly, although the default is used. however i agree that the arity of the d_ary_heap could be a positional parameter, as it is mandatory.
Also, the b_heap and d_ary heap should also be parameterized over (dynamic random access?).
taking a container instead of an allocator will be straight-forward for non-stable d-ary and b-heaps. for stable heaps a type-correct version needs to be specified. for the mutable versions i am not sure whether it is feasible, since it will require some knowledge of the internal representation. maybe a metafunction will do it, but it is not going to be beautiful ... (stl containers cannot be rebound like allocators, can they?)
- What is your evaluation of the implementation?
The implementation is adequate and appears correct (based on inspection and performance comparisons).
I think that the definition of heap equality and order may be strict to the point of inefficient. I think it should still be sufficient to define these operations in terms of object structure rather than the sequence of values produced.
i find a (structural) equality pretty useless compared to an equivalence. but maybe i should simply remove all direct heap comparisons and replace them with templated equivalence and lexicographical comparison functions ...
- What is your evaluation of the documentation?
... will try to improve ...
- What is your evaluation of the potential usefulness of the library?
As a collection of basic data structures, the library is potentially very useful. Not all data structures in the library are equally useful (the binomial heap is largely obsolete), but they are still useful for others to understand their design and implementation.
the binomial heap is not necessarily obsolete: it is the only mergeable data structure that provides all heap operations in O(log(N)) worst case. i know many people do not care about worst-case performance, but for those who do, the binomial heap should be useful.
Conceptually, the notions of Mergeability and Mutability are not as crisply defined as I would have hoped. While it's possible to merge all heaps, only some are efficiently (i.e., O(lg n)) mergeable. I think that the approach to Mutability is well-conceived, but not particularly well communicated. It took me some time before I figured out its usage and implications.
the last idea was to provide generic free functions with a specialization for `efficiently mergeable' heaps. however i still do not like the idea of introducing a MergablePriorityQueue concept, because it would basically be an EfficientlyMergablePriorityQueue concept.
I'm still not convinced that Stability is an essential requirement, or that the library's approach to the solution is adequate. I would have preferred the library not attempt provide such a feature. The same could be said for iterators.
depending on the use case it is definitely a requirement ... i have seen many cases in real-world software where emulating stability on top of non-stable priority queues needed some ugly workarounds ...
Another issue is the absence of comparison with the BGL heaps. What benefits do we get from using your library? What benefits might we get from the BGL heaps? What are the real performance comparisons?
well, the bgl heaps are not part of boost's public API: so the first benefit of boost.heap over bgl's heaps is that it actually has documentation and that the API will be stable ... nevertheless they heavily rely on bgl's property maps, making the interface not very intuitive. some of the bgl heaps do not have allocator support, which makes it impossible to use them when a custom allocator has to be used ...

thanks a lot for your review and for managing the review in the first place!
tim

On 13/06/11 17:54, Tim Blechmann wrote: <snip>
I'm still not convinced that Stability is an essential requirement, or that the library's approach to the solution is adequate. I would have preferred the library not attempt provide such a feature. The same could be said for iterators.
depending on the use case it is definitely a requirement ... i have seen many cases in real-world software where emulating stability on top of non-stable priority queues needed some ugly workarounds ...
To weigh in briefly, I have needed a stable priority queue on at least one occasion, and achieved it through a rather tiresome and convoluted method involving a heap of linked lists indexed by a hash table. I would value this being moved into a library that has thought more carefully about the best way to solve it.

John Bytheway
participants (4)
- Andrew Sutton
- John Bytheway
- Thomas Klimpel
- Tim Blechmann