
hi thomas, thanks for your formal review!
-- One inconvenience for me was the use of max-heaps instead of min-heaps. Well, one could argue that a priority_queue should return the elements with the highest priority first (but as Wikipedia says: "some conventions consider lower priorities to be higher"), and heap-sort uses a max-heap. Since these are the two applications where the stl uses a heap, it's clear why the stl implements a max-heap. But the applications where it is beneficial to call increase or decrease directly normally use a min-heap. Now I have to call increase even though the key value has actually decreased...
there are two conventions to consider: min-heaps (which are typically used in textbooks) and max-heaps (which are used by the stl heap functions). i preferred to be consistent with the stl. likewise i am using `top()' to extract the value with the highest priority, not `find_min()' as is often used in the literature ...
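for illustration, a minimal sketch (not taken verbatim from the docs) of how min-heap behaviour can be obtained by passing a reversed comparator; note that increase()/decrease() then still refer to the priority order, not to the numeric key:

    #include <boost/heap/fibonacci_heap.hpp>
    #include <functional>
    #include <iostream>

    int main()
    {
        // max-heap by convention; with std::greater<> the numerically
        // smallest value has the highest priority, so top() yields it
        typedef boost::heap::fibonacci_heap<
            int, boost::heap::compare<std::greater<int> > > min_heap;

        min_heap h;
        h.push(3);
        h.push(1);
        h.push(2);
        std::cout << h.top() << "\n";  // prints 1

        // increase()/decrease() follow the heap's priority order, so with
        // this comparator a numerically smaller key is an `increase'
        return 0;
    }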
-- I was left with the feeling that I hadn't quite made the most efficient use of Boost.Heap.
the only way to find the best data structure is to test with a specific workload. boost.heap does not provide a `one-size-fits-all' solution, but a toolbox of different configurable data structures.
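to illustrate what i mean by `toolbox' (a sketch only; all containers share the basic interface but differ in their trade-offs):

    #include <boost/heap/binomial_heap.hpp>
    #include <boost/heap/d_ary_heap.hpp>
    #include <boost/heap/fibonacci_heap.hpp>
    #include <boost/heap/pairing_heap.hpp>

    // all of these share the basic push()/top()/pop() interface,
    // but have different performance characteristics
    typedef boost::heap::d_ary_heap<int, boost::heap::arity<4> > dary4;
    typedef boost::heap::binomial_heap<int>                      binomial;
    typedef boost::heap::fibonacci_heap<int>                     fibonacci;
    typedef boost::heap::pairing_heap<int>                       pairing;

    // generic code can be written against the common interface
    template <typename Heap>
    void drain(Heap & h)
    {
        while (!h.empty())
            h.pop();
    }

    int main()
    {
        fibonacci f;
        f.push(1);
        f.push(2);
        drain(f);
        return 0;
    }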
For example, I quickly found out that calling update just with a handle (as I had implemented it first) was suboptimal for fibonacci_heap.
suboptimal in what way?
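for reference, a sketch of the different ways to notify the heap of a change (whether one form is actually faster than the other depends on the heap type and on the direction of the change):

    #include <boost/heap/fibonacci_heap.hpp>

    int main()
    {
        typedef boost::heap::fibonacci_heap<int> heap_t;
        heap_t h;
        heap_t::handle_type handle = h.push(2);

        // variant 1: modify through the handle, then notify the heap.
        // the heap cannot know whether the priority went up or down.
        *handle = 7;
        h.update(handle);

        // variant 2: let the heap see the new value, or state the
        // direction explicitly, so it can potentially take a cheaper path
        h.update(handle, 9);
        h.increase(handle, 11);
        return 0;
    }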
- The performance numbers for Boost.Heap were neither bad nor outstanding.
-- The numbers for b_heap are not convincing.
-- The numbers for fibonacci_heap seem to be the best.
-- The pairing_heap has a smaller compare count than fibonacci_heap, but actually performs worse than fibonacci_heap in this test. Considering statements like "Experimental studies indicate that pairing heaps actually outperform Fibonacci heaps", I wonder whether the implementation of pairing_heap could be improved so that it actually outperforms fibonacci_heap.
for a performance comparison, one will have to consider
* the expense of comparisons
* the expense of swapping data
* the size of the managed data
* the workload (mixture/order of API calls)
i suppose for every data structure one can come up with a synthetic benchmark so that this data structure outperforms every other implementation.
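a minimal benchmark skeleton to illustrate what i mean by testing a specific workload (the push/pop-only mixture and the int element type are made up for this example; a real measurement would vary both):

    #include <boost/heap/fibonacci_heap.hpp>
    #include <boost/heap/pairing_heap.hpp>
    #include <chrono>
    #include <cstdlib>
    #include <iostream>

    // times one specific workload: n pushes of random ints, then drain.
    // the result says nothing about other mixtures of API calls
    template <typename Heap>
    double run(int n)
    {
        Heap h;
        std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
        for (int i = 0; i != n; ++i)
            h.push(std::rand());
        while (!h.empty())
            h.pop();
        std::chrono::duration<double> d = std::chrono::steady_clock::now() - start;
        return d.count();
    }

    int main()
    {
        const int n = 1000000;
        std::cout << "fibonacci: " << run<boost::heap::fibonacci_heap<int> >(n) << " s\n";
        std::cout << "pairing:   " << run<boost::heap::pairing_heap<int> >(n) << " s\n";
        return 0;
    }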
What is your evaluation of the design?
I guess it's OK. What makes me a bit nervous is the statement "The library is not targeted in any specific domain nor directly related to any specific libraries". A good interface is often the result of an abstraction over different applications. The other way around (starting from the data structure/algorithm) risks bloating the interface with unnecessary functionality. For example, somebody asked during the review for "iteration over the heap in priority order". Is this really something that many applications requiring a heap will benefit from? I strongly doubt it.
testing two heaps for equivalence will benefit from it. of course you can ask why you would want to test that ...
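a sketch of what i have in mind: the ordered iterators visit the elements in priority order, so two structurally different heaps holding the same values compare equal:

    #include <boost/heap/binomial_heap.hpp>
    #include <boost/heap/fibonacci_heap.hpp>
    #include <algorithm>
    #include <iostream>

    int main()
    {
        boost::heap::fibonacci_heap<int> a;
        boost::heap::binomial_heap<int>  b;

        const int values[] = {3, 1, 4, 1, 5};
        for (unsigned i = 0; i != sizeof(values) / sizeof(values[0]); ++i) {
            a.push(values[i]);
            b.push(values[i]);
        }

        // equivalence: same size and same elements in priority order,
        // regardless of the internal layout of either heap
        bool equivalent = a.size() == b.size()
            && std::equal(a.ordered_begin(), a.ordered_end(), b.ordered_begin());
        std::cout << std::boolalpha << equivalent << "\n";  // true
        return 0;
    }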
Do you think the library should be accepted as a Boost library?
No, the library should not be accepted unless:
- The current review either magically goes back on track and at least 3 other persons also submit a proper review, or another (mini?) review with proper participation is held later.
... i am a bit disappointed by the lack of reviews myself. given that this is the only review that actually contains a vote and the review period stopped yesterday, i would appreciate further feedback
- Either a convincing example where b_heap outperforms the other heaps is presented, or its use should be discouraged in the documentation.
http://article.gmane.org/gmane.comp.lib.boost.devel/220245
- The documentation says a bit more about the different heaps and their trade-offs (amortized time <-> worst case), and generally tries to give the user recommendations about which heap will likely best fit their application and which settings they should try (for example, that arity<4> is a good compromise between cache locality and compare count for a d_ary heap).
perhaps a `best practices' section would be helpful?
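such a section could, for example, contain snippets along these lines (illustrative only; the arity<4> recommendation is taken from your suggestion above, not from a measurement of mine):

    #include <boost/heap/d_ary_heap.hpp>

    // a 4-ary heap as a starting point: better cache locality than a
    // binary heap at the cost of a somewhat higher compare count.
    // mutable_<true> is only needed if handles/update are required,
    // since it adds an extra indirection
    typedef boost::heap::d_ary_heap<
        int,
        boost::heap::arity<4>,
        boost::heap::mutable_<true>
    > default_choice;

    int main()
    {
        default_choice h;
        default_choice::handle_type handle = h.push(42);
        h.update(handle, 23);
        return 0;
    }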
I used the negative form of "Yes, the library should be accepted under the following conditions" because I'm unhappy with how the review went. The announcement was sent out too late and without stating the dates of the review, and much of the initial discussion was about the Code Collaborator tool, but I haven't seen any "real" review, despite the fact that there seems to be interest in the library. There were some discussions on Code Collaborator, and the tool was actually not bad at pointing to concrete problems in the actual code. Tim quickly addressed these, but a good review would also include some discussion and actual opinions and suggestions.
well, i would appreciate more reviews myself ... however, if you are the only person who includes a vote and that vote is `no', boost.heap should be rejected, no matter whether the reason is `documentation', `code', `performance' or `review process'.
It is also unclear to me how well Boost.Heap performs compared to boost/graph/detail/d_ary_heap.hpp. I could have tried to include this heap in my test/benchmark as well, but as the review seemed to have silently died already, I wasn't motivated enough to do so.
iirc the bgl heap data structures are pretty specific (optimized for integers). thanks for taking the time to review boost.heap.

tim