
hi phil,
http://chezphil.org/tmp/bench.tgz
This is a "work queue", i.e. it's a queue of boost::function<void(void)> "work items" with float priorities. It's similar to a real application with fast (burst) fill and gradual drain; the pattern here is 10 pushes and 1 pop, repeated until all the work has been added, and then all of the work is popped in sequence. The priorities are essentially random, and there are 100 000 items in total.
thanks for your benchmark ... the pattern is different from the ones that i have tested.
I was unable to try the proposed d_ary_heap because I couldn't find a working syntax for the arity template parameter - any ideas anyone?
:) the scope operator uses two colons:

    boost::heap::d_ary_heap< value_t,
                             boost:heap::arity<4>,   // This doesn't work
                             boost::heap::compare<compare_priorities> >

    boost::heap::d_ary_heap< value_t,
                             boost::heap::arity<4>,
                             boost::heap::compare<compare_priorities> >
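with both colons in place, a self-contained example compiles and runs, e.g. (just a sketch ... value_t here is a stand-in for the benchmark's value type):

    #include <boost/heap/d_ary_heap.hpp>
    #include <utility>

    typedef std::pair<float, int> value_t;   // stand-in for the real value type

    struct compare_priorities {
        bool operator()(const value_t& a, const value_t& b) const {
            return a.first < b.first;
        }
    };

    typedef boost::heap::d_ary_heap< value_t,
                                     boost::heap::arity<4>,
                                     boost::heap::compare<compare_priorities> > heap_t;

    int main() {
        heap_t heap;
        heap.push(value_t(1.0f, 0));
        heap.push(value_t(2.0f, 1));
        heap.pop();   // removes the element with the highest priority (2.0f)
    }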
I was expecting the b_heap to be faster, but it is not; it is substantially slower. What is going on there? Does it all come down to the size of the data, cache size, etc.? Ideally, an "improved" implementation like this would be able to fall back to the naive implementation in the cases where it performs no better. Or maybe I'm using it wrongly.
well, benchmarking this can be tricky ...

first, your work item is a boost::function<> (iirc 3 pointers) plus a float ... so 16 bytes on 32bit or 28 bytes on 64bit ... on 32bit systems the items will map better onto the actual physical cache lines ...

second, synthetic benchmarks are not necessarily the best way to test cache-aware data structures: when stressing the heap, all of its content will end up in the cpu caches ... so you won't necessarily see the benefit of the cache-awareness, since the data of the non-cache-aware data structure is still in the cpu cache, unless you manually add some work to thrash the caches ... in your case you will have a maximum of 90000 items in the queue, which on 32bit platforms is roughly 1.4mb ... you won't see a big effect if all of this fits into the caches ... i'm not sure if there is a good way to artificially thrash the caches in order to observe the effect in a microbenchmark ...
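one crude option would be something like this (just a sketch ... the name thrash_caches, the 32mb buffer and the 64-byte stride are assumptions about the target machine), called between the fill and drain phases so the heap's data has to be reloaded from memory:

    #include <vector>
    #include <cstddef>

    // touch a buffer that is larger than the last-level cache, so the
    // heap's nodes are (very likely) evicted before the next phase
    void thrash_caches() {
        static std::vector<char> buffer(32 * 1024 * 1024);    // assumes LLC < 32mb
        volatile char sink = 0;
        for (std::size_t i = 0; i < buffer.size(); i += 64) { // one touch per line
            buffer[i] += 1;   // write, so the line is dirtied and written back
            sink += buffer[i];
        }
        (void)sink;
    }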
My understanding is that the fibonacci, binomial and other heaps only improve things when other operations, e.g. merging, are needed - is that correct?
more or less ... unlike the container adaptors, the fibonacci heap is the only implementation that provides an O(1) insert and increase ... for example, increase goes through the handle that push returns (a sketch ... the int value type is just for illustration):
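    #include <boost/heap/fibonacci_heap.hpp>
    #include <cassert>

    int main() {
        typedef boost::heap::fibonacci_heap<int> heap_t;

        heap_t heap;
        heap_t::handle_type h = heap.push(1);   // O(1) amortized insert
        heap.push(3);

        *h = 5;            // raise the value through the handle ...
        heap.increase(h);  // ... then restore the heap property, O(1) amortized

        assert(heap.top() == 5);
    }

cheers, tim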