
Tim Blechmann wrote:
I was unable to try the proposed d_ary_heap because I couldn't find a working syntax for the arity template parameter - any ideas anyone?
:) the scope operator uses two colons:
D'oh!
I was expecting the b_heap to be faster, but it is not; it is substantially slower. What is going on there? Does it all come down to size of data, cache size, etc? Ideally, an "improved" implementation like this would be able to fall back to the naive implementation in cases where it does not function any better. Or maybe I'm using it wrongly.
well, benchmarking this can be tricky ... first, your work item is a boost::function<> (iirc 3 pointers) plus a float ... so 16 bytes on 32bit or 28 bytes on 64bit ... on 32bit systems you will have a better match with the actual physical cache lines ... second ... synthetic benchmarks are not necessarily the best way to test cache-aware data structures: when stressing the heap, all of its content will be located in the cpu caches ... so you won't necessarily see the benefit of the cache-awareness, since the data of the non-cache-aware data structure is still in the cpu cache, unless you manually add some work to thrash the caches ... in your case, you will have a maximum of 90000 items in the queue; on 32bit platforms this is roughly 1.4 MB ... you won't see a big effect if this all fits into the caches ...
The data cache sizes on the test systems are, I think:

1. AMD Athlon II X2: 64 kB L1 + 1 MB L2.
2. VIA C7-M: 32 kB L1 + 128 kB L2.
3. ARM v5 (Marvell Kirkwood): 16 kB L1 + 256 kB L2.

So this data set really shouldn't fit in the cache. Even if it did, it would be great if the b_heap implementation performed no worse than the "naive" version. Also, I've looked at your own benchmark graphs, and it seems that there is a fairly constant performance difference between the different heaps - I don't see b_heaps getting faster than the alternatives as the queue size increases. (BTW, I find it almost impossible to distinguish the colours in those graphs...) Regards, Phil.