
Tim Blechmann wrote:
Tim, you're proposing this b-heap purely as a performance optimisation compared to the "traditional" heap, but we've not yet seen any cases where it is actually faster. If it can be faster we need to see some examples, and the documentation should qualify when it can be expected to be better. If it's not faster, it should be dropped.
Just for the record: Core i7 920, x86_64, linux-3.0-rc2, hyperthreading/frequency scaling disabled, 12 GB triple-channel DDR3 RAM.
Stressed with "stress -m 1 --vm-bytes 4G --io 16 -c 3" and two BOINC processes running at nice 10.
Configured the problem size to 15000000 and changed work_t from boost::function to int (24 bytes to 4 bytes).
Comparison of b_heap and d_ary_heap (with arity 2), running everything at nice 1 to increase the number of context switches.
Performance counter stats for './bench_boostbheap': [snip] 615.223322417 seconds time elapsed
Performance counter stats for './bench_boostdary': [snip] 659.415916004 seconds time elapsed
That's great. But I was comparing with your priority_queue<> and std::push_heap/pop_heap, not your d_ary_heap<>. Could you do the same test with one of those?
Performance counter stats for './bench_boostbheap':
  1,421,230,219,856 cycles
    928,444,569,746 stalled-cycles-frontend
    643,217,557,503 stalled-cycles-backend
  1,517,920,671,665 instructions

Performance counter stats for './bench_boostdary':
  1,536,548,826,172 cycles
  1,304,937,894,945 stalled-cycles-frontend
    999,244,398,224 stalled-cycles-backend
    747,525,733,877 instructions
That's the interesting part: the b-heap executes twice as many instructions as the d-ary heap, so the improvement in cache locality needs to be substantial enough to double the instruction throughput before the b-heap does better. You've had to pull out all the stops to make that happen.
Regards, Phil.