
it would be great if the b_heap implementation performed no worse than the "naive" version.
it cannot: it uses a different tree structure than binary heaps, which results in a taller tree. with d-ary heaps, the arity is a tradeoff between tree height and the number of comparisons (higher arity means a flatter tree, but more comparisons per level). in a similar manner, the b-heap adds a tradeoff between memory locality and tree height: better locality at the cost of a slightly taller tree ...
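to put rough numbers on the arity tradeoff: in an implicit d-ary heap the parent/child relation is plain index arithmetic, so the height is about log_d(n) while a sift-down has to look at up to d children on every level. a minimal standalone sketch (illustration only, not boost.heap code):

    #include <cmath>
    #include <cstddef>
    #include <iostream>

    // implicit d-ary heap laid out in a flat array, root at index 0
    std::size_t parent(std::size_t i, std::size_t d)                { return (i - 1) / d; }
    std::size_t child (std::size_t i, std::size_t d, std::size_t k) { return d * i + k + 1; } // 0 <= k < d

    int main(void)
    {
        const std::size_t n = 15000000; // problem size used in the benchmarks below
        for (std::size_t d = 2; d <= 8; d *= 2)
            std::cout << "arity " << d << ": height ~ "
                      << static_cast<std::size_t>(std::ceil(std::log(double(n)) / std::log(double(d))))
                      << " levels, up to " << d << " comparisons per level\n";
        return 0;
    }

the b-heap keeps the binary structure but rearranges the nodes so that a parent and its nearby descendants tend to share a cache line or page - that grouping is what buys the locality and costs the extra tree height.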
Also, I've looked at your own benchmark graphs, and there seems to be a fairly constant performance difference between the different heaps - I don't see b_heaps getting faster than the alternatives as the queue size increases.
maybe because my workstation has 8mb of cache
Tim, you're proposing this b-heap purely as a performance optimisation compared to the "traditional" heap, but we've not yet seen any cases where it is actually faster. If it can be faster, we need to see some examples, and the documentation should qualify when it can be expected to be better. If it's not faster, it should be dropped.
just for the record: core i7 920, x86_64, linux-3.0-rc2,
hyperthreading/frequency scaling disabled, 12gb triple-channel ddr3 ram

stressed with:
    stress -m 1 --vm-bytes 4G --io 16 -c 3
and two boinc processes running at nice 10

* configured the problem size to 15000000
* changed work_t from boost::function to int (24byte to 4byte)
* comparison of b-heap and dary_heap (with arity 2)
...
* running all at nice 1 to increase the number of context switches

 Performance counter stats for './bench_boostbheap':

     532245.601009 task-clock                #    0.865 CPUs utilized
           150,066 context-switches          #    0.000 M/sec
             3,497 CPU-migrations            #    0.000 M/sec
         1,333,350 page-faults               #    0.003 M/sec
 1,421,230,219,856 cycles                    #    2.670 GHz
   928,444,569,746 stalled-cycles-frontend   #   65.33% frontend cycles idle
   643,217,557,503 stalled-cycles-backend    #   45.26% backend cycles idle
 1,517,920,671,665 instructions              #    1.07  insns per cycle
                                             #    0.61  stalled cycles per insn
   328,233,864,211 branches                  #  616.696 M/sec
     2,800,247,538 branch-misses             #    0.85% of all branches

     615.223322417 seconds time elapsed

 Performance counter stats for './bench_boostdary':

     575427.533478 task-clock                #    0.873 CPUs utilized
           160,863 context-switches          #    0.000 M/sec
             3,841 CPU-migrations            #    0.000 M/sec
         1,333,346 page-faults               #    0.002 M/sec
 1,536,548,826,172 cycles                    #    2.670 GHz
 1,304,937,894,945 stalled-cycles-frontend   #   84.93% frontend cycles idle
   999,244,398,224 stalled-cycles-backend    #   65.03% backend cycles idle
   747,525,733,877 instructions              #    0.49  insns per cycle
                                             #    1.75  stalled cycles per insn
   145,188,482,482 branches                  #  252.314 M/sec
     1,661,903,337 branch-misses             #    1.14% of all branches

     659.415916004 seconds time elapsed

in a second run, i am comparing the cache misses:

 Performance counter stats for './bench_boostbheap':

    11,989,045,556 L1-dcache-load-misses        [33.34%]
        96,664,490 L1-dcache-store-misses       [50.00%]
     8,082,318,794 L1-dcache-prefetch-misses    [33.32%]
     5,457,166,960 LLC-load-misses              [49.99%]
        86,358,587 LLC-store-misses             [16.67%]
        72,903,925 LLC-prefetch-misses          [16.66%]

     602.784184115 seconds time elapsed

 Performance counter stats for './bench_boostdary':

    14,139,852,204 L1-dcache-load-misses        [33.33%]
        96,895,924 L1-dcache-store-misses       [50.00%]
       752,673,940 L1-dcache-prefetch-misses    [33.33%]
     6,081,358,271 LLC-load-misses              [49.99%]
        85,684,373 LLC-store-misses             [16.67%]
       106,694,619 LLC-prefetch-misses          [16.67%]

     652.591757245 seconds time elapsed

...
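(for reference, the cache-miss numbers above correspond to the listed perf events, i.e. something along the lines of

    perf stat -e L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,LLC-load-misses,LLC-store-misses,LLC-prefetch-misses ./bench_boostbheap

and to make it clearer what is being measured, here is a minimal sketch of a driver of this shape - not the actual bench_boostbheap/bench_boostdary source, just an assumed outline using the problem size and int payload described above and the d_ary_heap push/pop interface:

    #include <boost/heap/d_ary_heap.hpp>
    #include <cstddef>

    // assumed outline: fill one heap with `size` ints, then pop everything
    // again; the b-heap binary would run the same loop with the proposed
    // b_heap container swapped in
    typedef boost::heap::d_ary_heap<int, boost::heap::arity<2> > heap_t;

    int main(void)
    {
        const std::size_t size = 15000000;                // problem size as above
        heap_t heap;
        for (std::size_t i = 0; i != size; ++i)
            heap.push(static_cast<int>(i * 2654435761u)); // scrambled keys (assumption)
        while (!heap.empty())
            heap.pop();
        return 0;
    }
)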
now i tweaked the benchmark:

* using 10 heaps instead of one, but only run 5 times instead of 50 -
  increases cache thrashing
* removed the work payload so that the heaps only contain floats (4byte) -
  more objects per cache line
* 4 processes on the idle quad-core machine (2 for the default stats,
  4 for the cache-miss events) - increases cache thrashing

results:

 Performance counter stats for './bench_boostbheap':

     651624.952248 task-clock                #    0.780 CPUs utilized
           138,831 context-switches          #    0.000 M/sec
             2,543 CPU-migrations            #    0.000 M/sec
           674,172 page-faults               #    0.001 M/sec
 1,740,597,175,799 cycles                    #    2.671 GHz
 1,292,857,574,357 stalled-cycles-frontend   #   74.28% frontend cycles idle
   831,566,179,360 stalled-cycles-backend    #   47.77% backend cycles idle
 1,441,574,215,357 instructions              #    0.83  insns per cycle
                                             #    0.90  stalled cycles per insn
   328,096,551,218 branches                  #  503.505 M/sec
     1,152,407,287 branch-misses             #    0.35% of all branches

     834.898435350 seconds time elapsed

 Performance counter stats for './bench_boostdary':

     759395.182963 task-clock                #    0.815 CPUs utilized
           156,451 context-switches          #    0.000 M/sec
             2,781 CPU-migrations            #    0.000 M/sec
           674,170 page-faults               #    0.001 M/sec
 2,028,502,096,477 cycles                    #    2.671 GHz
 1,806,658,397,323 stalled-cycles-frontend   #   89.06% frontend cycles idle
   988,635,679,553 stalled-cycles-backend    #   48.74% backend cycles idle
   698,231,661,607 instructions              #    0.34  insns per cycle
                                             #    2.59  stalled cycles per insn
   145,877,024,263 branches                  #  192.096 M/sec
       907,579,344 branch-misses             #    0.62% of all branches

     931.525434663 seconds time elapsed

 Performance counter stats for './bench_boostbheap':

    23,480,288,668 L1-dcache-load-misses        [33.34%]
       264,061,426 L1-dcache-store-misses       [50.00%]
    14,720,355,800 L1-dcache-prefetch-misses    [33.32%]
     7,914,697,865 LLC-load-misses              [49.99%]
        46,737,807 LLC-store-misses             [16.67%]
        25,875,481 LLC-prefetch-misses          [16.67%]

     832.307436282 seconds time elapsed

 Performance counter stats for './bench_boostdary':

    27,020,681,775 L1-dcache-load-misses        [33.33%]
       326,151,939 L1-dcache-store-misses       [50.00%]
     3,795,316,756 L1-dcache-prefetch-misses    [33.33%]
    10,296,069,595 LLC-load-misses              [50.00%]
        45,933,195 LLC-store-misses             [16.67%]
        43,923,748 LLC-prefetch-misses          [16.67%]

     949.094734119 seconds time elapsed

:) tim
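p.s. a minimal sketch of the multi-heap tweak (again an assumed outline, not the actual benchmark code): the pushes and pops are spread round-robin over 10 heaps of plain floats, so their working sets compete for the caches:

    #include <boost/heap/d_ary_heap.hpp>
    #include <cstddef>

    typedef boost::heap::d_ary_heap<float, boost::heap::arity<2> > heap_t;

    int main(void)
    {
        const std::size_t num_heaps = 10;        // 10 heaps instead of one
        const std::size_t size      = 15000000;  // same problem size (assumption)
        heap_t heaps[num_heaps];

        // round-robin pushes: the heaps grow in parallel, so their nodes
        // interleave in memory and keep evicting each other from the caches
        for (std::size_t i = 0; i != size; ++i)
            heaps[i % num_heaps].push(float(i * 2654435761u)); // scrambled keys (assumption)

        // drain them in the same interleaved fashion
        for (std::size_t i = 0; i != size; ++i)
            heaps[i % num_heaps].pop();
        return 0;
    }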