
it would be great if the b_heap implementation performed no worse than the "naive" version.
it cannot: it uses a different tree structure than binary heaps, which results in a taller tree. with d-ary heaps, the arity is a tradeoff between tree height and the number of comparisons (higher arity means a flatter tree, but more comparisons per level). in a similar manner, the b-heap adds a tradeoff between memory locality and tree height: better locality at the cost of a slightly taller tree ...
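to put rough numbers on the arity tradeoff: in an implicit d-ary heap the parent/child relation is plain index arithmetic, so the height is about log_d(n) while a sift-down has to look at up to d children on every level. a minimal standalone sketch (illustration only, not boost.heap code):

    #include <cmath>
    #include <cstddef>
    #include <iostream>

    // implicit d-ary heap laid out in a flat array, root at index 0
    std::size_t parent(std::size_t i, std::size_t d)                { return (i - 1) / d; }
    std::size_t child (std::size_t i, std::size_t d, std::size_t k) { return d * i + k + 1; } // 0 <= k < d

    int main(void)
    {
        const std::size_t n = 15000000; // problem size used in the benchmarks below
        for (std::size_t d = 2; d <= 8; d *= 2)
            std::cout << "arity " << d << ": height ~ "
                      << static_cast<std::size_t>(std::ceil(std::log(double(n)) / std::log(double(d))))
                      << " levels, up to " << d << " comparisons per level\n";
        return 0;
    }

the b-heap keeps the binary structure but rearranges the nodes so that a parent and its nearby descendants tend to share a cache line or page - that grouping is what buys the locality and costs the extra tree height.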
Also, I've looked at your own benchmark graphs, and there seems to be a fairly constant performance difference between the different heaps - I don't see b_heaps getting faster than the alternatives as the queue size increases.
maybe because my workstation has 8mb of cache
Tim, you're proposing this b-heap purely as a performance optimisation compared to the "traditional" heap, but we've not yet seen any cases where it is actually faster. If it can be faster, we need to see some examples, and the documentation should qualify when it can be expected to be better. If it's not faster, it should be dropped.
just for the record: core i7 920, x86_64, linux-3.0-rc2,
hyperthreading/frequency scaling disabled, 12gb triple-channel ddr3 ram

stressed with:
    stress -m 1 --vm-bytes 4G --io 16 -c 3
and two boinc processes running at nice 10

* configured the problem size to 15000000
* changed work_t from boost::function to int (24byte to 4byte)
* comparison of b-heap and dary_heap (with arity 2)
...
* running all at nice 1 to increase the number of context switches

 Performance counter stats for './bench_boostbheap':

     532245.601009 task-clock                #    0.865 CPUs utilized
           150,066 context-switches          #    0.000 M/sec
             3,497 CPU-migrations            #    0.000 M/sec
         1,333,350 page-faults               #    0.003 M/sec
 1,421,230,219,856 cycles                    #    2.670 GHz
   928,444,569,746 stalled-cycles-frontend   #   65.33% frontend cycles idle
   643,217,557,503 stalled-cycles-backend    #   45.26% backend cycles idle
 1,517,920,671,665 instructions              #    1.07  insns per cycle
                                             #    0.61  stalled cycles per insn
   328,233,864,211 branches                  #  616.696 M/sec
     2,800,247,538 branch-misses             #    0.85% of all branches

     615.223322417 seconds time elapsed

 Performance counter stats for './bench_boostdary':

     575427.533478 task-clock                #    0.873 CPUs utilized
           160,863 context-switches          #    0.000 M/sec
             3,841 CPU-migrations            #    0.000 M/sec
         1,333,346 page-faults               #    0.002 M/sec
 1,536,548,826,172 cycles                    #    2.670 GHz
 1,304,937,894,945 stalled-cycles-frontend   #   84.93% frontend cycles idle
   999,244,398,224 stalled-cycles-backend    #   65.03% backend cycles idle
   747,525,733,877 instructions              #    0.49  insns per cycle
                                             #    1.75  stalled cycles per insn
   145,188,482,482 branches                  #  252.314 M/sec
     1,661,903,337 branch-misses             #    1.14% of all branches

     659.415916004 seconds time elapsed

in a second run, i am comparing the cache misses:

 Performance counter stats for './bench_boostbheap':

    11,989,045,556 L1-dcache-load-misses        [33.34%]
        96,664,490 L1-dcache-store-misses       [50.00%]
     8,082,318,794 L1-dcache-prefetch-misses    [33.32%]
     5,457,166,960 LLC-load-misses              [49.99%]
        86,358,587 LLC-store-misses             [16.67%]
        72,903,925 LLC-prefetch-misses          [16.66%]

     602.784184115 seconds time elapsed

 Performance counter stats for './bench_boostdary':

    14,139,852,204 L1-dcache-load-misses        [33.33%]
        96,895,924 L1-dcache-store-misses       [50.00%]
       752,673,940 L1-dcache-prefetch-misses    [33.33%]
     6,081,358,271 LLC-load-misses              [49.99%]
        85,684,373 LLC-store-misses             [16.67%]
       106,694,619 LLC-prefetch-misses          [16.67%]

     652.591757245 seconds time elapsed

...
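(for reference, the cache-miss numbers above correspond to the listed perf events, i.e. something along the lines of

    perf stat -e L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,LLC-load-misses,LLC-store-misses,LLC-prefetch-misses ./bench_boostbheap

and to make it clearer what is being measured, here is a minimal sketch of a driver of this shape - not the actual bench_boostbheap/bench_boostdary source, just an assumed outline using the problem size and int payload described above and the d_ary_heap push/pop interface:

    #include <boost/heap/d_ary_heap.hpp>
    #include <cstddef>

    // assumed outline: fill one heap with `size` ints, then pop everything
    // again; the b-heap binary would run the same loop with the proposed
    // b_heap container swapped in
    typedef boost::heap::d_ary_heap<int, boost::heap::arity<2> > heap_t;

    int main(void)
    {
        const std::size_t size = 15000000;                // problem size as above
        heap_t heap;
        for (std::size_t i = 0; i != size; ++i)
            heap.push(static_cast<int>(i * 2654435761u)); // scrambled keys (assumption)
        while (!heap.empty())
            heap.pop();
        return 0;
    }
)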
now i tweaked the benchmark:

* using 10 heaps instead of one, but only run 5 times instead of 50 -
  increases cache thrashing
* removed the work payload so that the heaps only contain floats (4byte) -
  more objects per cache line
* 4 processes on the idle quad-core machine (2 for the default stats,
  4 for the cache-miss events) - increases cache thrashing

results:

 Performance counter stats for './bench_boostbheap':

     651624.952248 task-clock                #    0.780 CPUs utilized
           138,831 context-switches          #    0.000 M/sec
             2,543 CPU-migrations            #    0.000 M/sec
           674,172 page-faults               #    0.001 M/sec
 1,740,597,175,799 cycles                    #    2.671 GHz
 1,292,857,574,357 stalled-cycles-frontend   #   74.28% frontend cycles idle
   831,566,179,360 stalled-cycles-backend    #   47.77% backend cycles idle
 1,441,574,215,357 instructions              #    0.83  insns per cycle
                                             #    0.90  stalled cycles per insn
   328,096,551,218 branches                  #  503.505 M/sec
     1,152,407,287 branch-misses             #    0.35% of all branches

     834.898435350 seconds time elapsed

 Performance counter stats for './bench_boostdary':

     759395.182963 task-clock                #    0.815 CPUs utilized
           156,451 context-switches          #    0.000 M/sec
             2,781 CPU-migrations            #    0.000 M/sec
           674,170 page-faults               #    0.001 M/sec
 2,028,502,096,477 cycles                    #    2.671 GHz
 1,806,658,397,323 stalled-cycles-frontend   #   89.06% frontend cycles idle
   988,635,679,553 stalled-cycles-backend    #   48.74% backend cycles idle
   698,231,661,607 instructions              #    0.34  insns per cycle
                                             #    2.59  stalled cycles per insn
   145,877,024,263 branches                  #  192.096 M/sec
       907,579,344 branch-misses             #    0.62% of all branches

     931.525434663 seconds time elapsed

 Performance counter stats for './bench_boostbheap':

    23,480,288,668 L1-dcache-load-misses        [33.34%]
       264,061,426 L1-dcache-store-misses       [50.00%]
    14,720,355,800 L1-dcache-prefetch-misses    [33.32%]
     7,914,697,865 LLC-load-misses              [49.99%]
        46,737,807 LLC-store-misses             [16.67%]
        25,875,481 LLC-prefetch-misses          [16.67%]

     832.307436282 seconds time elapsed

 Performance counter stats for './bench_boostdary':

    27,020,681,775 L1-dcache-load-misses        [33.33%]
       326,151,939 L1-dcache-store-misses       [50.00%]
     3,795,316,756 L1-dcache-prefetch-misses    [33.33%]
    10,296,069,595 LLC-load-misses              [50.00%]
        45,933,195 LLC-store-misses             [16.67%]
        43,923,748 LLC-prefetch-misses          [16.67%]

     949.094734119 seconds time elapsed

:) tim
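p.s. a minimal sketch of the multi-heap tweak (again an assumed outline, not the actual benchmark code): the pushes and pops are spread round-robin over 10 heaps of plain floats, so their working sets compete for the caches:

    #include <boost/heap/d_ary_heap.hpp>
    #include <cstddef>

    typedef boost::heap::d_ary_heap<float, boost::heap::arity<2> > heap_t;

    int main(void)
    {
        const std::size_t num_heaps = 10;        // 10 heaps instead of one
        const std::size_t size      = 15000000;  // same problem size (assumption)
        heap_t heaps[num_heaps];

        // round-robin pushes: the heaps grow in parallel, so their nodes
        // interleave in memory and keep evicting each other from the caches
        for (std::size_t i = 0; i != size; ++i)
            heaps[i % num_heaps].push(float(i * 2654435761u)); // scrambled keys (assumption)

        // drain them in the same interleaved fashion
        for (std::size_t i = 0; i != size; ++i)
            heaps[i % num_heaps].pop();
        return 0;
    }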