
AMDG

Christian Schladetsch wrote:
Hi Luke,
[...]
Luke> Peak memory will be a good metric. Do you have access to VTune? You seem to struggle to identify the cause of performance loss and are reduced to guesswork.
Peak memory measured by an external tool is no good, as boost::pool and boost::fast_pool both leak memory. I need to be able to sample the memory used at certain times in the application in a cross-platform way. I'll get to this in due time.
I have been focused on getting the benchmark results more than attempting to do a complete analysis of their implications. The latest results are here http://tinyurl.com/lj6nab. I still can't explain why monotonic is faster at sorting a 500,000 element pre-reserved vector, but I have only reported the result and have not investigated deeply.
I have added mean, standard deviation, min and max factors for each of the small, medium, and large benchmark sets. I print a cumulative total at the end of each set, and a summary of all results at the end. These summaries are:
GCC:
scheme  mean   std-dev    min     max
fast    36.3   173        0.25    1.63e+03
pool    27.8   1.02e+04   0.857   897
std     1.69   0.91       0.333   5
tbb     1.59   0.849      0.333   5

MSVC:
scheme  mean   std-dev     min     max
fast    35.4   132         0.603   1.32e+003
pool    27.1   1.13e+004   0.693   878
std     2.7    1.7         0.628   7
tbb     1.44   0.727       0.291   6.4
The mean is the average speedup factor provided by monotonic allocation over the given scheme. So for MSVC, summarised over all tests, monotonic is 1.4X faster than TBB with a standard deviation of 0.7. TBB was 3.4X faster at its best and 6.4X slower at its worst.
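To make the columns concrete, here is a minimal sketch of how such a summary could be computed from per-test timings. The names `Summary`, `speedup` and `summarise` are hypothetical and not taken from the test harness, and the use of the population standard deviation (dividing by N) is an assumption, since the post does not say which form was used.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical summary record; the names are illustrative, not the harness's own.
struct Summary {
    double mean;
    double std_dev;
    double min;
    double max;
};

// Speedup factor of monotonic over another scheme for a single test.
// Above 1 means monotonic was faster; below 1 means it was slower.
inline double speedup(double other_seconds, double monotonic_seconds) {
    return other_seconds / monotonic_seconds;
}

// Summarise a non-empty set of speedup factors.
// Assumption: population standard deviation (divide by N, not N - 1).
inline Summary summarise(const std::vector<double>& factors) {
    assert(!factors.empty());

    double sum = 0.0;
    for (double f : factors) sum += f;
    const double mean = sum / factors.size();

    double squared_deviations = 0.0;
    for (double f : factors) squared_deviations += (f - mean) * (f - mean);

    Summary s;
    s.mean = mean;
    s.std_dev = std::sqrt(squared_deviations / factors.size());
    s.min = *std::min_element(factors.begin(), factors.end());
    s.max = *std::max_element(factors.begin(), factors.end());
    return s;
}
```

Under this reading, the reciprocal of the min column gives the other scheme's best case: a minimum factor of 0.291 for TBB corresponds to TBB being roughly 3.4X faster than monotonic on that test.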
Note that monotonic was on average 35X faster than boost::fast_pool, but notice too that the standard deviation is very high. At its worst, fast_pool was 1,300X slower than monotonic and at its best was 1.6X faster. boost::pool fared little better, with an even worse standard deviation of 10,000(!). One could argue that the tests are skewed, so I invite you to look at them and suggest any changes or additions. See http://tinyurl.com/l6vmgq for all the tests, and http://tinyurl.com/l89llq for the test harness.
It is no surprise that TBB performs best across both platforms with the smallest standard deviation.
I'm not convinced that the average is meaningful. boost::fast_pool_allocator is not intended to be used with std::vector. You're averaging many cases for which it is documented to behave badly with a couple of cases for which it is fine.

Also, even though pool_allocator is supposed to work with std::vector, it is slow, as I would expect. The pool data structure is really designed for fixed-size allocations, and I for one am not particularly enamored of the idea of using it for std::vector.

Also, this is completely unrelated, but for lines like this

fast    1.49   0.838   0.603   1.32e+003

it looks like the cumulative min/max are being used instead of the local min/max.

In Christ,
Steven Watanabe
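
On the min/max remark above, a minimal sketch of keeping a local range per benchmark set alongside a cumulative one. `MinMax` and `per_set_ranges` are hypothetical names used for illustration only; this is not the harness's actual code, just one way the two ranges could be kept apart.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical running min/max tracker; not taken from the actual harness.
struct MinMax {
    double min = std::numeric_limits<double>::infinity();
    double max = -std::numeric_limits<double>::infinity();

    void add(double x) {
        min = std::min(min, x);
        max = std::max(max, x);
    }
};

// Returns one local range per set and folds every value into `cumulative`.
inline std::vector<MinMax> per_set_ranges(
        const std::vector<std::vector<double>>& sets, MinMax& cumulative) {
    std::vector<MinMax> locals;
    for (const auto& set : sets) {
        MinMax local;  // fresh tracker per set, so earlier sets cannot leak in
        for (double factor : set) {
            local.add(factor);
            cumulative.add(factor);
        }
        locals.push_back(local);
    }
    return locals;
}
```

With a fresh tracker per set, a line for one set reports only that set's extremes, while the end-of-run summary still reports the overall range.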