
Looking at your MSVC data, it seems that the performance advantage of monotonic over the other allocators diminishes as the benchmark size increases. I see this trend in most of the MSVC benchmarks. I'd like to see the benchmarks extended to find out whether monotonic eventually becomes slower than the other allocators, as the trends suggest it might.
I have added three sections to the benchmark:

* Small: ~100 elements
* Medium: ~1,000
* Large: ~1,000,000

The results for GCC and MSVC are here: http://tinyurl.com/lj6nab

As one would expect, doing less work means getting it done faster, so in general monotonic is faster than the other allocation schemes measured. There are exceptions and other vagaries that are yet to be explained.

Even though I've added a pool system, list_sort<int> is still faster with boost::fast_pool and TBB on MSVC. This is surprising, as list_create<int> is faster with monotonic. This means that the sort itself - and not the allocation - is slower when using a 'monotonic data-structure'. I have spent some time investigating this and have yet to find a good answer.

OTOH, vector_create and vector_sort are both faster with monotonic. This is also surprising, for different reasons. Specifically, the results for vector_sort with a large container size (500,000) are consistently 1.2x faster on MSVC, and up to 2.9x faster on GCC, with monotonic than with the other schemes. Given that sorting 500,000 elements will swamp the cost of reserving the space for it, the question is: why? I don't really know. It may be because monotonic allocation uses the BSS first, but I cannot say with any certainty.
Have you considered that extending the buffer, instead of reusing it, could hurt cache performance? Reusing memory that is already in cache will be faster than constantly bringing in new memory and taking cache misses.
It's a good point, and that is why I added monotonic::chain<> to the proposal. It is not part of the current benchmarks because I wanted to compare just the allocation schemes, and the data-structures they produce, against the std:: containers.
If you are on a system with limited cache size this would be a more serious problem.
Indeed. Given that I am predominantly interested in efficiency and size, I am acutely aware of this. There is a case to be made for adding an introspection system to the debug build that warns the user about misuse. Consider, for example, the numbers for boost::fast_pool in the map_vector test, which are pathological.
Luke
Thanks for your thoughts again, Luke. I am interested in what you think of the latest numbers that include large datasets, especially the numbers for list_sort and vector_sort.

As suggested earlier, I have yet to add measurements for total memory used (is there a portable way to measure this?). I also need to add some statistical analysis and summaries, and to compare against the Google allocator (which I have been unable to find with a quick search). Ideas for other tests to add would be appreciated as well.

I also want to raise again the issue of boost::pool_allocator and boost::fast_pool_allocator leaking. AFAICT, when these are used with a system that rebind<>'s the allocator, the result is a persistent memory leak (unless you happen to know the new sizes of the allocations thence made). These allocators (on 1.38) also fail to compile on GCC when specialised over void. I'll supply a patch for that soon.

Regards, Christian.