Re: [boost] Proposal: Monotonic Containers - Comparison with boost::pool, boost::fast_pool and TBB

Updated tests (more of them, better formatting of results) for https://svn.boost.org/svn/boost/sandbox/monotonic/libs/monotonic/test/compar... here http://tinyurl.com/m83vll for GCC and here http://tinyurl.com/n9g8jv for MSVC. I also added a column for std::allocator/mono.
Have added comparison against TBB, as requested. These results are at the same location as linked to above. Christian.

Christian Schladetsch wrote:
Updated tests (more of them, better formatting of results) for https://svn.boost.org/svn/boost/sandbox/monotonic/libs/monotonic/test/compar... here http://tinyurl.com/m83vll for GCC and here http://tinyurl.com/n9g8jv for MSVC. I also added a column for std::allocator/mono.
Have added comparison against TBB, as requested. These results are at the same location as linked to above.
Note tbb is pretty much the fastest established allocator. It does reuse memory and deallocation is not a no-op, so it is more general than monotonic, which must be scoped and cannot be used when the volume of memory allocations greatly exceeds the maximum amount of memory actually in use at any given time. On Windows, monotonic is almost a wash with tbb, whereas on Linux the difference between monotonic and tbb is not that great and hard to evaluate because the measurements lack precision.

The measurements shown are often only accurate to one significant digit, or don't register at all. Either get a more accurate timer or run larger benchmarks. Also, it is conventional to report speedup as a factor, rather than a percent. Instead of saying monotonic is 150% faster than std allocator you should report 1.5X as fast. This way instead of saying it is 1.9e+004% faster than fast_pool for thrash_pool_sort with length 510 you would say it is 190X faster, which just makes more sense all around. Also it makes more sense to say 0.95X as fast than 95% faster because it is clearer that values less than one are bad. People will trust benchmark results more if they are reported in a conventional manner.

I'd suggest adding the google allocator to the list as well. You can expect it to perform about the same as tbb since that is what we have found in our testing.

These benchmarks look much better than what you've shown previously, but I'd like you to also look for industry standard benchmarks for allocators, or industry standard benchmarks that you can use to compare allocators. It is obvious to me that monotonic should not be used in certain ways it wasn't designed for, like for all allocations by a long running program without ever freeing or reusing any memory it has allocated, but it is not obvious from your benchmarks. I'd like you to also compare peak memory in the benchmark results and not just runtime, since this is often what people care about when they are optimizing.

If you only have 256GB of memory and you can't afford to swap you find yourself implementing all kinds of things differently than if you have a terabyte. People need to understand the drawbacks of your allocator, not just the advantages, and benchmarks showing high peak memory and swapping when the allocator is used in a context it wasn't intended for should make that clear. Regards, Luke

Beside the use cases where allocations and deallocations are included, I am interested in the "cache effects," i.e., how the rather focused locus of a (small) monotonic block compares to other allocation schemes. So, if Christian could run some tests where the allocations are reserved (quite literally for vector, but one would have to prepopulate other containers when using the standard allocator) and only *accesses* to those elements are measured. We all understand that an increment of a pointer (possibly with proper alignment arithmetics) and a no-op deallocation is fast. No need to test or benchmark that per se, in my opinion. Let us instead look at the cache effects! /David On Jun 19, 2009, at 12:30 PM, Simonson, Lucanus J wrote:
Christian Schladetsch wrote:
Updated tests (more of them, better formatting of results) for https://svn.boost.org/svn/boost/sandbox/monotonic/libs/monotonic/test/compar... here http://tinyurl.com/m83vll for GCC and here http://tinyurl.com/n9g8jv for MSVC. I also added a column for std::allocator/mono.
Have added comparison against TBB, as requested. These results are at the same location as linked to above.
Note tbb is pretty much the fastest established allocator. It does reuse memory and deallocation is not a no-op, so it is more general than monotonic, which must be scoped and cannot be used when the volume of memory allocations greatly exceeds the maximum amount of memory actually in use at any given time. On Windows, monotonic is almost a wash with tbb, whereas on Linux the difference between monotonic and tbb is not that great and hard to evaluate because the measurements lack precision.
The measurements shown are often only accurate to one significant digit, or don't register at all. Either get a more accurate timer or run larger benchmarks. Also, it is conventional to report speedup as a factor, rather than a percent. Instead of saying monotonic is 150% faster than std allocator you should report 1.5X as fast. This way instead of saying it is 1.9e+004% faster than fast_pool for thrash_pool_sort with length 510 you would say it is 190X faster, which just makes more sense all around. Also it makes more sense to say 0.95X as fast than 95% faster because it is clearer that values less than one are bad. People will trust benchmark results more if they are reported in a conventional manner.
I'd suggest adding the google allocator to the list as well. You can expect it to perform about the same as tbb since that is what we have found in our testing.
These benchmarks look much better than what you've shown previously, but I'd like you to also look for industry standard benchmarks for allocators, or industry standard benchmarks that you can use to compare allocators. It is obvious to me that monotonic should not be used in certain ways it wasn't designed for, like for all allocations by a long running program without ever freeing or reusing any memory it has allocated, but it is not obvious from your benchmarks. I'd like you to also compare peak memory in the benchmark results and not just runtime, since this is often what people care about when they are optimizing. If you only have 256GB of memory and you can't afford to swap you find yourself implementing all kinds of things differently than if you have a terabyte. People need to understand the drawbacks of your allocator, not just the advantages, and benchmarks showing high peak memory and swapping when the allocator is used in a context it wasn't intended for should make that clear.
Regards, Luke _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

So, if Christian could run some tests where the allocations are reserved (quite literally for vector, but one would have to prepopulate other containers when using the standard allocator) and only *accesses* to those elements are measured.
Updated results for https://svn.boost.org/svn/boost/sandbox/monotonic/libs/monotonic/test/compar... are here http://tinyurl.com/m83vll for GCC and here http://tinyurl.com/n9g8jv for MSVC.

TBB is faster than monotonic in one test on Win32, sort_list, which is intriguing. Monotonic also loses sort_list against boost::fast_pool by ~10%, but wins others by multiple orders of magnitude. boost::pool and ::fast_pool also require explicit releasing of memory (*if* you can actually do this; see the note earlier about problems with rebind<> and boost::fast/pool), so there is a strong case to be made for monotonic over these.

It is true that TBB and std:: are both general allocators, and hence are doing much more work than monotonic. However, that is also why monotonic is generally faster than either. Caveat: monotonic allocation was designed for small, short-lived containers and works best in that case. While it does indeed scale, you will eventually exhaust memory if you repeatedly resize a vector, say, or continually add and remove elements from a container using a monotonic allocator. Regards, Christian.

Christian Schladetsch wrote:
So, if Christian could run some tests where the allocations are reserved (quite literally for vector, but one would have to prepopulate other containers when using the standard allocator) and only *accesses* to those elements are measured.
Updated results for https://svn.boost.org/svn/boost/sandbox/monotonic/libs/monotonic/test/compar... are here http://tinyurl.com/m83vll for GCC and here http://tinyurl.com/n9g8jv for MSVC.
TBB is faster than monotonic in one test on Win32, sort_list, which is intriguing.
Looking at your MSVC data it seems that the performance advantage of monotonic over other allocators diminishes as the benchmark size increases. I see this trend in most of the MSVC benchmarks. I'd like to see the benchmarks extended to see if monotonic becomes slower than the other allocators, as the trends suggest it might. Have you considered that extending the buffer instead of reusing the buffer would lead to a loss of cache performance, because reuse of in-cache memory will be faster than bringing in new memory all the time with cache misses? If you are on a system with limited cache size this would be a more serious problem. Luke

Looking at your MSVC data it seems that the performance advantage of monotonic over other allocators diminishes as the benchmark size increases. I see this trend in most of the MSVC benchmarks. I'd like to see the benchmarks extended to see if monotonic becomes slower than the other allocators, as the trends suggest it might.
I have added three sections to the benchmark:

Small: ~100 elements
Medium: ~1,000
Large: ~1,000,000

The results are here for GCC and MSVC: http://tinyurl.com/lj6nab

As one would expect, doing less work means getting it done faster, so in general monotonic is faster than the other allocation schemes measured. There are exceptions and other vagaries that are yet to be explained. Even though I've added a pool system, list_sort<int> is still faster with boost::fast_pool and TBB on MSVC. This is surprising, as list_create<int> is faster with monotonic. So this means that the sort itself - and not the allocation - is slower when using a 'monotonic data-structure'. I have spent some time investigating this and have yet to find a good answer.

OTOH, vector_create and vector_sort are both faster with monotonic. This is also surprising, for different reasons. Specifically, the results for vector_sort for large container size (500,000) are consistently 1.2x faster on MSVC and up to 2.9x faster on GCC with monotonic than with the other schemes. Given that sorting 500,000 elements will swamp the cost of reserving the size for it, the question is: why? I don't really know. It may be because monotonic allocation uses the BSS first, but I cannot say with any certainty.
Have you considered that extending the buffer instead of reusing the buffer would lead to a loss of cache performance, because reuse of in-cache memory will be faster than bringing in new memory all the time with cache misses?
It's a good point, and that is why I added monotonic::chain<> to the proposal. This is not part of the current benchmarks because I wanted to compare just the allocation schemes, and the data-structures they produce with std::containers.
If you are on a system with limited cache size this would be a more serious problem.
Indeed. Given that I am predominantly interested in efficiency and size, I am acutely aware of this. There is a case to be made for adding an introspection system to the debug build that warns the user about misuse. Consider for example the numbers for boost::fast_pool for the map_vector test, which are pathological.
Luke
Thanks for your thoughts again Luke. I am interested in what you think of the latest numbers that include large datasets, and especially the numbers for list_sort and vector_sort.

As suggested earlier, I have yet to add measurements for total memory used (is there a portable way to measure this?). I also need to add some statistical analysis and summaries, and compare with the google allocator (which I have been unable to find with a quick search). Any ideas for other tests to add would be appreciated as well.

I also want to again raise the issue of boost::pool_allocator and boost::fast_pool_allocator leaking. AFAICT, when these are used with a system that rebind<>'s the allocator, it results in a persistent memory leak (unless you happen to know the new sizes of the allocations thence made). These allocators (on 1.38) also fail to compile when specialised over void on GCC. I'll supply a patch for that soon. Regards, Christian.

AMDG Christian Schladetsch wrote:
I also want to again raise the issue of boost::pool_allocator and boost::fast_pool_allocator leaking. AFAICT, when these are used with a system that rebind<>'s the allocator, it results in a persistent memory leak (unless you happen to know the new sizes of the allocations thence made).
I don't consider this an issue. All allocators with global state are going to have this problem. monotonic has a different form of the same problem. While in your test cases, it is safe to release the memory, in general, if you're using global storage it is not safe to release all the memory, because some other code might still be using it. I'll also point out that using singleton_pool::release_memory with fast_pool_allocator ends up violating the preconditions of pool::release_memory and therefore won't do any good. In addition, you are not even using the correct template parameters for the unrebound fast_pool_allocator when calling release_memory.
These allocators (on 1.38) also fail to compile when specialised over void on GCC. I'll supply a patch for that soon.
No need. https://svn.boost.org/trac/boost/changeset/49031 In Christ, Steven Watanabe

Christian Schladetsch wrote:
I also want to again raise the issue of boost::pool_allocator and boost::fast_pool_allocator leaking. [snip]
Steven> I don't consider this an issue. All allocators with global state are going to have this problem. monotonic has a different form of the same problem.
I do not think this is accurate.
While in your test cases, it is safe to release the memory, in general, if you're using global storage it is not safe to release all the memory, because some other code might still be using it.
But surely that should be left as a responsibility of the client code? At least monotonic doesn't leak by default. You can control the memory usage. With boost::pool_allocator or boost::fast_pool_allocator the user has no fair chance of avoiding a leak if either allocator was rebound. Even so, the user of these must explicitly release each differently-sized pool, which must be problematic in general, surely? Regards, Christian.

AMDG Christian Schladetsch wrote:
Christian Schladetsch wrote:
I also want to again raise the issue of boost::pool_allocator and boost::fast_pool_allocator leaking. [snip]
Steven> I don't consider this an issue. All allocators with global state are going to have this problem. monotonic has a different form of the same problem.
I do not think this is accurate.
I didn't say it was exactly the same.
While in your test cases, it is safe to release the memory, in general, if you're using global storage it is not safe to release all the memory, because some other code might still be using it.
But surely that should be left as a responsibility of the client code? At least monotonic doesn't leak by default. You can control the memory usage.
With boost::pool_allocator or boost::fast_pool_allocator the user has no fair chance of avoiding a leak if either allocator was rebound. Even so, the user of these must explicitly release each differently-sized pool which must be problematic in general, surely?
The user cannot reliably release the memory allocated by monotonic, if monotonic is used simultaneously in multiple independent locations in a program. This means that only an application programmer can decide when it is safe to clean up, and libraries cannot safely use monotonic::allocator without running the risk of allowing memory usage to balloon out of control. Further, I shudder to imagine the difficulty of making sure that this constraint is not violated in a large program.

Okay, now that I've said this much, I realize that the problem is different. For monotonic, the memory must be released. The problem is that it is hard to do so safely. For pool_allocator and fast_pool_allocator, releasing the memory is not critical because they can reuse memory that has been deallocated.

Anyway, if you want to release memory in this way, you should be using something like monotonic::local_storage, to which Boost.Pool has no equivalent. I'm rapidly concluding that using global storage with monotonic is much too fragile and dangerous. In Christ, Steven Watanabe

Steven Watanabe wrote:
The user cannot reliably release the memory allocated by monotonic, if monotonic is used simultaneously in multiple independent locations in a program. This means that only an application programmer can decide when it is safe to clean up and libraries cannot safely use monotonic::allocator without running the risk of allowing memory usage to balloon out of control. Further, I shudder to imagine the difficulty of making sure that this constraint is not violated in a large program. Okay, now that I've said this much, I realize that the problem is different. For monotonic, the memory must be released. The problem is that it is hard to do so safely. For pool_allocator and fast_pool_allocator, releasing the memory is not critical because they can reuse memory that has been deallocated.
Anyway, if you want to release memory in this way, you should be using something like monotonic::local_storage, to which Boost.Pool has no equivalent.
I'm rapidly concluding that using global storage with monotonic is much too fragile and dangerous.
I've been thinking about this too. I have two ideas. First, put a counter in monotonic that is incremented when allocations happen and decremented when deallocations happen. If the counter is ever zero you blow away all the buffers automatically. Second, you create a scoped object that encapsulates the idea of the monotonic storage's lifetime and delete the buffers when the object is destructed. Finally, you add a template parameter to monotonic to allow many programmers working in the same application to declare their own monotonic type that has separate buffers and lifetimes from each other. In this way managing the memory owned by monotonic can be easy and safe. The only problem left is therefore misuse. I'm trying to understand the risk of misuse, and it looks like running out of memory is the primary risk.

Perhaps I missed this part of the discussion, but why is it named monotonic? I would think something that describes its intended usage would be better. boost::scratch_memory, for example. Anyone who keeps objects allocated with an allocator called scratch_memory around for the lifetime of his program has some obvious explaining to do.

I also think that a more general interface would be to allow the user to supply an address and size of memory to use as the initial scratch memory, with the automatic (on by default) ability to grow beyond that buffer if needed with system allocation on the heap using the chain. Thread safety should be controlled by a template parameter and enabled by default. Luke

On Jun 22, 2009, at 6:27 PM, Simonson, Lucanus J wrote:
Perhaps I missed this part of the discussion, but why is it named monotonic?
The "next allocated object" pointer is always increased, monotonically? And, without measures such as the ones you suggested - having an allocation counter etc. - it will monotonically lead to a crashed system; just kidding, Christian! ;-)
I would think something that describes its intended usage would be better. boost::scratch_memory,
I like the 'scratch' part.
for example. Anyone who keeps objects allocated with an allocator called scratch_memory around for the lifetime of his program has some obvious explaining to do.
"But, it is part of Boost, so it must be safe, right?!" does not fly? ;-)
I also think that a more general interface would be to allow the user to supply an address and size of memory to use as the initial scratch memory, with the automatic (on by default) ability to grow beyond that buffer if needed with system allocation on the heap using the chain. Thread safety should be controlled by a template parameter and enabled by default.
Great input, Luke. The wild ideas and over-simplified implementations of Christian's might crystallize with the help of people like you. I think I would prefer a solid non-chained realization first, though. If that proves stable and intriguing enough, we can consider "chains"? Or, is it totally meaningless without chained blocks? /David

David Bergman wrote:
Great input, Luke. The wild ideas and over-simplified implementations of Christian's might crystallize with the help of people like you. I think I would prefer a solid non-chained realization first, though. If that proves stable and intriguing enough, we can consider "chains"? Or, is it totally meaningless without chained blocks?
Let's keep things technical and not apply qualitative value judgements in ways that might upset people. It is not totally meaningless without chained blocks, but it is hard to know how much memory your program will need ahead of time in all cases, and chained blocks provide some safety that I'm willing to pay for in added implementation complexity, assuming it is properly tested. In particular, though, I would want the chaining feature to be performance neutral when not needed. Thread safety is a little tougher to prove correct implementation, and I'd want that to be closely inspected in a review by people more qualified than I am to judge whether we can trust it.

Below is a code sketch of my idea for class design of the allocator and scoped buffer objects. I don't know if it is a good design, but there you have it.

template <std::size_t block_size = DEFAULT_SIZE,
          bool auto_extend = true,
          bool thread_safe = true>
class scratch_pad {
public:
    // allocate initial buffer on the heap
    scratch_pad();
    // use a user-provided buffer as the initial buffer (does not transfer
    // ownership); the buffer can come from the heap, the stack or thread
    // local storage of some kind
    scratch_pad(void* address, std::size_t byte_size);
    // free up any buffers allocated by scratch_pad on the heap
    ~scratch_pad();
    // get a pointer to byte_count contiguous free bytes
    void* allocate(std::size_t byte_count);
private:
    ...
};

template <class T,
          class unique_id,
          std::size_t block_size = DEFAULT_SIZE,
          bool auto_extend = true,
          bool thread_safe = true>
class scratch_allocator {
    static scratch_pad<block_size, auto_extend, thread_safe>* buffer;
public:
    static void use_scratch_pad(
        scratch_pad<block_size, auto_extend, thread_safe>& input_buffer) {
        buffer = &input_buffer;
    }
    // required interfaces for a standard allocator
    ...
};

Usage:

class A {};
{
    scratch_pad<> spA;
    scratch_allocator<int, A>::use_scratch_pad(spA);
    std::vector<int, scratch_allocator<int, A> > vecA;
    do_something_performance_critical(vecA);
}
{
    int buff[1024];
    scratch_pad<> spA2(buff, 1024 * sizeof(int));
    scratch_allocator<int, A>::use_scratch_pad(spA2);
    std::vector<int, scratch_allocator<int, A> > vecA2;
    do_something_performance_critical(vecA2);
}

These two uses of scratch_allocator<int, A> use different scratch pads, which each go out of scope at the end of their block. One uses a scratch buffer from the heap, the other from the stack. The typename A is just there to prevent anyone else's use of scratch_allocator elsewhere from clobbering this code. Another scratch_allocator with its own A in the application would imply a syntax (or link-time) error for repeat definition of A. Regards, Luke

AMDG Simonson, Lucanus J wrote:
Steven Watanabe wrote:
I'm rapidly concluding that using global storage with monotonic is much too fragile and dangerous.
I've been thinking about this too. I have two ideas. First, put a counter in monotonic that is incremented when allocations happen and decremented when deallocations happen. If the counter is ever zero you blow away all the buffers automatically.
Even one long-lived object will make this fail. It doesn't even have to be really long lived. It just has to live long enough for a lot of allocations to be made. I like the following better.
Second, you create a scoped object that encapsulates the idea of the monotonic storage's lifetime and delete the buffers when the object is destructed.
How exactly would this work? Would it effectively create a stack of buffers?
Finally, you add a template parameter to monotonic to allow many programmers working in the same application to declare their own monotonic type that has separate buffers and lifetime from eachother.
This is better but still dangerous because any functions using this method will not be re-entrant.
In this way managing the memory owned by monotonic can be easy and safe. The only problem left is therefore misuse. I'm trying to understand the risk of misuse, and it looks like running out of memory is the primary risk.
Perhaps I missed this part of the discussion, but why is it named monotonic? I would think something that describes its intended usage would be better. boost::scratch_memory, for example. Anyone who keeps objects allocated with an allocator called scratch_memory around for the lifetime of his program has some obvious explaining to do. I also think that a more general interface would be to allow the user to supply an address and size of memory to use as the initial scratch memory, with the automatic (on by default) ability to grow beyond that buffer if needed with system allocation on the heap using the chain. Thread safety should be controlled by a template parameter and enabled by default.
In Christ, Steven Watanabe

Steven Watanabe wrote:
Finally, you add a template parameter to monotonic to allow many programmers working in the same application to declare their own monotonic type that has separate buffers and lifetime from eachother.
This is better but still dangerous because any functions using this method will not be re-entrant.
This is a larger problem. The express motivation for the allocator is to provide fast, local storage for multiple threads to use without interacting with each other. This is in contradiction with the need to rely on static data members in a standard-compliant allocator. For multiple threads to use the allocator without interacting (we would prefer not to pay for locks if we aren't actually sharing information between threads) the allocator must differ by type. The only way that can be accomplished is if every thread calls a different function with a different instantiation of the allocator. We don't want to instantiate dozens of copies of the same functions at compile time to support many threads. This is why Christian was so keen to have non-static data in the allocator. I can't see any way around the problem; this is one reason why tbb and others implement their own containers to support threading. Luke

Simonson, Lucanus J wrote On Tuesday, June 23, 2009 11:38 AM
For multiple threads to use the allocator without interacting (we would prefer not to pay for locks if we aren't actually sharing information between threads) the allocator must differ by type. The only way that can be accomplished is if every thread calls a different function with a different instantiation of the allocator. We don't want to instantiate dozens of copies of the same functions at compile time to support many threads.
Only the storage need be unique in each case. Factor out the code that doesn't depend upon the tag type -- the behavior -- from that which does. That said, it would be painful to create unique tags for each thread. Is it a problem for the allocator to look in TLS for that thread's storage?

_____
Rob Stewart robert.stewart@sig.com
Software Engineer, Core Software using std::disclaimer;
Susquehanna International Group, LLP http://www.sig.com

On Wed, Jun 24, 2009 at 3:49 AM, Stewart, Robert <Robert.Stewart@sig.com>wrote:
Simonson, Lucanus J wrote On Tuesday, June 23, 2009 11:38 AM
For multiple threads to use the allocator without interacting (we would prefer not to pay for locks if we aren't actually sharing information between threads) the allocator must differ by type. The only way that can be accomplished is if every thread calls a different function with a different instantiation of the allocator. We don't want to instantiate dozens of copies of the same functions at compile time to support many threads.
Only the storage need be unique in each case. Factor out the code that doesn't depend upon the tag type -- the behavior -- from that which does.
That said, it would be painful to create unique tags for each thread. Is it a problem for the allocator to look in TLS for that thread's storage?
I have thought on this and refactored the allocator signature to deal with both of these issues:

template <class T
    , class Region = default_region_tag
    , class Access = default_access_tag>
struct allocator;

The basic usage is:

struct region0 {};
struct region1 {};
struct region2 {};

BOOST_AUTO_TEST_CASE(test_shared_allocation)
{
    // use default region and access
    std::list<int, monotonic::allocator<int> > list0;

    // use specific region and access
    std::list<int, monotonic::allocator<int, region0, monotonic::shared_access_tag> > list1;
    std::list<int, monotonic::allocator<int, region0, monotonic::thread_local_access_tag> > list2;

    // equivalently:
    std::list<int, monotonic::shared_allocator<int, region0> > list3;
    std::list<int, monotonic::thread_local_allocator<int, region0> > list4;

    // equivalently, using wrapped container types defined in monotonic/container/*.hpp:
    monotonic::list<int> list5;
    monotonic::list<int, region0, monotonic::shared_access_tag> list6;
    monotonic::list<int, region1, monotonic::thread_local_access_tag> list7;

    // use different regions
    monotonic::map<int, monotonic::list<monotonic::string<region2>, region1>, region0> map;
}

See https://svn.boost.org/svn/boost/sandbox/monotonic/boost/monotonic/

This deals with allocation from different regions based on a tag-type, and also allows the user to specify what access type to use. The remaining issue is that of local storage:

BOOST_AUTO_TEST_CASE(test_local)
{
    monotonic::local<region0> storage0;
    monotonic::local<region1> storage1;
    {
        std::list<int, monotonic::allocator<int, region0> > list0;
        std::list<int, monotonic::allocator<int, region1> > list1;
        fill_n(back_inserter(list0), 100, 42);
        fill_n(back_inserter(list1), 100, 42);
        std::basic_string<char, std::char_traits<char>, monotonic::allocator<char, region0> > str("foo");
        str += "bar";
    }
} // region0 and region1 will be reset automatically

Now, in this last case, it can be said that this is not re-entrant, which is correct.
However, there is a simple idiom that can be used:

template <class Storage>
void Reenter(Storage &store, size_t count)
{
    if (count-- == 0)
        return;

    // use the Storage type to make new allocators
    typedef std::basic_string<char, std::char_traits<char>,
        typename Storage::template allocator<char>::type> String;
    std::list<String, typename Storage::template allocator<String>::type> list;
    // ... use list and string types... they will use the storage correctly

    // can also use the storage directly:
    char *bytes = store.allocate_bytes<123>();
    string &str = store.create<string>("foo");
    array<int, 300> &nums = store.create<array<int, 300> >();
    store.destroy(str);

    Reenter(store, count);
}

struct my_local_region {};

void Start()
{
    monotonic::local<my_local_region/*, access_type_tag*/> storage;
    Reenter(storage, 500);
} // storage is reset on exit

Of course, Start() is still not reentrant. It can't be, because it uses a static global. However, Reenter() is. It's the best that we can do.

I did a test that used this idiom against using a normal allocator, and thrashed a list of strings. The monotonic answer was twice the speed. Sure, it is more work, but twice the speed is twice the speed.

I am interested to hear what people think of the new allocator model with tags for Region and Access, and the local model that is based on tags as well. Regards, Christian.

Hi Luke, [...] I have two ideas. First, put a counter in monotonic that is
incremented when allocations happen and decremented when deallocations happen. If the counter is ever zero you blow away all the buffers automatically.
This is a good idea; I'll add it. I currently keep track of allocation counts for informative reasons only, and only in debug builds.
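A toy sketch of the counter idea (names are illustrative, not the sandbox code): allocations bump a live count, deallocations decrement it, and when the count returns to zero the whole buffer is blown away automatically:

```cpp
#include <cassert>
#include <cstddef>

// Monotonic-style storage with an outstanding-allocation counter.
struct counted_storage {
    static const std::size_t Size = 64 * 1024;
    char buffer[Size];
    std::size_t cursor; // bump pointer; individual frees never move it back
    std::size_t live;   // number of outstanding allocations
    counted_storage() : cursor(0), live(0) {}
    void* allocate(std::size_t n) {
        assert(cursor + n <= Size); // a real version would chain to the heap
        void* p = buffer + cursor;
        cursor += n;
        ++live;
        return p;
    }
    void deallocate(void*) {
        if (--live == 0)
            cursor = 0; // last outstanding allocation returned: reset the region
    }
};
```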
Second, you create a scoped object that encapsulates the idea of the monotonic storage's lifetime and delete the buffers when the object is destructed.
See https://svn.boost.org/svn/boost/sandbox/monotonic/boost/monotonic/local.hpp:

    monotonic::local<> storage;
    {
        std::list<int, monotonic::allocator<int> > list; // uses local storage, not global
    }

Finally, you add a template parameter to monotonic to allow many programmers
working in the same application to declare their own monotonic type that has separate buffers and lifetime from eachother.
See https://svn.boost.org/svn/boost/sandbox/monotonic/boost/monotonic/region_all.... The interface is:

    /// a monotonic region allocator uses a specified storage.
    /// Each region uses independent storage that may be used and reset.
    template <class T, size_t Region = 0>
    struct region_allocator;

This is used like:

    typedef std::list<int, monotonic::region_allocator<int, 0> > List;
    typedef std::list<int, monotonic::region_allocator<int, 42> > OtherList;
    // ....
    monotonic::reset_region<42>();

etc. Basically, it allows you to have up to monotonic::DefaultSizes::MaxRegions independently controlled regions of storage. In this way, managing the memory owned by monotonic can be easy and safe. The only problem left is therefore misuse. I'm trying to understand the risk of misuse, and it looks like running out of memory is the primary risk.
I mentioned earlier that it may be beneficial to add introspection to debug builds that warn about misuse, and I increasingly think it will be necessary.
Perhaps I missed this part of the discussion, but why is it named monotonic?
The name has been questioned earlier as well. The motivation for the name came from the fact that memory usage only ever increases - monotonically. I agree that the name is somewhat vague, especially considering that it can be used as a general stack+heap storage system, as well as an allocator.
I would think something that describes its intended usage would be better. boost::scratch_memory, for example. Anyone who keeps objects allocated with an allocator called scratch_memory around for the lifetime of his program has some obvious explaining to do.
I like this because it makes it clear that it is primarily for temporary use. It also works for the following:

    boost::scratch_memory::storage<> storage;
    {
        string &str = storage.create<string>("spam");
        boost::array<Foo, 16> &foos = storage.create<boost::array<Foo, 16> >();
        storage.destroy(str);
        storage.destroy(foos);
    }

The usage is a bit clearer than boost::monotonic::storage<>.
I also think that a more general interface would be to allow the user to supply an address and size of memory to use as the initial scratch memory, with the automatic (on by default) ability to grow beyond that buffer if needed with system allocation on the heap using the chain.
This is a good idea and can be adopted. Currently, storage<StackSize, MinHeapIncrement> uses the stack first, then the heap. You can of course make it just on the stack with storage<N, 0> or just on the heap with storage<0, N>. However it could be made so that the first link in the chain is supplied by the user. See https://svn.boost.org/svn/boost/sandbox/monotonic/boost/monotonic/storage.hp... .
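A hedged sketch of the buffer-first, heap-fallback scheme being discussed (a toy simplification; scratch_storage and its members are invented names, not monotonic::storage's actual interface): exhaust a user-supplied buffer first, then chain to heap blocks that are freed together on destruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

class scratch_storage {
    char* first_;              // user-supplied initial buffer
    std::size_t size_, used_;
    std::vector<char*> chain_; // heap links allocated after the buffer fills
public:
    scratch_storage(void* buffer, std::size_t size)
        : first_(static_cast<char*>(buffer)), size_(size), used_(0) {}
    ~scratch_storage() {
        // the whole chain dies with the storage, like monotonic's chain<>
        for (std::size_t i = 0; i < chain_.size(); ++i)
            std::free(chain_[i]);
    }
    void* allocate(std::size_t n) {
        if (used_ + n <= size_) {  // fits in the initial buffer
            void* p = first_ + used_;
            used_ += n;
            return p;
        }
        char* link = static_cast<char*>(std::malloc(n)); // grow via the chain
        chain_.push_back(link);
        return link;
    }
    std::size_t links() const { return chain_.size(); }
};
```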
Thread safety should be controlled by a template parameter and enabled by default.
Currently, thread-safety is controlled by the allocator type:

    monotonic::allocator<T>        // single-threaded
    monotonic::shared_allocator<T> // multi-threaded via mutex
    monotonic::local_allocator<T>  // uses thread-local storage for the storage

Regards, Christian
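The three access modes can be sketched with C++11 facilities as stand-ins (names are illustrative; the sandbox code predates C++11's thread_local and would use platform TLS instead):

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>

struct single_threaded_storage {       // monotonic::allocator<T>: no locking
    static std::size_t& used() { static std::size_t u = 0; return u; }
};

struct shared_storage {                // monotonic::shared_allocator<T>
    static std::size_t& used() { static std::size_t u = 0; return u; }
    static void bump(std::size_t n) {  // every allocation takes the mutex
        static std::mutex m;
        std::lock_guard<std::mutex> lock(m);
        used() += n;
    }
};

struct thread_local_storage {          // monotonic::local_allocator<T>
    static std::size_t& used() { thread_local std::size_t u = 0; return u; }
};

// Each thread sees its own thread-local arena, untouched by the others.
bool tls_is_independent() {
    thread_local_storage::used() = 5;
    std::thread t([] { thread_local_storage::used() = 99; });
    t.join();
    return thread_local_storage::used() == 5;
}
```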

AMDG Christian Schladetsch wrote:
https://svn.boost.org/svn/boost/sandbox/monotonic/boost/monotonic/region_all.... The interface is:
    /// a monotonic region allocator uses a specified storage.
    /// Each region uses independent storage that may be used and reset.
    template <class T, size_t Region = 0>
    struct region_allocator;

This is used like:

    typedef std::list<int, monotonic::region_allocator<int, 0> > List;
    typedef std::list<int, monotonic::region_allocator<int, 42> > OtherList;
    // ....
    monotonic::reset_region<42>();

etc. Basically, it allows you to have up to monotonic::DefaultSizes::MaxRegions independently controlled regions of storage.
Why do you need an integer? Can't you use a tag class?

    template <class T, class Tag = void>
    struct allocator;

    typedef std::list<int, monotonic::allocator<int, struct tag1> > List;

In Christ, Steven Watanabe

Steven> The user cannot reliably release the memory allocated by monotonic, if monotonic is used simultaneously in multiple independent locations in a program.
This is true, and why I added regions.
This means that only an application programmer can decide when it is safe to clean up and libraries cannot safely use monotonic::allocator without running the risk of allowing memory usage to balloon out of control.
However, if you pass a container that uses a monotonic allocator to a library, you should know what that library will do with it. I also added chain<> to help with the most obvious problem of vector resizing, though this doesn't help in the general case. Even so, the user can control and limit the amount of memory used by using a region with fixed-size storage. In general, I agree that there is wide scope for misuse. There is a good case for adding an introspection system to debug builds that self-monitors and warns the user about probable problems. It can be argued that any system that needs this is too risky to use.
Further, I shudder to imagine the difficulty of making sure that this constraint is not violated in a large program.
Again, regions and local<> provide what is needed to manage this. Anyway, if you want to release memory in this way, you
should be using something like monotonic::local_storage, to which Boost.Pool has no equivalent.
See above. I'm rapidly concluding that using global storage with
monotonic is much too fragile and dangerous.
I agree that there are risks. Perhaps the defaults can be reworked so that the allocator has to be supplied with a region and uses thread-safety by default?

    allocator<int, Region> alloc;                         // region is required, uses shared_storage
    allocator<int, Region, monotonic::global_tag> alloc2; // uses single global storage
    allocator<int, Region, monotonic::local_tag> alloc3;  // uses thread-local storage

Would this alleviate some of your concerns?
In Christ, Steven Watanabe
Regards, Christian

On Mon, Jun 22, 2009 at 8:49 PM, Christian Schladetsch<christian.schladetsch@gmail.com> wrote:
Steven> The user cannot reliably release the memory allocated by monotonic, if monotonic is used simultaneously in multiple independent locations in a program.
This is true, and why I added regions.
I think you should just not support some STL implementations. We should have the interprocess STL implementation moved to a new namespace in boost, and have monotonic support only the C++0x allocator concept. This should make things simpler, more performant, and safer. With a big warning in the docs, and examples using Boost.Interprocess containers. [snip] Regards, -- Felipe Magno de Almeida

On Tue, Jun 23, 2009 at 12:29 PM, Felipe Magno de Almeida < felipe.m.almeida@gmail.com> wrote:
I think you should just not support some STL implementations.
As it currently stands, monotonic is perfectly safe and compliant for all STL implementations. One problem, which Steven pointed out, is that different areas of the same program can cause issues if one releases global monotonic storage without the consent or knowledge of another part. Equivalently, one can store an iterator into a vector V in one object, and another object can resize V, invalidating that iterator. Is this a fault of std::vector? Well, no. It is in the documentation that resizing a vector may invalidate any iterators. You also can't safely dereference a freed pointer. For monotonic, the problem is admittedly trickier because, in general, you *do* want to have different parts of the program using monotonic allocators and you don't necessarily want them to interact. This is one reason I was so reluctant to drop the idea of:

    storage<> store;
    std::list<int, monotonic::allocator<int> > list(store);

over using a global store. But I had to in the end, to ensure that it worked for STL implementations other than MSVC. There are currently two ways to keep locality with the proposal as it stands. First, you can use a local scope:

    monotonic::local<> storage;
    {
        // do stuff using whatever monotonic::allocator<T> you wish; it won't affect
        // other parts of the application, and this is reentrant.
        // however, here be dragons if you intermix with monotonic containers created
        // before this scope was entered (outside of MSVC).

        // you can also use the storage directly:
        string &s = storage.create<string>("foo");
        array<int, 64> &nums = storage.create<array<int, 64> >();
        storage.destroy(s);
    } // resources are freed when storage leaves scope

Another way is to use regions, via monotonic::region_allocator<T, N>. This can readily be changed to allocator<T, U> for some user-supplied tag type U rather than a compile-time constant, as Steven has suggested. We should have interprocess STL implementation moved to a new
namespace in boost.
What we really need is an STL implementation that allows for stateful allocators, as Stepanov first envisaged. I have monotonic::extra::vector, list, map, set, etc. These are just wrappers for the std:: containers that respect allocators correctly. It would be best to have a general set of containers that also respect stateful allocators; interprocess would benefit from this as well. However, there are deeper issues there than I am able or willing to address right now.
And have monotonic only supports C++0x allocators concept.
It works as it stands, with local<> and regions providing the tools necessary for safe isolation. Regards, Christian.

AMDG Christian Schladetsch wrote:
On Tue, Jun 23, 2009 at 12:29 PM, Felipe Magno de Almeida < felipe.m.almeida@gmail.com> wrote:
We should have interprocess STL implementation moved to a new namespace in boost.
What we really need is a STL implementation that allows for stateful allocators, as Stepanov first envisaged.
That's exactly what the containers in Boost.Interprocess do. In Christ, Steven Watanabe

On Wed, Jun 24, 2009 at 7:44 AM, Steven Watanabe <watanabesj@gmail.com>wrote:
AMDG
Christian Schladetsch wrote:
On Tue, Jun 23, 2009 at 12:29 PM, Felipe Magno de Almeida < felipe.m.almeida@gmail.com> wrote:
We should have interprocess STL implementation moved to a new namespace in boost.
What we really need is a STL implementation that allows for stateful allocators, as Stepanov first envisaged.
That's exactly what the containers in Boost.Interprocess do.
This would have been a useful piece of information to have when everyone was telling me that monotonic could not work because it had stateful allocators ;) I have redone a lot of this work in monotonic::container. Now that I have made monotonic::allocator stateless, using regions and access types, it will be trivial to re-envision my original goal of true (and first-class re-entrant) local storage by using stateful allocators and boost::interprocess containers:

    using interprocess::containers;
    {
        monotonic::local_storage<10*1024> store;
        {
            map<int, int, std::less<int>, monotonic::allocator<int> > local_map(store);
        }
    }

May I recommend that boost::interprocess::containers be moved to boost::containers? The other thing I'd like is for map<> etc. to ensure that new containers added to them use a rebound allocator. I do this for monotonic containers via a trait is_monotonic<Container>; this relied on allocator::construct always being called before a new object is accessed. However, some STL implementations use placement new after only calling allocator::allocate, which is terrible practice and should be fixed. Can boost::interprocess::containers please be moved to boost::containers? At the same time, they could use a superset of the std::allocator model and ensure that allocator::construct is always called, rather than sometimes using allocator::construct and other times using placement new. Regards, Christian

------- Original message -------
From: Christian Schladetsch <christian.schladetsch@gmail.com> To: boost@lists.boost.org Sent: 23.6.'09, 17:23
On Wed, Jun 24, 2009 at 7:44 AM, Steven Watanabe <watanabesj@gmail.com>wrote:
AMDG
Christian Schladetsch wrote:
On Tue, Jun 23, 2009 at 12:29 PM, Felipe Magno de Almeida < felipe.m.almeida@gmail.com> wrote:
We should have interprocess STL implementation moved to a new namespace in boost.
What we really need is a STL implementation that allows for stateful allocators, as Stepanov first envisaged.
That's exactly what the containers in Boost.Interprocess do.
This would have been a useful piece of information to have when everyone was telling me that monotonic could not work because it had stateful allocators ;) I have redone a lot of this work in monotonic::container.
Now that I have made monotonic::allocator stateless and using regions and access types, it will be trivial to re-envision my original goal of true (and first-class re-entrant) local storage by using stateful allocators and boost::interprocess containers:
That's good. The C++03 requirement that allocators be stateless is just not enough. I think it is better to have a new set of allocators to be used with proper containers using a new allocator concept.
May I recommend that boost::interprocess::containers be moved to boost::containers?
I recommend it too.
The other thing I'd like is for map<> etc to ensure that new containers added to them use a rebound allocator. I do this for monotonic containers via a trait is_monotonic<Container>;
I don't think you should do that; it is unintuitive IMHO. Elements inside the map are different beasts. If the user wants to use the same allocator, then he shouldn't use operator[], which default-constructs the object.
this relied on allocator::construct to always be called before a new object is accessed. However, some STL implementations use placement new after only calling allocator::allocate, which is terrible practice and should be fixed.
We must not require more than C++0x requires, I think. Though I'm not aware of what the new allocators are like. [snip]
Regards, Christian

Christian> The other thing I'd like is for map<> etc to ensure that new
containers added to them use a rebound allocator. I do this for monotonic containers via a trait is_monotonic<Container>;
Felipe> I don't think you should do that; it is unintuitive IMHO. Elements inside the map are different beasts. If the user wants to use the same allocator, then he shouldn't use operator[], which default-constructs the object.
Irrespective of that issue, surely there is a case to be made for normalising the use of allocator::construct? The fact that it is only sometimes used by the STL containers renders it practically useless. You currently can't do anything useful in allocator::construct, which seems to me to be counter to its original intended purpose. By ensuring that it is always called before an object is used, we can start to use it how it was first intended. Back to the idea of nested containers using the same allocator. I agree that there is a case to be made against automatically spreading an allocator down into contained types. monotonic currently solves this by using regions:

    struct my_region {};
    std::map<int,
        std::list<int, monotonic::allocator<int, my_region> >,
        std::less<int>,
        monotonic::allocator<int, my_region> > map;

and that works - both the map and the inner list will use the same storage, and both can be default-constructed. But it doesn't allow for truly local storage and doesn't use a stateful allocator. So, consider the following, which purports to use a stateful allocator with a container that nominally respects stateful allocators:

    boost::containers::list<int, my_custom_allocator<int> > list(stateful_allocator);
    generate_n(back_inserter(list), 100, rand);
    list.sort();

In this case, without at least ensuring that allocator::construct is called in list::sort when it creates temporary lists, the sort method will create lists that do not use a correct allocator. This is useless and contradicts the entire purpose of supporting stateful allocators in the first place. There are other specific areas where this is a problem as well, but it truly is a general problem. monotonic::containers attempt to solve this by dealing with it at the allocation level, using traits to determine when what is being constructed is in turn a monotonic container that uses the same allocator type and, if so, passing itself as a construction parameter in allocator::construct.
This means that default-constructed temporaries created by the parent container will indeed use a correct stateful allocator. But this can't work if the parent container simply calls allocator::allocate and then uses placement new, or if the parent container creates default-constructed value_types on the stack. Ironically, if the container used its allocator correctly, it would use both ::allocate and ::construct, and the value_type may well be on the stack... In summary, sometimes STL containers use allocator::construct and sometimes they don't, which renders allocator::construct effectively and practically pointless. By not at least ensuring that allocator::construct is called before a contained object is used, the nominal support for stateful allocators in the proposed boost::containers is rendered useless, and I'm back to making my own. Regards, Christian

On Tue, Jun 23, 2009 at 7:14 PM, Christian Schladetsch<christian.schladetsch@gmail.com> wrote:
 Christian> The other thing I'd like is for map<> etc to ensure that new
containers added to them use a rebound allocator. I do this for monotonic containers via a trait is_monotonic<Container>;
Felipe> I don't think you should do that; it is unintuitive IMHO. Elements inside the map are different beasts. If the user wants to use the same allocator, then he shouldn't use operator[], which default-constructs the object.
Irrespective of that issue, surely there is a case to be made for normalising the use of allocator::construct?
I think so. But we should do so with C++0x in mind. I believe we want these newer allocators to work with the new C++0x containers, right?
The fact that it is only sometimes used by the STL containers renders it practically useless. You currently can't do anything useful in allocator::construct, which seems to me to be counter to its original intended purpose.
Unfortunately I don't know what the original intended purpose was there for allocator::construct.
By ensuring that it is always called before an object is used, we can start to use it how it was first intended.
How is that?
Back to the idea of nested containers using the same allocator. I agree that there is a case to be made against automatically spreading an allocator down into contained types. monotonic currently solves this by using regions:
    struct my_region {};
    std::map<int,
        std::list<int, monotonic::allocator<int, my_region> >,
        std::less<int>,
        monotonic::allocator<int, my_region> > map;

and that works - both the map and the inner list will use the same storage, and both can be default-constructed. But it doesn't allow for truly local storage and doesn't use a stateful allocator.
I don't really care for fast local-but-non-local allocators. It is too hard to prove their correctness.
So, consider the following, which purports to use a stateful allocator with a container that nominally respects stateful allocators:

    boost::containers::list<int, my_custom_allocator<int> > list(stateful_allocator);
    generate_n(back_inserter(list), 100, rand);
    list.sort();
In this case, without at least ensuring that allocator::construct is called in list::sort when it creates temporary lists, the sort method will create lists that do not use a correct allocator.
I'm not sure I follow you. Where are the lists being created? All I see is integers being inserted in the list and a sort operation. This sort operation should only swap integers. And even if it did allocate something, it should do so with my_custom_allocator. I don't get why you need construct for list nodes with my_custom_allocator. Or am I missing something? I probably am.
This is useless and contradicts the entire purpose of supporting stateful allocators in the first place.
I don't see it.
There are other specific areas where this is a problem as well, but it truly is a general problem.
monotonic::containers attempt to solve this by dealing with it at the allocation level, using traits to determine when what is being constructed is in turn a monotonic container that uses the same allocator type and, if so, passing itself as a construction parameter in allocator::construct. This means that default-constructed temporaries created by the parent container will indeed use a correct stateful allocator.
I'm completely lost.
But this can't work if the parent container simply calls allocator::allocate then uses placement new,
Why not? This should only fail when passing the same allocator down to inner containers, which I advocate shouldn't happen automatically.
or if the parent container creates default-constructed value_types on the stack.
What is the problem with that?
Ironically, if the container used its allocator correctly, it would use both ::allocate and ::construct, and the value_type may well be on the stack...
I'm not sure construct should be used with value_types constructed outside the area allocated by the allocator. It seems wrong to me. But the allocator expert here is Ion; I hope he can shed some light.
In summary, sometimes STL containers use allocator::construct, sometimes they don't, which renders allocator::construct effectively and practically pointless.
If it is done randomly, I agree it makes construct pointless. But I'm not sure construct should always be called, and even if it is, I'm pretty sure it shouldn't be used to automagically pass the allocator down to other containers. At least not for monotonic. It is understandable for shared memory, though, where we want containers inside other containers to work with shared memory.
By not at least ensuring that allocator::construct is called before a contained object is used, the nominal support for statefull allocators in the proposed boost::containers is rendered useless, and I'm back to making my own.
I don't know why.
Regards, Christian
Regards, -- Felipe Magno de Almeida

Christian> The fact that it is only sometimes used by the STL containers renders it practically useless. You currently can't do anything useful in allocator::construct, which seems to me to be counter to its original intended purpose.
Felipe> Unfortunately I don't know what the original intended purpose was there for allocator::construct.
Well, clearly it was intended as a hook for a custom allocator to do some work. Because it is not used consistently, this work (whatever it is) cannot be done.
I don't really care for fast local-but-non-local allocators. It is too hard to prove their correctness.
Hmm, but I can prove that this works at least:

    monotonic::storage<> storage;
    {
        monotonic::vector<monotonic::map<int, monotonic::string<> > > vec(storage);
        vec.resize(1);
        vec[0][42] = "foo";
        BOOST_ASSERT(vec.get_allocator().get_storage() == &storage);
        BOOST_ASSERT(vec[0].get_allocator().get_storage() == &storage);
        BOOST_ASSERT(vec[0][42].get_allocator().get_storage() == &storage);
    }
[...] without at least ensuring that allocator::construct is called in list::sort when it creates temporary lists, the sort method will create lists that do not use a correct allocator.
I'm not sure I follow you. Where are the lists being created? [...] All I see is integers being inserted in the list and a sort operation.
For instance, MSVC creates temporary lists in std::list::sort:

    const size_t _MAXBINS = 25;
    _Myt _Templist(this->_Alval), _Binlist[_MAXBINS + 1];

Note that the first temporary is made by at least passing the allocator - but the other lists are made on the stack using default construction, so there is no way to pass a stateful allocator as a parameter.
This sort operation should only swap integers. And even if it did allocate something, it should do so with my_custom_allocator.
But it doesn't - that is my entire point.
I don't get why you need construct for list nodes with my_custom_allocator. Or am I missing something? I probably am.
I am talking about entire lists here, not list nodes. One would like to assume that list nodes are *always* dealt with correctly.
This is useless and contradicts the entire purpose of supporting statefull allocators in the first place.
I don't see it.
I am sorry if I am unable to make myself clear. I am saying that if a container constructs instances of itself without using a rebound allocator, it breaks any notion of safe use of stateful allocators.
There are other specific areas where this is a problem as well, but it truly is a general problem.
monotonic::containers attempt to solve this by dealing with it at the allocation level, using traits to determine when what is being constructed is in turn a monotonic container that uses the same allocator type and, if so, passing itself as a construction parameter in allocator::construct. This means that default-constructed temporaries created by the parent container will indeed use a correct stateful allocator.
I'm completely lost.
Perhaps my phrase "default-constructed temporaries" was misleading. What I meant was, a container that respects stateful allocators must not do this:

    typedef container<...> This;
    This temp;

but rather do:

    This temp(get_allocator());

or:

    typename my_allocator::template rebind<This>::other this_alloc(get_allocator());
    This *temp = this_alloc.allocate(1);
    this_alloc.construct(temp);

but not this either:

    typename my_allocator::template rebind<This>::other this_alloc(get_allocator());
    This *temp = this_alloc.allocate(1);
    new (temp) This();
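The difference between these temporary-construction styles can be shown with a compilable toy (all names here - arena, toy_container, the helper functions - are invented for illustration): a container-like type that carries a stateful allocator, and what default construction does to that state.

```cpp
#include <cassert>

struct arena { int id; explicit arena(int i) : id(i) {} };
arena default_arena(0); // stand-in for "global storage"

struct toy_container {
    arena* a;
    toy_container() : a(&default_arena) {}        // default ctor: falls back to global
    explicit toy_container(arena& ar) : a(&ar) {} // allocator-aware ctor
};

// The wrong way: sort()-style helpers that default-construct temporaries,
// silently dropping the parent's allocator state.
bool temp_keeps_allocator_wrong(toy_container& parent) {
    toy_container temp;
    return temp.a == parent.a;
}

// The right way: propagate the parent's allocator state explicitly,
// i.e. the `This temp(get_allocator())` pattern above.
bool temp_keeps_allocator_right(toy_container& parent) {
    toy_container temp(*parent.a);
    return temp.a == parent.a;
}
```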
But this can't work if the parent container simply calls allocator::allocate
then uses placement new,
Why not? This should only fail when passing the same allocator down to inner containers, which I advocate shouldn't happen automatically.
I agree that allocators should not be propagated into nested containers by default.
Ironically, if the container
used its allocator correctly, it would use both ::allocate and ::construct, and the value_type may well be on the stack...
I'm not sure construct should be used with value_types constructed outside the area allocated by the allocator. It seems wrong to me. But the allocator expert here is Ion; I hope he can shed some light.
I am not suggesting that construct should be called before or instead of allocate. I am suggesting that construct should always be called after allocate and before the object is used. You can still allocate N objects and use only M where M < N, but construct should be called on each of the M objects after they are allocated for and before they are used. Failing to do so means that allocator::construct is practically useless.
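The allocate-N, construct-M discipline can be sketched with a toy stateful allocator whose construct() hook counts its invocations (arena, hooked_allocator, and fill are illustrative names, not any real library API):

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct arena {
    int constructed; // how many times the construct() hook actually ran
    arena() : constructed(0) {}
};

template <class T>
struct hooked_allocator {
    arena* a;
    explicit hooked_allocator(arena& ar) : a(&ar) {}
    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
    void construct(T* p, const T& v) { ++a->constructed; new (p) T(v); } // the hook
    void destroy(T* p) { p->~T(); }
};

// Allocate room for n objects but use only m of them. The discipline is
// that each of the m used slots goes through construct() - never bare
// placement new - so the allocator's hook always runs before use.
template <class T>
T* fill(hooked_allocator<T>& al, std::size_t n, std::size_t m, const T& v) {
    T* p = al.allocate(n);
    for (std::size_t i = 0; i < m; ++i)
        al.construct(p + i, v);
    return p;
}
```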
In summary, sometimes STL containers use allocator::construct, sometimes they don't, which renders allocator::construct effectively and practically pointless.
If it is done randomly, I agree it makes construct pointless.
And it is, in general, because there is no requirement in the standard which states otherwise.
But I'm not sure construct should be always called, and even if it is called. I'm pretty sure they shouldn't be used to pass automagically the allocator down with other containers.
That is a separate issue. What I do in construct is my own business ;) I'm just saying that with the current state of affairs, even boost::interprocess::containers do not work with stateful allocators. For example, interprocess::list is implemented using intrusive::list, which does the following in its sort method:

    list_impl counter[64];

This invalidates the use of stateful allocators, because it is making multiple temporary lists that do not use the allocator that the parent container was supplied with.
At least not for monotonic. It is understandable for shared memory though. Where we want containers inside other containers to work with shared memory.
    // use shared storage in a custom region
    struct my_region {};
    monotonic::map<int, monotonic::list<int, my_region, shared_access>, my_region, shared_access> map;

This allows multiple containers to use different regions of shared storage, and doesn't need stateful allocators. However:

    // use local storage
    monotonic::storage<> storage;
    monotonic::map<int, monotonic::list<int> > map(storage);
    map[42].push_back(123);

In this case, I really do want the inner list to use the same (stateful) allocator that the map uses. I tried doing that first with the std:: containers, then with boost::interprocess containers, but neither of them allows for safe use of stateful allocators, so I had to roll my own. This wasn't so bad, however, because I wanted to change the type signatures anyway to make it easy to change the Region and Access types.
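The trait trick described earlier - construct() detecting that the element is itself a container of the same family and passing the parent's storage down - can be sketched as follows (a toy; arena, inner_container, uses_arena, and arena_allocator are all invented names standing in for monotonic's is_monotonic machinery):

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>

struct arena {
    char buf[1 << 12];
    std::size_t used;
    arena() : used(0) {}
    void* take(std::size_t n) { void* p = buf + used; used += n; return p; }
};

struct inner_container {             // stand-in for a nested monotonic list
    arena* a;
    inner_container() : a(0) {}
    explicit inner_container(arena& ar) : a(&ar) {}
};

// The trait: which value_types are containers that accept our storage?
template <class T> struct uses_arena : std::false_type {};
template <> struct uses_arena<inner_container> : std::true_type {};

template <class T>
struct arena_allocator {
    arena* a;
    explicit arena_allocator(arena& ar) : a(&ar) {}
    T* allocate(std::size_t n) { return static_cast<T*>(a->take(n * sizeof(T))); }
    void deallocate(T*, std::size_t) {}
    void construct(T* p) { construct_impl(p, uses_arena<T>()); }
private:
    void construct_impl(T* p, std::true_type) { new (p) T(*a); } // propagate storage
    void construct_impl(T* p, std::false_type) { new (p) T(); }  // plain value
};
```

This only works if the parent container actually routes every element through construct(), which is exactly the guarantee being argued for in this thread.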
By not at least ensuring that allocator::construct is called before a contained object is used, the nominal support for statefull allocators in the proposed boost::containers is rendered useless, and I'm back to making my own.
I don't know why.
Because I need to ensure that if a container is using a stateful allocator, that container will respect it. Currently, monotonic works statelessly with the std:: containers, and that is fine. It also works with stateful allocators (allowing for truly local storage on the stack), but only with monotonic::containers. It seems there is no shortcut I can make by leveraging the boost::interprocess containers. Regards, Christian.

AMDG Christian Schladetsch wrote:
I'm not sure I follow you. Where are the lists being created? [...] All I see is integers being inserted in the list and a sort operation.
For instance, MSVC creates temporary lists in std::list::sort:
    const size_t _MAXBINS = 25;
    _Myt _Templist(this->_Alval), _Binlist[_MAXBINS + 1];

Note that the first temporary is made by at least passing the allocator - but the other lists are made on the stack using default construction, so there is no way to pass a stateful allocator as a parameter.
This is only a problem because the implementation of sort calls _Splice which copies the nodes if the allocators are unequal.
This sort operation should only swap integers. And even if it did allocate something, it should do so with my_custom_allocator.
But it doesn't - that is my entire point.
sort should not allocate anything, so in theory it should not need to propagate the allocator.
But I'm not sure construct should be always called, and even if it is called. I'm pretty sure they shouldn't be used to pass automagically the allocator down with other containers.
That is a separate issue. What I do in construct is my own business ;) I'm just saying that with the current state of affairs, even boost::interprocess::containers do not work with stateful allocators. For example, interprocess::list is implemented using intrusive::list, which does the following in its sort method:
list_impl counter[64];
This invalidates the use of stateful allocators, because it is making multiple temporary lists that do not use the allocator that the parent container was supplied with.
No it doesn't. Boost.Intrusive doesn't allocate anything. All that sort does is to rearrange existing nodes, so the allocator is not needed. Boost.Intrusive doesn't even know that an allocator exists. In Christ, Steven Watanabe

[snip]
Christian> This invalidates the use of stateful allocators, because it is making multiple temporary lists that do not use the allocator that the parent container was supplied with.
Steven> No it doesn't. Boost.Intrusive doesn't allocate anything. All that sort does is to rearrange existing nodes, so the allocator is not needed. Boost.Intrusive doesn't even know that an allocator exists.
I stand corrected - I just saw that interprocess::list<> used the same algorithm as MSVC, including the temporary default-constructed lists (without passing the allocator). I did not follow it through to realise that unlike the MSVC implementation, Boost.Intrusive does not do any allocation. The following works on MSVC and GCC:

    monotonic::storage<> storage;
    {
        interprocess::list<int, monotonic::allocator<int> > list(storage);
        generate_n(back_inserter(list), 10, rand);
        list.sort();
    }

I'll happily base my containers on interprocess::containers rather than std::containers. What do people think of moving boost::interprocess::containers to boost::containers? These containers are not really specific to interprocess, are they? Thanks, Christian.

On Tue, Jun 23, 2009 at 9:47 PM, Christian Schladetsch<christian.schladetsch@gmail.com> wrote:
Christian> The fact that it is only sometimes used by the STL containers renders it practically useless. You currently can't do anything useful in allocator::construct, which seems to me to be counter to its original intended purpose.
Felipe> Unfortunately I don't know what the original intended purpose was there for allocator::construct.
Well, clearly it was intended as a hook for a custom allocator to do some work. Because it is not used consistently, this work (whatever that is), cannot be done.
OK. But what work was that envisioned?
I don't really care for fast local-but-non-local allocators. They are too hard to prove correct.
Hmm, but I can prove that this works at least:
    monotonic::storage<> storage;
    {
        monotonic::vector<monotonic::map<int, monotonic::string<> > > vec(storage);
        vec.resize(1);
        vec[0][42] = "foo";
        BOOST_ASSERT(vec.get_allocator().get_storage() == &storage);
        BOOST_ASSERT(vec[0].get_allocator().get_storage() == &storage);
        BOOST_ASSERT(vec[0][42].get_allocator().get_storage() == &storage);
    }
Can you really? I've said local-but-non-local. This example uses a stateful-allocator-aware-container. You've snipped what I was answering to:
Back to the idea of nested containers using the same allocator. I agree that there is a case to be made against automatically spreading an allocator down into contained types. monotonic currently solves this by using regions:
    struct my_region {};
    std::map<int,
             std::list<int, monotonic::allocator<int, my_region> >,
             std::less<int>,
             monotonic::allocator<int, my_region> > map;
and that works - both the map and the inner list will use the same storage and both can be default constructed. But it doesn't allow for truly local storage and doesn't use a stateful allocator.
Here you're using C++03 containers and tags to create stateless allocators, which have their own problems w.r.t. threading and scoping allocator lifetime. You must ensure you don't inadvertently use the same tag somewhere else. And if you do, that the other use is correct w.r.t. threading and lifetime etc. This is just too much burden to reason about when all you want is scoped allocation.
[...] without at least ensuring that allocator::construct is called in list::sort when it creates temporary lists, the sort method will create lists that do not use a correct allocator.
I'm not sure I follow you. Where are the lists being created? [...] All I see is integers being inserted in the list and a sort operation.
For instance, MSVC creates temporary lists in std::list::sort:
    const size_t _MAXBINS = 25;
    _Myt _Templist(this->_Alval), _Binlist[_MAXBINS + 1];
Note that the first temporary is made by at least passing the allocator - but the other lists are made on the stack using default construction, so there is no way to pass a stateful allocator as a parameter.
But MSVC's list is not stateful-allocator-aware. I don't see how that can be used as an example. It doesn't have anything to do with construct. It has to do with different allocator instances always comparing equal, which I'm arguing shouldn't be supported at all.
This sort operation should only swap integers. And even if it did allocate something, it should do with my_custom_allocator.
But it doesn't - that is my entire point.
It doesn't in C++03 containers. That's not relevant. [snip]
Perhaps my phrase "default-constructed temporaries" was misleading. What I meant was, a container that respects stateful allocators must not do this:
    typedef container<...> This;
    This temp;
Of course it shouldn't. But what does that have to do with construct?
but rather do:
  This temp(get_allocator());
Correct. [snip]
I'm not sure construct should be used with value_types constructed outside the allocated area from the allocator. It seems wrong to me. But the allocator expert here is Ion. I hope he can shed some light.
I am not suggesting that construct should be called before or instead of allocate.
I didn't assume that.
I am suggesting that construct should always be called after allocate and before the object is used. You can still allocate N objects and use M where M < N, but I suggest that construct should be called on each of the M objects before they are used and after they are allocated for.
But you seem to be advocating for construct to be called on local value_types as well. Or am I wrong? [snip]
If it is done randomly, I agree it makes construct pointless.
And it is, in general, because there is no requirement in the standard which states otherwise.
Of course. But have you checked the allocators proposals on this issue for C++0x? That's what I think you should concentrate on before adding requirements to containers.
But I'm not sure construct should always be called. And even if it is, I'm pretty sure it shouldn't be used to automagically pass the allocator down to nested containers.
That is a separate issue. What I do in construct is my own business ;) I'm just saying that with the current state of affairs, even boost::interprocess::containers do not work with stateful allocators.
I don't see how. You posted an example from MSVC STL to prove that interprocess containers don't work?
For example, interprocess::list is implemented using intrusive::list, which does the following in its sort method:
    list_impl counter[64];
This invalidates the use of stateful allocators, because it is making multiple temporary lists that do not use the allocator that the parent container was supplied with.
I've grepped my boost copy and couldn't find this line. Could you point it to me? And which boost version are you using?
At least not for monotonic. It is understandable for shared memory though, where we want containers inside other containers to work with shared memory.
    // use shared storage in a custom region
    struct my_region {};
    monotonic::map<int,
                   monotonic::list<int, my_region, shared_access>,
                   my_region, shared_access> map;
This allows for multiple containers to use different regions of shared storage, and doesn't need stateful allocators. However:
Well, as I've explained before, I don't really care for stateless allocators for local storage since they are too error prone. I'm not sure how others feel about it. [snip]
Because I need to ensure that if a container is using a stateful allocator, that container will respect it.
Sure. I just don't know what it has to do with allocator::construct.
Currently monotonic works stateless with std::containers and that is fine.
I think it is too error prone.
It also works with stateful allocators (allowing for truly local storage on the stack), but only with monotonic::containers. It seems that there is no short-cut I can make by leveraging boost::interprocess containers.
Because you need construct? It doesn't seem to make sense to me. If you find a bug in interprocess w.r.t stateful allocators, then please report it.
Regards, Christian.
-- Felipe Magno de Almeida

AMDG Felipe Magno de Almeida wrote:
On Tue, Jun 23, 2009 at 9:47 PM, Christian Schladetsch<christian.schladetsch@gmail.com> wrote:
For example, interprocess::list is implemented using intrusive::list, which does the following in its sort method:
list_impl counter[64];
This invalidates the use of stateful allocators, because it is making multiple temporary lists that do not use the allocator that the parent container was supplied with.
I've grepped my boost copy and couldn't find this line. Could you point it to me? And which boost version are you using?
It's here and it's harmless. https://svn.boost.org/trac/boost/browser/trunk/boost/intrusive/list.hpp#L975 In Christ, Steven Watanabe

Well, clearly [allocator::construct] was intended as a hook for a custom allocator to do some work. Because it is not used consistently, this work (whatever that is), cannot be done.
OK. But what work was that envisioned?
I don't think that can be answered, but isn't that beside the point? If construct can only do placement-new, then it is pointless. For construct to do something more than just placement-new, then it has to be called every time before a newly allocated object is used for the first time.
I don't really care for fast local-but-non-local allocators. They are too hard to prove correct.
Hmm, but I can prove that this works at least:
    monotonic::storage<> storage;
    {
        monotonic::vector<monotonic::map<int, monotonic::string<> > > vec(storage);
        vec.resize(1);
        vec[0][42] = "foo";
        BOOST_ASSERT(vec.get_allocator().get_storage() == &storage);
        BOOST_ASSERT(vec[0].get_allocator().get_storage() == &storage);
        BOOST_ASSERT(vec[0][42].get_allocator().get_storage() == &storage);
    }
Can you really? I've said local-but-non-local. This example uses a stateful-allocator-aware-container.
I have struggled to understand what you mean by "local-but-non-local". Do you mean that it is kinda local, but maybe not, because other areas of the program could still access the storage by using the same region-tag-type? In this case, I don't think that is a design flaw. It may well be desirable to have different regions for different uses that should be available globally. For example, I can imagine using a distinct region for strings that is shared across the program, a different region for, say, a physics simulator, a different region again for the deferred renderer, and another, possibly shared, region for the network layer. Of course I am using a game system as an example here, but the same is true in general. In any case, there are many ways to ensure that the tag type is safely hidden; either in a namespace or as a private structure in a class type. If this isn't what you meant by 'local but not local', sorry but I don't follow.
You've snipped what I was answering to:
Back to the idea of nested containers using the same allocator. I agree that there is a case to be made against automatically spreading an allocator down into contained types. monotonic currently solves this by using regions:
    struct my_region {};
    std::map<int,
             std::list<int, monotonic::allocator<int, my_region> >,
             std::less<int>,
             monotonic::allocator<int, my_region> > map;
and that works - both the map and the inner list will use the same storage and both can be default constructed. But it doesn't allow for truly local storage and doesn't use a stateful allocator.
Here you're using C++03 containers and tags to create stateless allocators. Which have their own problems w.r.t. threading and scoping allocator lifetime.
Stateless allocators that use shared or thread-local storage won't have any problem with threading. I do not know what you mean by "scoping allocator lifetime" in the context of a stateless allocator.
You must ensure you don't inadvertently use the same tag somewhere else. And if you do, that the other use is correct w.r.t. threading and lifetime etc.
As mentioned above, it is quite easy to ensure that region tags have the correct visibility. Plus, monotonic allocators take a second tag-type that specifies the access policy for the storage: either default (no locking), shared (guarded by a mutex), or thread-local.

    monotonic::allocator<int> global;
    monotonic::allocator<int, my_region> distinct;
    monotonic::allocator<int, my_region, shared_tag> distinct_guarded;
    monotonic::allocator<int, my_region, local_tag> distinct_thread_local;

Each uses a distinct storage object. If you use a distinct_thread_local allocator in one place, and use another distinct_thread_local allocator somewhere else, then by definition you wish to use the same storage for both. I guess you are saying that it is possible to use them together inadvertently?
This is just too much burden to reason about when all you want is scoped allocation.
If by this you mean scoped storage, then I am confused. You can't really have scoped storage with a stateless allocator. I currently have two ways of using monotonic allocators. The first, and default, way is stateless. These stateless allocators are delimited by regions, and may be shared or use thread-local storage. The second way is to make a storage object on the stack or the heap, and use that to initialise a container that uses a monotonic allocator. This is the stateful way of using it. This mode doesn't work with C++03 containers, but as it turns out will indeed work with interprocess::containers, iff I wrap them carefully. [snip side-track about list::sort; as pointed out by Steven, the temporary lists made by interprocess::list::sort are benign]
But you seem to be advocating for construct to be called on local value_types as well. Or am I wrong?
No, I just want a container system that always calls allocator::construct before using an allocated object. I want this because I'd like to do something in allocator::construct other than placement new.
Of course. But have you checked the allocators proposals on this issue for C++0x? That's what I think you should concentrate on before adding requirements to containers.
I'll read it again.
Because I need to ensure that if a container is using a stateful allocator, that container will respect it.
Sure. I just don't know what it has to do with allocator::construct.
I was just making the case that since construct is not used uniformly, it cannot be used to propagate stateful allocators down into nested containers. I fully understand that this is not something that you would always want to do, but it could be done if construct was used uniformly.
Currently monotonic works stateless with std::containers and that is fine.
I think it is too error prone.
I'd like to address any concerns you have. How could it be misused? By inadvertently sharing region tags? I really don't think that is an issue.
It also works with stateful allocators (allowing for truly local storage on the stack), but only with monotonic::containers. It seems that there is no short-cut I can make by leveraging boost::interprocess containers.
Because you need construct? It doesn't seem to make sense to me.
I need construct to be called so that nested containers have a chance to receive the correct stateful allocator.
If you find a bug in interprocess w.r.t stateful allocators, then please report it.
I'm sticking with a thin wrapping of interprocess containers. I just got spooked by the temporaries I saw, but they are not a problem because as Steven pointed out those temporaries do not do allocation. It is a problem in other STL implementations however. Regards, Christian.

Christian Schladetsch wrote:
There are exceptions and other vagaries that are yet to be explained. Even though I've added a pool system, list_sort<int> is still faster with boost::fast_pool and TBB on MSVC. This is surprising, as list_create<int> is faster for monotonic. So this means that the sort itself - and not the allocation - is slower when using a 'monotonic data-structure'. I have spent some time investigating this and have yet to find a good answer.
My guess is that the algorithm is allocating and deallocating in an n^2 loop. By using n^2 memory for O(n) elements you end up with elements spread throughout a large address range, though weighted more toward the recently allocated addresses. Peak memory will be a good metric. Do you have access to VTune? You seem to struggle to identify the cause of performance loss and are reduced to guesswork. There also exist memory profiling tools. In linux you can poll /proc to get memory usage. There are little linux utilities out there called memtime and such that work this way. I'd like to suggest another benchmark. This is an example of something I think would be very bad for your allocator. Build a std::map of size n. Loop for O(n^2) iterations. In each iteration insert one random element and lookup with lower_bound one random element and remove it. Precompute the random numbers and don't include the rand() calls in the time measurement of the benchmark. Your allocator will use O(n^2) memory since it never recovers deallocated memory, so if you set n close to the size of the L1 or L2 cache you should see dramatic performance loss. This is my typical use case for a std::map. Regards, Luke

Hi Luke, [...]
Luke> Peak memory will be a good metric. Do you have access to VTune? You seem to struggle to identify the cause of performance loss and are reduced to guesswork.
Peak memory measured by an external tool is no good, as boost::pool and boost::fast_pool both leak memory. I need to be able to sample the memory used at certain times in the application in a cross-platform way. I'll get to this in due time. I have been focused on getting the benchmark results more than attempting to do a complete analysis of their implications. The latest results are here http://tinyurl.com/lj6nab. I still can't explain why monotonic is faster at sorting a 500,000 element pre-reserved vector, but I have only reported the result and have not investigated deeply. I have added mean, standard deviation, min and max factors for each of the small, medium, and large benchmark sets. I print a cumulative total at the end of each set, and a summary of all results at the end. These summaries are:

GCC:
scheme  mean  std-dev   min    max
fast    36.3  173       0.25   1.63e+03
pool    27.8  1.02e+04  0.857  897
std     1.69  0.91      0.333  5
tbb     1.59  0.849     0.333  5

MSVC:
scheme  mean  std-dev    min    max
fast    35.4  132        0.603  1.32e+003
pool    27.1  1.13e+004  0.693  878
std     2.7   1.7        0.628  7
tbb     1.44  0.727      0.291  6.4

The mean is the average speedup factor provided by monotonic allocation over the given scheme. So for MSVC, summarised over all tests, monotonic is 1.4X faster than TBB with a standard deviation of 0.7. TBB was 3.4X faster at its best and 6.4X slower at its worst. Note that monotonic was on average 35X faster than boost::fast_pool, but notice too that the standard deviation is very high. At its worst, fast_pool was 1,300X slower than monotonic and at its best was 1.6X faster. boost::pool fared little better, with an even worse standard deviation of 10,000(!). One could argue that the tests are skewed, so I invite you to look at them and suggest any changes or additions. See http://tinyurl.com/l6vmgq for all the tests, and http://tinyurl.com/l89llq for the test harness. It is no surprise that TBB performs best across both platforms with the smallest standard deviation.
I have VTune but I am yet to apply it. I have been attempting to gather the raw data from which more analysis can be made.
Luke> There also exist memory profiling tools. In linux you can poll /proc to get memory usage. There are little linux utilities out there called memtime and such that work this way.
These can't be used for reasons mentioned above.
I'd like to suggest another benchmark. This is an example of something I think would be very bad for your allocator. Build a std::map of size n. Loop for O(n^2) iterations. In each iteration insert one random element and lookup with lower_bound one random element and remove it. Precompute the random numbers and don't include the rand() calls in the time measurement of the benchmark.
I have done so. The code is here http://tinyurl.com/l6vmgq in a structure called "test_map_erase", the results are here http://tinyurl.com/mp6sub. For your test, monotonic was at least as fast if not faster than the other schemes. Of course, for large N memory will be exhausted. But for N=5000, the summary is:

scheme  mean   std-dev  min    max
fast    0.996  0.0307   0.957  1.04
pool    1.2    0.00107  1.16   1.25
std     1.44   0.025    1.41   1.48
tbb     1.06   0.0228   1.03   1.1
Your allocator will use O(n^2) memory since it never recovers deallocated memory, so if you set n close to the size of the L1 or L2 cache you should see dramatic performance loss. This is my typical use case for a std::map.
This is a common enough idiom, yes. But then again, another common idiom is to first populate maps then use them for fast lookup, like a dictionary. I do not suggest, nor have I ever suggested, that monotonic is a general-purpose allocator that is a drop-in replacement for std::allocator in all cases. I have repeatedly stated the intended limitations of monotonic allocation, including mentioning that it is not designed for continual erasure from and insertion into containers. But it does what it was designed for very well, which is to produce fast and small data structures with very fast allocation. Your test in this case was specifically designed to be pathological for monotonic allocation. Even so, the above figures show that it still stands up very well against other schemes. I exhausted memory for N=100,000, but it didn't help that boost::pool leaks memory in this test either.
Thanks for your suggestions and comments Luke. I will be adding more benchmarks soon, to test this ability of monotonic as well:

    {
        length = 4;
        // storage starts on the stack (in this case, 10k of it),
        // then merges into the heap as needed
        monotonic::storage<10*1024> storage;
        for (size_t n = 0; n < length; ++n)
        {
            // create a new int from storage
            int &n0 = storage.create<int>();
            // create a new string (uses correct alignment)
            string const &s1 = storage.create<string>("foo");
            BOOST_ASSERT(s1 == "foo");
            // allocate 37 bytes with alignment 1
            char *array0 = storage.allocate_bytes(37);
            fill_n(array0, 37, 42);
            // allocate 2537 bytes with 64-byte alignment
            char *array1 = storage.allocate_bytes(2537, 64);
            fill_n(array1, 2537, 123);
            // allocate 1283 bytes with optimal machine alignment
            char *array2 = storage.allocate_bytes<1283>();
            fill_n(array2, 1283, 42);
            array<Unaligned, 42> &array3 = storage.create<array<Unaligned, 42> >();
            // destroy objects. this only calls the destructors;
            // it does not release memory
            storage.destroy(s1);
            cout << "storage.stack, heap used: " << storage.fixed_used()
                 << ", " << storage.heap_used() << endl;
        }
        // storage is released. if this was only ever on the stack, no work is done
    }
    //>>>
    storage.stack, heap used: 4882, 0
    storage.stack, heap used: 9042, 0
    storage.stack, heap used: 9298, 3827
    storage.stack, heap used: 9554, 7667

This is harder to test, as no other allocation model that I am currently testing can do this. But I'll try to test against auto_buffer<>, boost::pool and boost::object_pool and see how it goes. I am interested in your thoughts on the latest results, and/or the methodology I used. Regards, Christian.

Christian Schladetsch wrote:
Hi Luke,
[...]
Luke> Peak memory will be a good metric. Do you have access to VTune? You seem to struggle to identify the cause of performance loss and are reduced to guesswork.
Peak memory measured by an external tool is no good, as boost::pool and boost::fast_pool both leak memory. I need to be able to sample the memory used at certain times in the application in a cross-platform way. I'll get to this in due time.
Have a look at VTune and perfmon/pfmon: http://lwn.net/Articles/303967/ http://perfmon2.sourceforge.net/pfmon_usersguide.html You'll be able to sample the performance counters of the CPU. Cheers, Milosz

Christian Schladetsch wrote:
I do not suggest, nor have I ever suggested, that monotonic is a general-purpose allocator that is a drop-in replacement for std::allocator in all cases. I have repeatedly stated the intended limititation of monotonic allocation, including mentioning that it is not designed for continual erasure from and insertion into containers. But it does what it was designed for very well, which is to produce fast and small datastructures with very fast allocation.
Yes, I understand your intention. You want to be able to use a scratchpad memory with standard containers. I think the idea has merit. I want to understand what happens when the allocator is misapplied. Clearly if people use it as it was intended to be used they should get the benefits you intended them to get. I want to know what happens to everyone else. It looks like my theory about cache and working set size is not borne out by your testing. Luke

AMDG Christian Schladetsch wrote:
Hi Luke,
[...]
Luke> Peak memory will be a good metric. Do you have access to VTune? You seem to struggle to identify the cause of performance loss and are reduced to guesswork.
Peak memory measured by an external tool is no good, as boost::pool and boost::fast_pool both leak memory. I need to be able to sample the memory used at certain times in the application in a cross-platform way. I'll get to this in due time.
I have been focused on getting the benchmark results more than attempting to do a complete analysis of their implications. The latest results are here http://tinyurl.com/lj6nab. I still can't explain why monotonic is faster at sorting a 500,000 element pre-reserved vector, but I have only reported the result and have not investigated deeply.
I have added mean, standard deviation, min and max factors for each of the small, medium, and large benchmark sets. I print a cumulative total at the end of each set, and a summary of all results at the end. These summaries are:
GCC:
scheme  mean  std-dev   min    max
fast    36.3  173       0.25   1.63e+03
pool    27.8  1.02e+04  0.857  897
std     1.69  0.91      0.333  5
tbb     1.59  0.849     0.333  5
MSVC:
scheme  mean  std-dev    min    max
fast    35.4  132        0.603  1.32e+003
pool    27.1  1.13e+004  0.693  878
std     2.7   1.7        0.628  7
tbb     1.44  0.727      0.291  6.4
The mean is the average speedup factor provided by monotonic allocation over the given scheme. So for MSVC, summarised over all tests, monotonic is 1.4X faster than TBB with a standard deviation of 0.7. TBB was 3.4X faster at its best and 6.4X slower at its worst.
Note that monotonic was on average 35X faster than boost::fast_pool, but notice too that the standard deviation is very high. At its worst, fast_pool was 1,300X slower than monotonic and at its best was 1.6X faster. boost::pool fared little better, with an even worse standard deviation of 10,000(!). One could argue that the tests are skewed, so I invite you to look at them and suggest any changes or additions. See http://tinyurl.com/l6vmgq for all the tests, and http://tinyurl.com/l89llq for the test harness.
It is no surprise that TBB performs best across both platforms with the smallest standard deviation.
I'm not convinced that the average is meaningful. boost::fast_pool_allocator is not intended to be used with std::vector. You're averaging many cases for which it is documented to behave badly with a couple of cases for which it is fine. Also, even though pool_allocator is supposed to work with std::vector, it is slow as I would expect. The pool data structure is really designed for fixed size allocations and I for one am not particularly enamored of the idea of using it for std::vector. Also, this is completely unrelated, but for lines like this

fast 1.49 0.838 0.603 1.32e+003

it looks like the cumulative min/max are being used instead of the local min/max. In Christ, Steven Watanabe

Steven> I'm not convinced that the average is meaningful.
It's not, that's why I included the standard deviation and min/max as well.
boost::fast_pool_allocator is not intended to be used with std::vector. You're averaging many cases for which it is documented to behave badly with a couple of cases for which it is fine.
That's correct. I was asked to compare with boost::pool_allocator, boost::fast_pool_allocator, std::allocator and tbb::allocator.
Also, even though pool_allocator is supposed to work with std::vector, it is slow as I would expect. The pool data structure is really designed for fixed size allocations and I for one am not particularly enamored of the idea of using it for std::vector.
Also, this is completely unreleated, but for lines like this
fast 1.49 0.838 0.603 1.32e+003
It looks like the cumulative min/max are being used instead of the local min/max.
That's also correct and I noticed this as well. I'll change it so that the test summaries show the min/max just for that test, and keep the global min/max for the overall summary.
In Christ, Steven Watanabe
Regards, Christian.

AMDG Christian Schladetsch wrote:
Steven> I'm not convinced that the average is meaningful.
It's not, that's why I included the standard deviation and min/max as well.
boost::fast_pool_allocator is not intended to be used with std::vector. You're averaging many cases for which it is documented to behave badly with a couple of cases for which it is fine.
That's correct. I was asked to compare with boost::pool_allocator, boost::fast_pool_allocator, std::allocator and tbb::allocator.
The individual tests are okay, but I don't think that it's particularly meaningful to average these very different tests even if you include the standard deviation. I doubt that they are anywhere close to resembling a random sample of actual usage. In Christ, Steven Watanabe

Steven> The individual tests are okay, but I don't think that it's particularly meaningful to average these very different tests even if you include the standard deviation.
I understand that they are very different tests, and that they mix node- with sequence-based allocation. However, that was the actual point of the benchmarks. std:: and tbb:: do very well against monotonic, considering that they actually also do de-allocation; I didn't cherry-pick the tests. Even so, both std and tbb are at least ~50% slower across the board, and that is significant. The fact that both std:: and tbb:: are quite consistent across both MSVC and GCC, and both with a low standard deviation compared to monotonic, speaks somewhat about the strength and meaning of the benchmark itself. Monotonic allocation is 2-4X faster than std::allocator on average for these benchmarks on MSVC, and 1.4X faster than TBB on average, which will be important to those that want fast code on windows or 360. The spread for GCC is also good, with monotonic being fractionally faster again over TBB, but here TBB isn't hugely better than std::allocator either. I realise that at the end of the day, allocation is not a specific bottleneck in general. However, for small and temporary node-based containers it can be a large chunk of your execution time. This is especially true for real-time systems that have fixed high-water marks (such as embedded systems, physics simulators and deferred renderers). Many developers spend a lot of effort trying to shave 5-10% off of their execution speed, after they have decided on the data-structures and usage patterns. One motivation for monotonic was to help them with this. Another, related, usage is to provide a temporary, general and *fast* storage area that can start on the stack and can grow to the heap (and no, you don't pay for that flexibility if you stay within the initial limit). In some respects, a monotonic::storage<> instance is like a first-class scope object.
I doubt that [the tests] are anywhere close [to] resembling a random sample of actual usage.
I am open to suggestions on what other tests should be added, or how the existing tests may be changed. I do supply a mean, and std-dev for each test, as well as a summary at the end. I realise that the summary was skewed for boost::fast_pool because it included the pathological case of map<int, vector<int> > for it. I seriously considered removing this test, on the basis that maybe it was 'unfair' to boost::fast_pool_allocator. But if it was to be a test of each of the *allocator* types, in typical usage scenarios, it had to stay in. I also didn't remove tests that monotonic did not win. I picked some pretty basic usage patterns that I think do, in fact, somewhat resemble a random sample of typical usage. Constructing a list, constructing a vector. Duplicating a list, duplicating a vector. Sorting a list, sorting a vector. Using a set of vectors and a map of ints to vectors (which also covers strings somewhat). I could've made tests that I knew monotonic would win hands down, but I avoided doing that. There were other tests that I could have done that I couldn't do because monotonic would exhaust memory. One of these would be something like a spinning resize on a vector. monotonic::chain<> addresses this specific issue (not to be confused with the 'chain' that is used for heap-based storage), but because it was to be a test using std::containers, I had to leave that test out. A lot of similar third-party benchmarks just test the allocation speed. boost::fast_pool is tremendously good at this for same-sized allocations. Monotonic is even faster, and it is much faster again for random-sized allocations. But rather than just test allocation speed, I specifically wanted to benchmark both allocation times *and* the performance of the structures created by the different allocators, including node- and sequence-based containers, and hybrids, for a wide range of sizes (from 1 to 10,000,000). 
This is why, for example, I put in list creation (which monotonic wins) along with list sort, which monotonic does not win over boost::fast_pool_allocator. If I had just left it at allocation, monotonic would win. But again, I wanted to test the performance of the created data structures - in the case of sorting a large list, monotonic does not win. But then again, it wins at sorting a large vector which one can argue is more common and important. Anyway, tl;dr. In Christ,
Steven Watanabe
Regards, Christian
participants (7)
- Christian Schladetsch
- David Bergman
- Felipe Magno de Almeida
- Milosz Marian HULBOJ
- Simonson, Lucanus J
- Steven Watanabe
- Stewart, Robert