
On Tue, Jun 16, 2009 at 4:11 AM, Simonson, Lucanus J < lucanus.j.simonson@intel.com> wrote:
Christian Schladetsch wrote:
[...] if you could find the time to look at the test in question, I would be delighted to hear alternative theories.
The memory overhead of header information on dynamically allocated heap data could effectively reduce your apparent cache size, by causing fewer elements in total to fit in cache.
I have surmised this as well. It would cause more cache misses that are not due to spatial locality.
This could explain the entire difference in runtime, though I wouldn't go so far as to say I'm sure that is the case.
However, for the test in point, there was no such overhead in either case, and allocation was not being compared, as the container being tested was a fully reserved vector in both cases:

    for (int i = 0; i < LOOP_COUNT; ++i)
        ++vec[1 + i % (ELEM_COUNT - 2)].ord;

The same principle is shown by the bubble-sort results I posted earlier. Placing the storage for the bubble-sort test on the stack resulted in a faster execution time; placing *the same* monotonically-allocated structure on the heap was slower. Both were faster than a std::allocator allocation on the heap. The difference isn't much, but it is not negligible, and it is free if you have a known and fixed upper bound on storage size. chain<> will go some way to assisting with structures that span the stack/heap boundary and hence do not need a fixed upper bound.

In general, there are five factors at work that make monotonic allocation, and containers that use it, faster and smaller:

1. Optional stack locality (memory access doesn't need the heap)
2. No memory-management size overhead (smaller structures; nodes are adjacent)
3. Faster allocation (no free-list traversal or modification)
4. Faster deallocation (it does nothing)
5. Free thread safety (when on the stack)

Heap fragmentation is one of the things I was alluding to when I mentioned the difference between toy benchmarks and real-world applications.
I shall re-post results that demonstrate how monotonic allocation helps with this problem, as well as being faster in 'non-toy' situations, after I reply here.
I haven't re-implemented std::containers. [...] I didn't realize. This is much better than what I thought you had. In fact I find myself liking the proposal more. I think you are thinking in the right direction to improve performance, and you remind me of some of my
colleagues in your enthusiasm and the types of ideas you come up with.
Thanks :) I fumbled the way in which I originally made the proposal, with my nervous and defensive manner, and in how I attempted to communicate my ideas at first. I was abrasive and rude, and I apologise for that to Ayrton and Thorsten and others. Live and learn. It is pleasing to see that with some persistence and tests I am becoming a little less incoherent.
We have a data structure very similar to your "rope" or "chain" idea that we use to allocate storage for a 2D spatial query data structure that is a substitute for the R*Tree variant I mentioned before. I like that you can start with an array on the stack and then extend into the heap as needed. This is a nice feature for when the object turns out to be extremely small. I like that it is safe, as opposed to a fixed-size buffer. Ours has a map instead of a linked list to access the nodes in the chain, so that we have log(n/chunksize) access time. The data structure is actually a substitute for map, where insertion and removal time is not better than a map, but access time is better and iteration speed is much better. It also consumes much less memory. It differs from a map also in that random insertion/removal invalidates iterators, because we insert into the middle of each chunk. If you only insert at the end you obviously keep your iterators valid.
Interesting. chain<> defaults to using a deque, not a list, for link management; indexed access to links is O(1).

Still, an allocator is no substitute for using the right data structures and algorithms.
I agree. However, whichever data-structures and algorithms you use, they will need storage and an allocator.
A map of int to list is a data structure that can be expected to perform very poorly. It isn't completely fair to do performance comparisons against such a poor data structure.
That would be true if I were proposing new data-structures. However, I am not comparing the performance of data-structures, so that is irrelevant. The comparisons and results I am showing use *the exact same code and data-structures* for both tests. I am merely testing the execution speed of the code produced by a) using std::allocator and b) using monotonic::allocator. In either case, the data-structures and the code are identical and hence largely irrelevant. Of course, one can write code that is designed to skew results one way or the other, which I have endeavoured to avoid; you can judge for yourself how well I have done that. A test like test_dupe(), however, is designed to show that having a no-op deallocation is indeed fast.
I think an allocator that uses your chain buffer for storage and is safe could be a good thing. I think that the allocator can be implemented in a way that it doesn't need non-static data members. The chain is in and of itself not interesting as a replacement for deque, but as a memory buffer for a good allocator I think it could find its niche. I'd like to see the memory buffer implement object deallocation and memory reuse, and perhaps thread safety (maybe optional thread safety, enabled by default).
Stack-based containers are completely thread-safe by default, with no extra work and no locking required. I have added:

    monotonic::storage<N>         // renamed from monotonic::inline_storage<N>
    monotonic::shared_storage<N>  // for thread-safe heap-based storage

The nice thing about using monotonic::shared_storage is that you can use all containers in a thread-safe way, transparently and efficiently, without needing to change the containers themselves. This is a very pleasing result.

Regards,
Christian.