
On Tue, Jun 16, 2009 at 4:11 AM, Simonson, Lucanus J < lucanus.j.simonson@intel.com> wrote:
Christian Schladetsch wrote:
[...] if you could find the time to look at the test in question, I would be delighted to hear alternative theories.
The memory overhead of header information on dynamically allocated heap data could effectively reduce your apparent cache size, by causing fewer elements in total to fit in cache.
I have surmised this as well. It would cause more cache misses that are not due to spatial locality.
This could explain the entire difference in runtime, though I wouldn't go so far as to say I'm sure that is the case.
However, for the test in point, there was no such overhead in either case, and allocation was not being compared, as the container being tested was a fully reserved vector in both cases:

    for (int i = 0; i < LOOP_COUNT; ++i)
        ++vec[1 + i % (ELEM_COUNT - 2)].ord;

The same principle is shown by the bubble-sort results I posted earlier. Placing the storage for the bubble-sort test on the stack resulted in a faster execution time; placing *the same* monotonically-allocated structure on the heap was slower. Both were faster than a std::allocator allocation on the heap. The difference isn't much, but it is not negligible, and it is free if you have a known and fixed upper bound on storage size. chain<> will go some way to assisting with structures that span the stack/heap boundary and hence do not need a fixed upper bound.

In general, there are five factors at work that make monotonic allocation, and containers that use it, faster and smaller:

1. Optional stack locality (memory access doesn't need the heap)
2. No memory-management size overhead (smaller structures; nodes are adjacent)
3. Faster allocation (no free-list traversal or modification)
4. Faster deallocation (it does nothing)
5. Free thread safety (when on the stack)

Heap fragmentation is one of the things I was alluding to when I mentioned the difference between toy benchmarks and real-world applications.
I shall re-post results that demonstrate how monotonic allocation helps with this problem, as well as being faster in 'non-toy' situations, after I reply here.
I haven't re-implemented std::containers. [...] I didn't realize. This is much better than what I thought you had. In fact I find myself liking the proposal more. I think you are thinking in the right direction to improve performance, and you remind me of some of my
colleagues in your enthusiasm and the types of ideas you come up with.
Thanks :) I fumbled the way in which I originally made the proposal, with my nervous and defensive manner, and in how I attempted to communicate my ideas at first. I was abrasive and rude, and I apologise for that to Ayrton and Thorsten and others. Live and learn. It is pleasing to see that with some persistence and tests I am becoming a little less incoherent.
We have a data structure very similar to your "rope" or "chain" idea that we use to allocate storage for a 2D spatial query data structure that is a substitute for the R*Tree variant I mentioned before. I like that you can start with an array on the stack and then extend into the heap as needed. This is a nice feature for when the object turns out to be extremely small. I like that it is safe, as opposed to a fixed-size buffer. Ours has a map instead of a linked list to access the nodes in the chain, so that we have log(n/chunksize) access time. The data structure is actually a substitute for map, where insertion and removal time is not better than a map, but access time is better and iteration speed is much better. It also consumes much less memory. It differs from a map also in that random insertion/removal invalidates iterators, because we insert into the middle of each chunk. If you only insert at the end you obviously keep your iterators valid.
Interesting. chain<> defaults to using a deque, not a list, for link management; indexed access to links is O(1).

Still, an allocator is no substitute for using the right data structures and algorithms.
I agree. However, whichever data-structures and algorithms you use, they will need storage and an allocator.
A map of int to list is a data structure that can be expected to perform very poorly. It isn't completely fair to do performance comparisons against such a poor data structure.
That would be true if I were proposing new data-structures. However, I am not comparing the performance of data-structures, so that is irrelevant. The comparisons and results I am showing use *the exact same code and data-structures* for both tests. I am merely testing the execution speed of the code produced by a) using std::allocator and b) using monotonic::allocator. In either case, the data-structures and the code are identical and hence largely irrelevant. Of course, one can write code that is designed to skew results one way or the other, which I have endeavoured to avoid; you can judge for yourself how well I have done that. A test like test_dupe(), however, is designed to show that having a no-op deallocation is indeed fast.
I think an allocator that uses your chain buffer for storage and is safe could be a good thing. I think that the allocator can be implemented in a way that it doesn't need non-static data members. The chain is in and of itself not interesting as a replacement for deque, but as a memory buffer for a good allocator I think it could find its niche. I'd like to see the memory buffer implement object deallocation and memory reuse, and perhaps thread safety (maybe optional thread safety, enabled by default).
Stack-based containers are completely thread-safe by default, with no extra work and no locking required. I have added:

    monotonic::storage<N>         // renamed from monotonic::inline_storage<N>
    monotonic::shared_storage<N>  // for thread-safe heap-based storage

The nice thing about using monotonic::shared_storage is that you can use all containers in a thread-safe way, transparently and efficiently, without needing to change the containers themselves. This is a very pleasing result.

Regards,
Christian.