
I do not intend to argue about the difference between cache coherency and spatial locality; suffice it to say they are related. If the nodes of a map are scattered across the heap, traversing them will generally touch many different cache lines. If those same nodes sit together on the stack, traversing them will generally touch far fewer. This is a platform-independent fact of all modern hardware.

Now, if you have an unordered_map<string, list<Foo> > on the stack using a monotonic::allocator, using that data structure will be much faster than using the exact same structure with a std::allocator. Why? Because the first one is not doing *any memory management*. Despite claims otherwise, I challenge anyone to show me where the proposed system does a single memory allocation or memory de-allocation. All the storage is in a boost::array<char, N>. That is the allocation buffer. It is allocated by the compiler, it is on an aligned boundary, and it is probably on the stack. I am just returning newly constructed objects from within that buffer. Should I align objects within that buffer? Sure, I've already done it; it's a trivial change. But it's still not an allocation, it is a re-interpretation of space within an already-allocated buffer.

So, the first reason why my data structures are much faster than the std:: ones with the default allocator is that they do no memory allocation. No new, no delete. "Allocation" is simply incrementing a pointer; "deallocation" does nothing at all.

The second reason why containers using monotonic::allocator are faster than otherwise is that they can use the stack for storage of std:: containers. This means that you can put a map<string, list<map<int, vector<foo> > > > on the stack. All the nodes and the pointers within those nodes are also on the stack.
This means that traversing that structure is fast, and that going from the inner vector<foo> to the outer map<string, ..> is fast, as the values for both will be stored adjacent to each other on the stack. Make a new structure using the same storage, somewhere down a call chain perhaps, and it will play very nicely with the existing ones. I have a set of motivational examples in the documentation.

So, the three things that are different about this proposal are: 1. there is no memory management; 2. the stack can be used to store the nodes in a map, set or list (or unordered_map, etc.); 3. there is no memory fragmentation.

The main compelling reason to use a monotonic::vector<T> over a std::vector<T> is that the former will not fragment memory, while the latter can. The downside, of course, is that resizing a monotonic::vector is very wasteful, as the memory for the old vector is not freed. That is why, for example, the following fails:

    monotonic::storage<100*sizeof(int)> storage; //< on the stack! not on the heap, as others have suggested
    monotonic::vector<int> v(storage);
    for (int n = 0; n < 100; ++n)
        v.push_back(n); // will fail at n = 34 with an exhausted storage

but this succeeds:

    monotonic::storage<100*sizeof(int)> storage;
    monotonic::vector<int> v(100, storage); // create with an initial size and storage
    for (int n = 0; n < 100; ++n)
        v[n] = n;

As the vector in the first case expands, it reallocates, but the old buffers are not freed; their capacities accumulate inside the fixed storage, which is exhausted long before 100 elements are reached. That is why it is explicitly called a "monotonic" allocator.

On Thu, Jun 11, 2009 at 8:18 AM, Scott McMurray <me22.ca+boost@gmail.com> wrote:
> 2009/6/10 Christian Schladetsch <christian.schladetsch@gmail.com>:
>> Cache coherency is King!
>
> Well, for portability, cache obliviousness is King, since there's no way you can tune for every specific combination of memory caches, instruction caches, page caches, etc. all at the same time.
> ( http://en.wikipedia.org/wiki/Cache-oblivious_algorithm )