
Hi David,

Christian> [...] the same range of memory can be used by multiple containers, the range of memory can be placed on the stack *or* the heap, and by making de-allocation a no-op, memory management is optimally efficient in time but only ever grows in space.
David> This is the key unique aspect of your allocator, so what I think you need to do is show that this specific aspect can bring higher performance. Create a sample with at least two containers, preferably one vector-like and the other a tree, using the same block.
I have done this. The results show a 50% improvement. I posted the code three posts ago in this thread; it is in the sandbox at https://svn.boost.org/svn/boost/sandbox/monotonic/libs/monotonic/test/main.c..., starting at line 419, "void test_map_list_realtime()".
David> That way you can sell this idea, which I think is rather unique: a small contiguous segment with cheap internal allocation for *two or more containers, not restricted to vector-like containers*.
After I posted my initial map<int, list<int> > implementation using the proposed system, and got some good results (if tainted somewhat by my blunder on the 100x claim), Scott McMurray wrote a custom intrusive container just for this one case, and compared the results:
Scott> Upping it to 0x1010101 gave this:
Scott>
Scott> mono: 35.82/35.71/35.82
Scott> std: 37.94/37.83/37.81
Scott> intrusive: 36.17/35.78/35.62

So, using a monotonic allocator with the stock standard containers is at least comparable to, if not better than, even a hand-built intrusive container. Why?

One thing to consider is that the monotonic allocator is also more efficient in space, because there is no overhead for keeping track of free-lists or other memory-management bookkeeping. Specifically, say you have a map<int, int> and want to insert two nodes. Using MSVC's <xtree> for reference:

    struct Node {
        _Nodeptr _Left;      // left subtree, or smallest element if head
        _Nodeptr _Parent;    // parent, or root of tree if head
        _Nodeptr _Right;     // right subtree, or largest element if head
        value_type _Myval;   // the stored value, unused if head
        char _Color;         // _Red or _Black, _Black if head
        char _Isnil;         // true only if head (also nil) node
    };

With value_type being pair<const int, int>, this is 3*4 + 8 + 2 = 22 bytes per node on a typical 32-bit system. (I assume a typical 32-bit system in the rest of this post. There will be vagaries between platforms and architectures, but they are incidental to the underlying point I am trying to make.)

Now let's have a quick look at what happens when you put one of these on the heap with new. There will be whatever alignment padding is required, plus whatever overhead your memory allocator uses. We can't say too much about what a generalised allocator requires, but at a guess:

    two extra pointers to insert into a linked list: +8 bytes
    a block size and a used size: +8 bytes

In a real system there would likely be more, but let's say it is around 16 bytes per allocation. Allocating two new nodes then costs (22 + 16)*2 = 76 bytes. Compare this with a monotonic allocator: two new nodes use 44 bytes, and three use 66 bytes, which is still less than the space required for just two nodes using new. (I have intentionally ignored alignment in both cases for the moment; I'll get to it shortly.)

Now, compare to a list node.
In this case, the ratio of overhead to value is higher still. A list node for ints uses 12 bytes, which is smaller than the allocation overhead itself, so you lose every time you allocate. On a typical 32-bit system:

    new allocating 4 nodes for list<int>: 4*(16 + 12) = 112 bytes
    mono allocating 4 nodes for list<int>: 4*12 = 48 bytes

Plus, and this is very important for speed, the 4 nodes in monotonic storage are exactly adjacent in memory. On GCC, which uses 4-byte alignment on Intel platforms, there is not even any padding between the nodes: they reside precisely adjacent in memory, possibly right there on the stack.

I am guessing about the overhead of each allocation made by new. Whether it is 8, 16 or 32 bytes, it is still higher than the overhead of a monotonic allocator, which is exactly zero. Of course, the ratio of waste goes down for larger values stored in the nodes, but whatever that ratio is, it will be smallest for a monotonic allocator with zero overhead. Yes, extra space will be needed for alignment, but that is required by any allocator. Thanks again to Artyom and Thorsten for their perseverance against my ignorance on this matter.

So this is why a tree of lists of vectors is very fast and very small using monotonic allocators, and why using a monotonic allocator with the standard containers produces code that is comparable to, or even faster than, hand-crafted intrusive containers. Add to this the fact that the nodes themselves can be placed on the stack, and that the pointers in those nodes point to other nodes that are also on the stack, quite likely close by. The offsets are small, the entire data structure is very small, there is no overhead other than the required alignment, and it is far easier to make efficient data structures.
As David and others have pointed out, for very large structures any benefit from this approach becomes impractical to leverage or is swamped by other issues. Even then, if an upper limit is known, the storage can be placed on the heap rather than the stack and you still get the benefit of zero allocation overhead and zero heap fragmentation. It is not for everyone, and it is certainly not a general solution. The approach was born from the need for small, fast, local, short-lived, but general and nested containers, and it works best in that context.
Regards, Christian.