
on Fri Oct 31 2008, Joaquin M Lopez Munoz <joaquin-AT-tid.es> wrote:
David Abrahams <dave <at> boostpro.com> writes:
Hi,
Hi Dave, allow me to address this post of yours first and defer the answer to the others you've send on Boost.MultiIndex, as those will need more elaboration.
Thanks. This is the most important one.
I'm trying to get a grip on what it means when I have a random access index in a multi_index_container. I have no mental model for what data structure you're actually creating, which makes it difficult to evaluate multi-index vs. hand-rolling my own container.
If I was doing this myself, I'd use a combination of these two:
hash_map<key, value> deque<hash_map<key, value>::pointer>
because I need to maintain an LRU cache of what has been put in the map (i.e. I need O(1) push_back and pop_front). MultiIndex only lets me use sequenced<> which it implies essentially has the properties of list<hash_map<key, value>::iterator>... although I can imagine you could save a lot by using the same nodes for the hash_map and the list. Do you do that?
Yep, this is done. Only one node per element is allocated. This is true regardless of the indices used, including random access indices. If you want to visualize this, each element node has as many node headers as indices are, each one stacked on top of the next.
I think random access shouldn't need to add a node header unless you want to do something like look up a key in the hash index and delete that element. Maybe a feature-model interface would be a more powerful way to approach this space, so we don't have to pay for what we don't use.
Some indices (namely, hashed and random access) use additional data structures external to the nodes themselves (this is obvious in the case of hashed indices, as they are entirely similar to unordered_set).
Yes, and I expect random access indices to do the same.
Regardless, I could still save roughly one pointer per node with my deque. I'd think about trying to modify MultiIndex to let it do random access with deques, but I can't make head or tail of the implementation code used for random access.
The data structure of a random access index is basically the same as depicted for a little container I call stable_vector:
That's what I figured... although the back-pointers are only needed if you need to delete and you can't get ahold of an iterator into the vector -- the same case I mentioned above.
So, each element is stored in its own node (as is been already said) and there is an additionally internal vector of pointers that keeps the elements together while providing O(1) access. The up-pointers described in the aforementioned article are necessary to provide stable iterators.
The overhead of a random access index is one pointer per node (the up-pointer) plus the size of the internal pointer vector. Some figures are calculated at:
http://bannalia.blogspot.com/2008/09/introducing-stablevector.html
The deque-based structure you propose would save, as you say, roughly one pointer per node with respect to sequenced indices and also with respect to random access indices, which have the up-pointer your proposed data structure lacks, while on the other hand it requires an extra pointer as the deque stores pointers while multi_index_containers enjoy the header stacking property referred to above.
So, roughly speaking, things are even.
Sorry, I don't see it. Since I need O(1) pop-front I am using sequenced<> which IIUC is a doubly-linked list. Therefore your nodes have this overhead: next-in-list, prev-in-list, and next-in-hash-bucket = 3 pointers and mine have only next-in-hash-bucket. Add one pointer of overhead in the deque and I am storing 2 pointers.
On the other hand, this up-pointer is essential to ensure iterator stability and iterator projection, which are features that currently the lib take for granted for all the different index types available.
In my application, the ability to erase nodes through hash iterators doesn't matter, but the per-node storage cost does.
Could someone clarify what's going on in there?
Hope the answer helped clarify the issue. Please come back otherwise.
Except for the fact that we are coming up with different numbers for the data structure overhead, yeah. Thanks, -- Dave Abrahams BoostPro Computing http://www.boostpro.com