Re: [boost] [MultiIndex] Random Access mental model?

31 Oct 2008

      on Fri Oct 31 2008, Joaquin M Lopez Munoz <joaquin-AT-tid.es> wrote:
...
David Abrahams <dave <at> boostpro.com> writes:
...
Hi,
Hi Dave, allow me to address this post of yours first and defer
the answer to the others you've send on Boost.MultiIndex, as
those will need more elaboration.
Thanks.  This is the most important one.
...
...
I'm trying to get a grip on what it means when I have a random access
index in a multi_index_container.  I have no mental model for what data
structure you're actually creating, which makes it difficult to evaluate
multi-index vs. hand-rolling my own container.
If I was doing this myself, I'd use a combination of these two:
hash_map<key, value>
   deque<hash_map<key, value>::pointer>
because I need to maintain an LRU cache of what has been put in the map
(i.e. I need O(1) push_back and pop_front). MultiIndex only lets me use
sequenced<> which it implies essentially has the properties of
list<hash_map<key, value>::iterator>... although I can imagine you could
save a lot by using the same nodes for the hash_map and the list.  Do
you do that?
Yep, this is done. Only one node per element is allocated. This is
true regardless of the indices used, including random
access indices. If you want to visualize this, each element node
has as many node headers as indices are, each one stacked on top
of the next.
I think random access shouldn't need to add a node header unless you
want to do something like look up a key in the hash index and delete
that element.  Maybe a feature-model interface would be a more powerful
way to approach this space, so we don't have to pay for what we don't
use.
...
Some indices (namely, hashed and random access) use additional data
structures external to the nodes themselves (this is obvious in the
case of hashed indices, as they are entirely similar to
unordered_set).
Yes, and I expect random access indices to do the same.
...
...
Regardless, I could still save roughly one pointer per node with my
deque.  I'd think about trying to modify MultiIndex to let it do random
access with deques, but I can't make head or tail of the implementation
code used for random access.
The data structure of a random access index is basically the
same as depicted for a little container I call stable_vector:
http://bannalia.blogspot.com/2008/08/stable-vectors.html
That's what I figured... although the back-pointers are only needed if
you need to delete and you can't get ahold of an iterator into the
vector -- the same case I mentioned above.
...
So, each element is stored in its own node (as is been already
said) and there is an additionally internal vector of pointers
that keeps the elements together while providing O(1)
access. The up-pointers described in the aforementioned article
are necessary to provide stable iterators.
The overhead of a random access index is one pointer per
node (the up-pointer) plus the size of the internal pointer
vector. Some figures are calculated at:
http://bannalia.blogspot.com/2008/09/introducing-stablevector.html
The deque-based structure you propose would save, as you
say, roughly one pointer per node with respect to sequenced
indices and also with respect to random access indices,
which have the up-pointer your proposed data structure lacks,
while on the other hand it requires an extra pointer as the deque
stores pointers while multi_index_containers enjoy the header
stacking property referred to above.
So, roughly speaking, things are even.
Sorry, I don't see it.  Since I need O(1) pop-front I am using
sequenced<> which IIUC is a doubly-linked list.  Therefore your nodes
have this overhead:

  next-in-list, prev-in-list, and next-in-hash-bucket = 3 pointers

and mine have only next-in-hash-bucket.  Add one pointer of overhead in
the deque and I am storing 2 pointers.
...
On the other hand, this up-pointer is essential to ensure
iterator stability and iterator projection, which are features
that currently the lib take for granted for all the different
index types available.
In my application, the ability to erase nodes through hash iterators
doesn't matter, but the per-node storage cost does.
...
...
Could someone clarify what's going on in there?
Hope the answer helped clarify the issue. Please come back otherwise.
Except for the fact that we are coming up with different numbers for
the data structure overhead, yeah.

Thanks,

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com