
on Sat Nov 01 2008, "JOAQUIN M. LOPEZ MUÑOZ" <joaquin-AT-tid.es> wrote:
________________________________________
From: boost-bounces@lists.boost.org [boost-bounces@lists.boost.org] On behalf of David Abrahams [dave@boostpro.com]
Sent: Saturday, November 01, 2008 0:47
To: boost@lists.boost.org
Subject: Re: [boost] [MultiIndex] Random Access mental model?
on Fri Oct 31 2008, Joaquin M Lopez Munoz <joaquin-AT-tid.es> wrote:
Yep, this is done. Only one node per element is allocated. This is true regardless of the indices used, including random access indices. If you want to visualize this, each element node has as many node headers as there are indices, each one stacked on top of the next.
I think random access shouldn't need to add a node header unless you want to do something like look up a key in the hash index and delete that element. Maybe a feature-model interface would be a more powerful way to approach this space, so we don't have to pay for what we don't use.
It is very difficult (I suspect impossible) to come up with a satisfactory global picture without this up-pointer. Consider just the following scenario:
I have a multi_index_container<hashed,random_access> m and an iterator it into the hashed index. m.erase(it) cannot be implemented without the up-pointer in the random-access index. (If you don't immediately see why, picture the hand-rolled composition of a hash_map and a deque<pointer>: you can't erase an element from the combined structure given only a hash_map::iterator.)
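To make the scenario concrete, here is a minimal sketch using the public Boost.MultiIndex interface; the element type and key member are made up for illustration:

    #include <boost/multi_index_container.hpp>
    #include <boost/multi_index/hashed_index.hpp>
    #include <boost/multi_index/random_access_index.hpp>
    #include <boost/multi_index/member.hpp>

    struct element { int key; };  // made-up element type

    typedef boost::multi_index_container<
        element,
        boost::multi_index::indexed_by<
            boost::multi_index::hashed_unique<
                boost::multi_index::member<element, int, &element::key> >,
            boost::multi_index::random_access<>
        >
    > container;

    void erase_by_key(container& m, int k)
    {
        container::iterator it = m.find(k);  // iterator of the hashed index (#0)
        if (it != m.end())
            m.erase(it);  // must also unlink the element from the random-access
                          // index in O(1) -- hence the up-pointer
    }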
Yes, yes, I understood all that. It's exactly what I was referring to when I wrote "look up a key in the hash index and delete that element." If you don't immediately see why that is the same thing you're talking about, well, please tell me.
There are lots of additional problems,
I disagree with your premise. It's only a problem if it's a problem for me, and since I don't need that functionality I can use a lighter-weight data structure with no up-pointers, just like I'll use a vector instead of a deque when I don't need O(1) modification at the front.
but I think this is sufficiently serious to justify my decision. Basically, Boost.MultiIndex assumes O(1) inter-index iterator projection (getting from an iterator in an index to an iterator in another index pointing to the same element, in constant time).
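Concretely, reusing the container typedef from the sketch above, projection is roughly this operation:

    // Given an iterator into index #0 (hashed), get an iterator into index #1
    // (random access) pointing to the same element, in constant time.
    void project_example(container& m, container::iterator it)
    {
        container::nth_index<1>::type::iterator rit = m.project<1>(it);
        (void)rit;  // rit now refers to the same element in the random-access index
    }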
That's fine; it's not the right library for my application, then.
[...]
The deque-based structure you propose would save, as you say, roughly one pointer per node with respect to sequenced indices, and also with respect to random access indices, which have the up-pointer your proposed data structure lacks. On the other hand, it requires an extra pointer, since the deque stores pointers while multi_index_containers enjoy the header-stacking property referred to above.
So, roughly speaking, things are even.
Sorry, I don't see it. Since I need O(1) pop-front I am using sequenced<>, which IIUC is a doubly-linked list. Therefore your nodes have this overhead:
next-in-list, prev-in-list, and next-in-hash-bucket = 3 pointers
and mine have only next-in-hash-bucket. Add one pointer of overhead in the deque and I am storing 2 pointers.
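Roughly, the layouts being compared look like this (field names are illustrative only, not the actual Boost.MultiIndex or hash-map internals):

    struct payload { int key; int data; };   // stand-in for the stored element

    struct sequenced_plus_hashed_node {      // one node per element in multi_index_container
        void* next_in_list;                  // sequenced<> index
        void* prev_in_list;                  // sequenced<> index
        void* next_in_bucket;                // hashed<> index
        payload value;                       // => 3 pointers of overhead per element
    };

    struct hash_only_node {                  // one node per element in a plain hash map
        void* next_in_bucket;                // => 1 pointer of overhead per element
        payload value;
    };
    // ...plus roughly one pointer per element stored in the deque: 2 in total.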
But your deque is a deque of *pointers*,
Uh, yes. One pointer stored in the deque per element.
so the (amortized) penalty of the deque is 2 pointers per element.
Maybe you should explain why you think so.
Please take a look at the attached figure, where the resulting data structures are depicted for:
* Boost.MultiIndex (hashed + sequenced),
* Boost.MultiIndex (hashed + random access),
* Hand-rolled composition (hash_map + deque).
That's not the data structure I was proposing. The deque elements would point directly to the elements in the hash_map, i.e.

    hash_map<key, value> + deque<std::pair<key const, value>*>

The picture you drew is something like

    hash_map<key, value> + deque<std::pair<key const, value>**>

I can't imagine why you added that extra node.
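For the record, a minimal sketch of that composition, with boost::unordered_map standing in for hash_map and made-up key/value types:

    #include <boost/unordered_map.hpp>
    #include <deque>
    #include <string>
    #include <utility>

    typedef boost::unordered_map<int, std::string> hash_map;
    typedef std::deque<std::pair<const int, std::string>*> fifo;  // points directly at map elements

    void example()
    {
        hash_map m;
        fifo order;

        std::pair<hash_map::iterator, bool> r = m.insert(std::make_pair(1, "one"));
        order.push_back(&*r.first);       // element addresses are stable in a node-based hash map

        // O(1) pop-front without any up-pointer:
        m.erase(order.front()->first);
        order.pop_front();
    }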
(The array labeled "deque<pointer>" does not do justice to the actual chunk-based structure used by most deque implementations, but you get the idea: there's on average one pointer per element).
That would only be true if the block size of the deque were 1, but who would build a deque that way? I'm very familiar with deque. There are basically two implementations, one of which uses a circular buffer for the array of block pointers, but fundamentally they have the same storage overhead. If there are N elements and a block can hold B elements, the block array stores ceiling(N/B) pointers and there are at most (N + 2*B)*sizeof(element) bytes of storage for blocks. As N grows, the 2*B constant becomes unimportant and you have, amortized, N*sizeof(element) bytes of element storage and (N/B)*sizeof(block*) bytes of block storage. When the element is a pointer that's (N + N/B)*sizeof(pointer). The only way that can be equal to N*2*sizeof(pointer) is if B == 1.
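A quick check of that arithmetic under example assumptions (64-bit pointers, 512 pointers per block; both numbers are illustrative):

    #include <cstdio>

    int main()
    {
        const double N = 1e6;  // elements
        const double B = 512;  // pointers per block
        const double P = 8;    // sizeof(pointer) in bytes

        double deque_of_pointers = (N + N / B) * P;  // ~1.002 pointers per element
        double two_per_element   = 2 * N * P;        // the claimed figure
        std::printf("deque<T*>: %.2f bytes/element, claimed: %.2f bytes/element\n",
                    deque_of_pointers / N, two_per_element / N);
        return 0;
    }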
Do you see you've got four pointers (black dots) per element in the three structures?
Yes, but the deque you picture is pathologically inefficient.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com