On 26/12/2013 0:46, Joaquin M Lopez Munoz wrote:
The data structure I propose can do auto unlinking. Check https://github.com/boostorg/multi_index/blob/master/include/boost/multi_index/detail/hash_index_node.hpp , functions unlink @ line 280 (non-duplicate elements) and unlink @ line 438 (duplicate elements): in both cases unlinking does not need any additional information from the container.
Very clever. Since you store pointers, you don't need to distinguish between a node and a bucket, and is_first_of_bucket/is_last_of_bucket can be implemented just by checking a chain of pointers. Cool.
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#579
was a NAD, so I had to comply, even though I take it as proven that complying incurs an extra pointer per element that the original proponents of unordered associative containers didn't see coming.
I agree. Quoting "http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1456.html" "my goal is neither to require nor to forbid stored hash codes. I don't know of an implementation that currently stores hash codes, but I also don't know of anything in this proposal that would forbid it." and "Singly linked lists have slower iterators, because an iterator first steps within a bucket and then, upon reaching the end of a bucket, steps to the next"
- If the user is willing to pay the price, it can offer "group" linking as Boost.Unordered does for duplicated elements, although this does not solve the "depleted bucket" syndrome. Your proxy-returning erase()
Unfinished sentence?
Yes, sorry. I started thinking about your proxy-returning proposal and how it could be implemented. I think that could be a good solution for Boost.Intrusive using singly linked lists with no stored hash allowing the while(it != itend) it = cont.erase(it) pattern, which is what STL users are used to. But I guess conversions to const_iterator and related would be a problem for the proxy.
On the contrary, I think Boost.Intrusive can very easily implement it as it has the auto-unlinking property. Drop me a line if you want to port it to your lib.
Thanks! Currently boost::intrusive::hashtable code is very complicated (it needs to deal with "store_hash", "cache_begin", "power_2_buckets", "compare_hash" and "incremental" options) and adding another option to implement your data structure without breaking everything else would not be easy. I think it's better to have a separate implementation, even if that could lead to some duplicated code. I even wanted to add some kind of highly customizable (for a default implementation Boost.Unordered is the way to go IMHO) experimental hash table to Boost.Container based on Boost.Intrusive hash tables and measure speed differences depending on different Boost.Intrusive options (store hash or not, etc.).
I think the variant for duplicate elements can be improved, as currently group linking kicks in only for #group>2. I decided to leave it like that for the moment but will get back to it when time permits.
Good.
Dinkumware's excellent performance (despite its heavy overhead of 2 pointers per node plus 2 pointers per bucket entry) comes from two factors:
* Bucket arrays are powers of two,
* the data structure has very good locality.
But it has a serious problem with erasure, as it basically violates the standard by invoking the hash function, which can throw, while erasure must not throw. The only fix would be adding an additional node field to cache the hash value. As it stands now, Dinkumware's implementation of the standard unordered associative containers is broken.
Dinkumware used to implement incremental hashing in old versions (it was the default in pre-unordered VC versions, and I think in VC10 it can be activated through a macro, but that disappeared in recent versions), which would also be interesting to measure. No idea why they changed the implementation. Just another comment, in case you'd like to improve your benchmarks: I think erase(iterator) is much less used than erase(key), and in that case I don't think Dinkumware's data structure would be better (maybe power-of-two bucket arrays can still help, but you need to traverse the linked list to find the element).

Best,

Ion