On 26/12/2013 0:46, Joaquin M Lopez Munoz wrote:
The data structure I propose can do auto unlinking. Check https://github.com/boostorg/multi_index/blob/master/include/boost/multi_index/detail/hash_index_node.hpp , functions unlink @ line 280 (non-duplicate elements) and unlink @ line 438 (duplicate elements): in both cases unlinking does not need any additional information from the container.
Very clever. Since you store pointers, you don't need to distinguish between a node and a bucket, and is_first_of_bucket/is_last_of_bucket can be implemented just by checking a chain of pointers. Cool.
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#579
was a NAD, so I had to comply, even though I take it as proven that complying incurs an extra pointer per element that the original proponents of unordered associative containers didn't see coming.
I agree. Quoting "http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1456.html" "my goal is neither to require nor to forbid stored hash codes. I don't know of an implementation that currently stores hash codes, but I also don't know of anything in this proposal that would forbid it." and "Singly linked lists have slower iterators, because an iterator first steps within a bucket and then, upon reaching the end of a bucket, steps to the next"
- If the user is willing to pay the price, it can offer "group" linking as Boost.Unordered does for duplicated elements, although this does not solve the "depleted bucket" syndrome. Your proxy-returning erase()
Unfinished sentence?
Yes, sorry. I started thinking about your proxy-returning proposal and how it could be implemented. I think that could be a good solution for Boost.Intrusive using singly linked lists with no stored hash allowing the while(it != itend) it = cont.erase(it) pattern, which is what STL users are used to. But I guess conversions to const_iterator and related would be a problem for the proxy.
On the contrary, I think Boost.Intrusive can very easily implement it as it has the auto-unlinking property. Drop me a line if you want to port it to your lib.
Thanks! Currently boost::intrusive::hashtable code is very complicated (it needs to deal with "store_hash", "cache_begin", "power_2_buckets", "compare_hash" and "incremental" options) and adding another option to implement your data structure without breaking everything else would not be easy. I think it's better to have a separate implementation, even if that could lead to some duplicated code. I even wanted to add some kind of highly customizable (for a default implementation Boost.Unordered is the way to go IMHO) experimental hash table to Boost.Container based on Boost.Intrusive hash tables and measure speed differences depending on different Boost.Intrusive options (store hash or not, etc.).
I think the variant for duplicate elements can be improved, as currently group linking kicks in only for #group>2. I decided to leave it like that for the moment but will get back to it when time permits.
Good.
Dinkumware's excellent performance (despite its heavy overhead of 2 pointers per node plus 2 pointers per bucket entry) comes from two factors:
* Bucket arrays are powers of two,
* the data structure has very good locality.
But it has a serious problem with erasure, as it basically violates the standard by invoking the hash function, which can throw, while erasure must not throw. The only fix would be adding an additional node field to cache the hash value. As it stands now, Dinkumware's implementation of the standard unordered associative containers is broken.
Dinkumware used to implement incremental hashing in old versions (it was the default in pre-unordered VC versions, and I think in VC10 it can be activated through a macro, but that disappeared in recent versions), which would also be interesting to measure. No idea why they changed the implementation. Just another comment, in case you'd like to improve your benchmarks: I think erase(iterator) is much less used than erase(key), and in that case I don't think Dinkumware's data structure would be better (maybe power-of-two bucket arrays can still help, but you need to traverse the linked list to find the element).

Best,

Ion