Ion Gaztañaga
El 25/12/2013 18:12, Joaquín Mª López Muñoz escribió:
The most important change is a complete reimplementation of the internals of hashed indices, so that erasure of elements is constant-time regardless of the level of occupancy of the internal bucket array [...]
Joaquín, you blog posts are incredibly interesting. Congratulations.
Thank you!
Just in case you want to check it in the future, Boost.Intrusive uses a policy-based data structure, but suffers from the same performance degradation as MultiIndex had in pre-1.56 versions. AFAIK, the good old bucket rooted singly linked list is the only one that can support "auto-unlink" hooks (nodes that delete themselves from the container without having a pointer to the container).
The data structure I propose can do auto unlinking. Check https://github.com/boostorg/multi_index/blob/master/ include/boost/multi_index/detail/hash_index_node.hpp , functions unlink @ line 280 (non-duplicate elements) and unlink @ line 438 (duplicate elements): in both cases unlinking does not need any additional information from the container.
- It does avoid the "iterator erase(...)" problem returning "void" instead of "iterator" (as Intrusive need not to be standard-conforming).
That would be my preferred soluton at any rate but unfortunately the committee decided that LWG issue #579 http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#579 was a NAD, so I had to comply, even though I take it as proven that complying incurs an extra pointer per element that the original proponents of unordered associative containers didn't see coming.
- If the user is willing to pay the price, it can offer "group" linking as Boost.Unordered does for duplicated elements but although this does not solve the "depleted bucket" syndrome. Your proxy-returning erase()
Unfinished sentence?
- It can also store the hash value in the node and this is used for faster rehashing and lookups but it does not implement the libstdc++/libc++/Boost.Unordered data structure. This was planned for the future to improve erasure times.
- Power of two buckets can be optionally enabled to speed up some operations if the user is sure the hash function is good enough.
You novel data structure is really interesting, although I guess it's not easy to implement, specially when duplicates are around.
On the contrary, I think Boost.Intrusive can very easily implement it as it has the auto-unlinking property. Drop me a line if you want to port it to your lib. I think the variant for duplicate elements can be improved, as currently group linking kicks in only for #group>2. I decided to leave it like that for the moment but will get back to it when time permits.
It's interesting to see how well the Dinkumware implementation performs, as the memory cost is now lower than some years ago, now that the rest of implementations have added the hash value in the node (the overhead of the Dinkumware implementation is one pointer per bucket).
Dinkumware's excellent performance (despite its heavy overhead of 2 pointers per node plus 2 pointers per bucket entry) comes from to factors: * Bucket arrays are powers of two, * the data structure has very good locality, But is has a serious problem with erasure, as it's basically violating the standard by invoking the hash function, which can throw --and erasure can't throw. The only fix would be adding an additional node field to cache the hash value. As it stands now, Dinkumware's implementation of standard unordered associative containers is broken. Best, Joaquín M López Muñoz Telefónica Digital