[multi-index] Changing hash_index_node.hpp to double linked list

I am using boost multi-index, with a hashed-non-unique lookup. In my data, I am expecting many hash collisions. When I erase the multi- index, I find very poor performance. On investigation, I found that the hash collision data structure is a single linked list, and that removing an element involves traversing the entire list. Removing all of the elements involves traversing the list many times. Since my data has many collisions, I am traversing a very long list very many times. I know that the hash table is not designed for use with many collisions, but it is still faster in my testing on insert and serialization than the ordered_non_unique index specifier. I may end up using ordered_non_unique to avoid the worst-case erase() performance, and it may be good enough for the insert() performance. However, I would like to investigate making the hash collisions double- linked. I tried editing boost/multi_index/detail/hash_index_node.hpp, to make it a double-linked list, but I am seeing one of the included tests fail (test_safe_mode). I can't spot any bugs in my hash_index_node.hpp changes. What, apart from hash_index_node.hpp, needs to be updated, to support a double- linked list? boost/multi_index/detail/bucket_array.hpp? Others? Thanks, Brad

brad higgins <bhiggins <at> arbor.net> writes:
I am using boost multi-index, with a hashed-non-unique lookup. In my data, I am expecting many hash collisions. When I erase the multi- index, I find very poor performance. On investigation, I found that the hash collision data structure is a single linked list, and that removing an element involves traversing the entire list. Removing all of the elements involves traversing the list many times. Since my data has many collisions, I am traversing a very long list very many times.
Hi Brad, Yep, that's a shortcoming of the singly linked approach in the non_unique case. I decided to keep it like that for simplicity (code is shared with the unique variant) and because the case of having many duplicates is definitely not a normal scenario for hashed containers. [...]
I tried editing boost/multi_index/detail/hash_index_node.hpp, to make it a double-linked list, but I am seeing one of the included tests fail (test_safe_mode).
Do you mean you're passing the rest of the tests? Wow, that's a feat taking into account that the internal structure of the code is not documented.
I can't spot any bugs in my hash_index_node.hpp changes. What, apart from hash_index_node.hpp, needs to be updated, to support a double- linked list? boost/multi_index/detail/bucket_array.hpp? Others?
Off the top of my head I couldn't say whether you need to change bucket_array, I'd say it depends on the way you're implementing the doubly linked stuff. If you send me the modified files (eiher privately or through the list) I can take a look and report back. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Hi Joaquín, Thanks for the quick response. I have attached the files I edited. Thank you very much for the help, Brad

brad higgins escribió:
Hi Joaquín, Thanks for the quick response. I have attached the files I edited.
Thank you very much for the help, Brad
Hi Brad, There were one more change to do in boost/multi_index/hashed_index.hpp, please find attached a .diff file. I also updated the invariant_() member function to check the new doubly linked internal structure. Moreover, your modified hashed_index retains forward iterators from the original, but I guess you can trivially implement them as bidirectional (clue: see how this is done at sequenced_index.hpp). Joaquín M López Muñoz Telefónica, Investigación y Desarrollo Index: hashed_index.hpp =================================================================== --- hashed_index.hpp (revision 52074) +++ hashed_index.hpp (working copy) @@ -679,10 +679,12 @@ map.find( static_cast<final_node_type*>( node_type::from_impl(next_org))))->impl(); + cpy->next()->prev()=cpy; next_org=next_org->next(); cpy=cpy->next(); } cpy->next()=begin_cpy; + begin_cpy->prev()=cpy; } super::copy_(x,map); @@ -938,7 +940,12 @@ } else{ size_type s0=0; - for(const_iterator it=begin(),it_end=end();it!=it_end;++it,++s0){} + for(const_iterator it=begin(),it_end=end();it!=it_end;++it,++s0){ + if(it.get_node()->impl()->next()->prev()!=it.get_node()->impl()) + return false; + if(it.get_node()->impl()->prev()->next()!=it.get_node()->impl()) + return false; + } if(s0!=size())return false; size_type s1=0;
participants (3)
-
brad higgins
-
Joaquin M Lopez Munoz
-
joaquin@tid.es