[multi_index] hashed_index erase(iterator) performance issue

Hello, I have encountered a performance issue with multi_index's hashed_index erase(iterator) routine. The erase(iterator) call returns an iterator to the next element in the data structure. When using this routine in a sparsely populated, large sized hash, it takes a very long time to complete. For example, in my program, I have rehashed the hash table to 5 million buckets, and added 600,000 elements. Adding the elements takes a matter of seconds. Then, I delete the elements. Deleting the elements takes hours. I have profiled the code, and it looks to me like the performance hit is rooted in the increment of the iterator by the erase(iterator) call. Since the hash is sparsely populated, the increment must advance over many empty buckets for each erase(iterator) call, in order to advance to the next valid element. My question is, why does the erase(iterator) function return an iterator at all? (In comparison, sgi's stl::hash_map's erase() returns void.) The hash is unordered, so the returned iterator is a random value, which I think decreases its usefulness (I expect most clients will discard the returned iterator.) Additionally, the same behavior can be achieved by clients who desire this behavior if they post-increment their iterator argument themselves. That is: // After this line, cur_itr's initial element will be erased, // AND cur_itr will be updated to the next element in the hash table hash_table.erase(cur_itr++); Is there some way to remove elements from the hash without taking this iterator increment hit? Would it be possible to add an erase routine (of some other name) which accepts an iterator argument, but does not automatically increment the argument? Thanks, Brad

On Tue, Mar 16, 2010 at 8:46 AM, Brad Higgins <bhiggins@arbor.net> wrote:
Hello, I have encountered a performance issue with multi_index's hashed_index erase(iterator) routine. The erase(iterator) call returns an iterator to the next element in the data structure. When using this routine in a sparsely populated, large sized hash, it takes a very long time to complete. For example, in my program, I have rehashed the hash table to 5 million buckets, and added 600,000 elements. Adding the elements takes a matter of seconds. Then, I delete the elements. Deleting the elements takes hours. I have profiled the code, and it looks to me like the performance hit is rooted in the increment of the iterator by the erase(iterator) call. Since the hash is sparsely populated, the increment must advance over many empty buckets for each erase(iterator) call, in order to advance to the next valid element. My question is, why does the erase(iterator) function return an iterator at all? (In comparison, sgi's stl::hash_map's erase() returns void.) The hash is unordered, so the returned iterator is a random value, which I think decreases its usefulness (I expect most clients will discard the returned iterator.) Additionally, the same behavior can be achieved by clients who desire this behavior if they post-increment their iterator argument themselves. That is:
// After this line, cur_itr's initial element will be erased, // AND cur_itr will be updated to the next element in the hash table hash_table.erase(cur_itr++);
Is there some way to remove elements from the hash without taking this iterator increment hit? Would it be possible to add an erase routine (of some other name) which accepts an iterator argument, but does not automatically increment the argument?
We all know about the performance hit, however that style is mandated by the standard (which thankfully is still in flux, hopefully they will remove the need for it to return the iterator).

On Mar 16, 2010, at 5:56 PM, OvermindDL1 wrote:
We all know about the performance hit, however that style is mandated by the standard (which thankfully is still in flux, hopefully they will remove the need for it to return the iterator).
I researched it further yesterday, and found the thread about the identical issue with unordered_map (I need to read my boost mail more often!). I'm going to implement a void-return style version of erase, like the workaround in unordered_map, and use that until this issue is resolved officially. Thanks, Brad

Brad Higgins <bhiggins <at> arbor.net> writes:
On Mar 16, 2010, at 5:56 PM, OvermindDL1 wrote:
We all know about the performance hit, however that style is mandated by the standard (which thankfully is still in flux, hopefully they will remove the need for it to return the iterator).
I researched it further yesterday, and found the thread about the identical issue with unordered_map (I need to read my boost mail more often!). I'm going to implement a void-return style version of erase, like the workaround in unordered_map, and use that until this issue is resolved officially.
Hi Brad, I'm keeping track of the aforementioned thread as well as the discussions within the C++ committee. As soon as some consensus is reached I'll adapt Boost.MultiIndex accordingly. Thank you for using Boost.MultiIndex, Joaquín M López Muñoz Telefónica, Investigación y Desarrollo
participants (3)
-
Brad Higgins
-
Joaquin M Lopez Munoz
-
OvermindDL1