[unordered] equal_range is O(bucket_count()), not O(size())

TR1 specifies that unordered_multimap::equal_range(k) worst case complexity is O(size()). However, in practice, I find that the worst-case performance of Boost unordered_multimap's equal_range is O(bucket_count()). Is this known?

I believe the issue is with the second iterator returned by equal_range(). Specifically, if the elements are sparse in the hash table, the implementation may have to traverse many empty buckets to find the second iterator.

I have a test that demonstrates the issue. The test makes two runs over an unordered_multimap, each using the same number of elements. In the second run, it calls rehash() to grow the bucket count by a large amount. Notice that the first run is quite fast, while the second run takes more than 10 seconds.

Here are the results from my test:

----
$ ./main
rehash(0)
insert 1000 elements
built hash, with 1543 buckets and 1000 elements. Now, get some ranges
time to obtain 1000 elements: 00:00:00.001366
rehash(5000000)
insert 1000 elements
built hash, with 6291469 buckets and 1000 elements. Now, get some ranges
time to obtain 1000 elements: 00:00:10.818002
----

Here is my code for the test:

----
#include <iostream>
#include <boost/unordered_map.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>

typedef boost::unordered_multimap<size_t, size_t> hash_int_type;
typedef hash_int_type::iterator hash_int_itr;
typedef std::pair<size_t, size_t> hash_insert_type;

void run_test(const size_t data_size, const size_t rehash_size)
{
    hash_int_type hash;

    std::cout << "rehash(" << rehash_size << ")\n";
    hash.rehash(rehash_size);

    std::cout << "insert " << data_size << " elements\n";
    for (size_t i = 0; i < data_size; i++) {
        hash_insert_type ins(i, i);
        hash.insert(ins);
    }

    std::cout << "built hash, with " << hash.bucket_count()
              << " buckets and " << hash.size()
              << " elements. Now, get some ranges\n";

    boost::posix_time::ptime start(boost::posix_time::microsec_clock::local_time());

    // Look up each key with equal_range(), then erase the first element of
    // the range. quick_erase() avoids the cost of returning an iterator.
    for (size_t i = data_size - 1; i != 0; i--) {
        typedef std::pair<hash_int_itr, hash_int_itr> ret_pair;
        ret_pair ret = hash.equal_range(i);
        if (ret.first != ret.second) {
            hash.quick_erase(ret.first);
        }
    }

    boost::posix_time::ptime stop(boost::posix_time::microsec_clock::local_time());
    boost::posix_time::time_duration diff_time(stop - start);
    std::cout << "time to obtain " << data_size << " elements: " << diff_time << "\n";
}

int main(int argc, char* argv[])
{
    run_test(1000, 0);
    run_test(1000, 5000000);
    return 0;
}
----

Thanks,
Brad
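[Editor's note: a simplified model, not Boost's actual code, of where the cost described above can come from. In a classic bucket-array design, the iterator one past the last matching element has to advance to the next occupied bucket, so with 1000 elements spread over ~6 million buckets each equal_range() call may scan thousands of empty buckets. All names here are hypothetical.]

----
#include <cstddef>
#include <vector>

struct Node { std::size_t key; Node* next; };   // per-bucket singly linked chain

struct ToyTable {
    std::vector<Node*> buckets;                 // bucket array; mostly empty when sparse

    // Hypothetical helper: step from bucket b to the first node of the next
    // non-empty bucket. Producing the second iterator of equal_range() in a
    // design like this runs a loop of this shape, which is O(bucket_count())
    // in the worst case.
    Node* next_occupied(std::size_t b) const {
        for (std::size_t i = b + 1; i < buckets.size(); ++i)
            if (buckets[i]) return buckets[i];
        return 0;                               // reached the end of the table
    }
};
----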

On 7 April 2011 16:52, Brad Higgins <bhiggins@arbor.net> wrote:
TR1 specifies that unordered_multimap::equal_range(k) worst case complexity is O(size()). However, in practice, I find that the worst-case performance of Boost unordered_multimap's equal_range is O(bucket_count()). Is this known?
There was a defect report about this:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3247.html#764

Also, erase has a similar problem:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3247.html#579

I think I'm going to have to change the data structure to something more efficient for these cases. The claim that it can be done without loss of efficiency seems a bit dubious to me; I'm pretty sure it'll require using more memory (i.e. an extra pointer or std::size_t per node or per bucket), which I've been discouraged from doing in the past.
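[Editor's note: a rough sketch of one well-known shape such a change can take, not necessarily what boost::unordered ends up doing. Thread every element onto a single container-wide singly linked list and give each bucket a pointer into that list; iterator increment is then just a pointer chase, so the second iterator of equal_range() never scans empty buckets, at the cost of an extra pointer per bucket (and keeping equal keys adjacent). All names are hypothetical.]

----
#include <cstddef>
#include <vector>

struct ListNode {
    std::size_t key;
    ListNode*   next;   // next element in the container-wide list
};

struct ToyTable {
    ListNode*              head;      // first element of the global element list
    std::vector<ListNode*> buckets;   // one extra pointer per bucket, pointing
                                      // into the global list -- the extra word
                                      // of memory mentioned above
};
----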

On Apr 7, 2011, at 12:39 PM, Daniel James wrote:
There was a defect report about this:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3247.html#764
Excellent, thanks. I searched for reports on this, but couldn't find any. I'll follow the issue through the defect report, and avoid equal_range in the meantime.

Thanks,
Brad
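[Editor's note: an untested sketch of such a workaround for the erase loop in the test above, using the test's own typedefs. find() only needs the first matching element, so it avoids the scan for the range's second iterator, and quick_erase() (already used in the test) avoids the similar cost in the iterator-returning erase() overload.]

----
for (size_t i = data_size - 1; i != 0; i--) {
    hash_int_itr it = hash.find(i);      // first element with key i, or end()
    if (it != hash.end()) {
        hash.quick_erase(it);            // erase without computing a successor
    }
}
----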

On 7 April 2011 17:39, Daniel James <dnljms@gmail.com> wrote:
I think I'm going to have to change the data structure to something more efficient for these cases. The claim that it can be done without loss of efficiency seems a bit dubious to me; I'm pretty sure it'll require using more memory (i.e. an extra pointer or std::size_t per node or per bucket), which I've been discouraged from doing in the past.
I've done this in the version in trunk, but it won't be released in the upcoming release. Will probably be in 1.48.
participants (2)
- Brad Higgins
- Daniel James