[unordered] equal_range is O(bucket_count()), not O(size())

TR1 specifies that unordered_multimap::equal_range(k) worst case complexity is O(size()). However, in practice, I find that the worst-case performance of Boost unordered_multimap's equal_range is O(bucket_count()). Is this known?

I believe the issue is with the second iterator returned by equal_range(). Specifically, if the elements are sparse in the hash table, the implementation may have to traverse many empty buckets to find the second iterator.

I have a test that demonstrates the issue. The test makes two runs over an unordered_multimap, each using the same number of elements. In the second run, it calls rehash() to grow the bucket count by a large amount. Notice that the first run is quite fast, while the second run takes more than 10 seconds.

Here are the results from my test:

----
$ ./main
rehash(0)
insert 1000 elements
built hash, with 1543 buckets and 1000 elements. Now, get some ranges
time to obtain 1000 elements: 00:00:00.001366
rehash(5000000)
insert 1000 elements
built hash, with 6291469 buckets and 1000 elements. Now, get some ranges
time to obtain 1000 elements: 00:00:10.818002
----

Here is my code for the test:

----
#include <iostream>
#include <boost/unordered_map.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>

typedef boost::unordered_multimap<size_t, size_t> hash_int_type;
typedef hash_int_type::iterator hash_int_itr;
typedef std::pair<size_t, size_t> hash_insert_type;

void run_test(const size_t data_size, const size_t rehash_size)
{
    hash_int_type hash;

    std::cout << "rehash(" << rehash_size << ")\n";
    hash.rehash(rehash_size);

    std::cout << "insert " << data_size << " elements\n";
    for (size_t i = 0; i < data_size; i++) {
        hash_insert_type ins(i, i);
        hash.insert(ins);
    }

    std::cout << "built hash, with " << hash.bucket_count()
              << " buckets and " << hash.size()
              << " elements. Now, get some ranges\n";

    boost::posix_time::ptime start(boost::posix_time::microsec_clock::local_time());

    // Look up each key with equal_range(), then erase the first element of
    // the range. quick_erase() avoids the cost of returning an iterator.
    for (size_t i = data_size - 1; i != 0; i--) {
        typedef std::pair<hash_int_itr, hash_int_itr> ret_pair;
        ret_pair ret = hash.equal_range(i);
        if (ret.first != ret.second) {
            hash.quick_erase(ret.first);
        }
    }

    boost::posix_time::ptime stop(boost::posix_time::microsec_clock::local_time());
    boost::posix_time::time_duration diff_time(stop - start);
    std::cout << "time to obtain " << data_size << " elements: " << diff_time << "\n";
}

int main(int argc, char* argv[])
{
    run_test(1000, 0);
    run_test(1000, 5000000);
    return 0;
}
----

Thanks,
Brad
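[Editor's note: a simplified model, not Boost's actual code, of where the cost described above can come from. In a classic bucket-array design, the iterator one past the last matching element has to advance to the next occupied bucket, so with 1000 elements spread over ~6 million buckets each equal_range() call may scan thousands of empty buckets. All names here are hypothetical.]

----
#include <cstddef>
#include <vector>

struct Node { std::size_t key; Node* next; };   // per-bucket singly linked chain

struct ToyTable {
    std::vector<Node*> buckets;                 // bucket array; mostly empty when sparse

    // Hypothetical helper: step from bucket b to the first node of the next
    // non-empty bucket. Producing the second iterator of equal_range() in a
    // design like this runs a loop of this shape, which is O(bucket_count())
    // in the worst case.
    Node* next_occupied(std::size_t b) const {
        for (std::size_t i = b + 1; i < buckets.size(); ++i)
            if (buckets[i]) return buckets[i];
        return 0;                               // reached the end of the table
    }
};
----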

On 7 April 2011 16:52, Brad Higgins <bhiggins@arbor.net> wrote:
TR1 specifies that unordered_multimap::equal_range(k) worst case complexity is O(size()). However, in practice, I find that the worst-case performance of Boost unordered_multimap's equal_range is O(bucket_count()). Is this known?
There was a defect report about this:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3247.html#764

Also, erase has a similar problem:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3247.html#579

I think I'm going to have to change the data structure to something more efficient for these cases. The claim that it can be done without loss of efficiency seems a bit dubious to me; I'm pretty sure it'll require using more memory (i.e. an extra pointer or std::size_t per node or per bucket), which I've been discouraged from doing in the past.
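[Editor's note: a rough sketch of one well-known shape such a change can take, not necessarily what boost::unordered ends up doing. Thread every element onto a single container-wide singly linked list and give each bucket a pointer into that list; iterator increment is then just a pointer chase, so the second iterator of equal_range() never scans empty buckets, at the cost of an extra pointer per bucket (and keeping equal keys adjacent). All names are hypothetical.]

----
#include <cstddef>
#include <vector>

struct ListNode {
    std::size_t key;
    ListNode*   next;   // next element in the container-wide list
};

struct ToyTable {
    ListNode*              head;      // first element of the global element list
    std::vector<ListNode*> buckets;   // one extra pointer per bucket, pointing
                                      // into the global list -- the extra word
                                      // of memory mentioned above
};
----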

On Apr 7, 2011, at 12:39 PM, Daniel James wrote:
There was a defect report about this:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3247.html#764
Excellent, thanks. I searched for reports on this, but couldn't find any. I'll follow the issue through the defect report, and avoid equal_range in the meantime.

Thanks,
Brad
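[Editor's note: an untested sketch of such a workaround for the erase loop in the test above, using the test's own typedefs. find() only needs the first matching element, so it avoids the scan for the range's second iterator, and quick_erase() (already used in the test) avoids the similar cost in the iterator-returning erase() overload.]

----
for (size_t i = data_size - 1; i != 0; i--) {
    hash_int_itr it = hash.find(i);      // first element with key i, or end()
    if (it != hash.end()) {
        hash.quick_erase(it);            // erase without computing a successor
    }
}
----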

On 7 April 2011 17:39, Daniel James <dnljms@gmail.com> wrote:
I think I'm going to have to change the data structure to something more efficient for these cases. The claim that it can be done without loss of efficiency seems a bit dubious to me; I'm pretty sure it'll require using more memory (i.e. an extra pointer or std::size_t per node or per bucket), which I've been discouraged from doing in the past.
I've done this in the version in trunk, but it won't be released in the upcoming release. Will probably be in 1.48.
participants (2)
- Brad Higgins
- Daniel James