unordered_map memory footprint

Hey guys, Hash tables are new to me since they have never been part of the C++ standard template library, and having started C++ as my first language I have never been exposed to them. Ever since unordered_map appeared in the Boost library, I've been trying to understand how they function in terms of functionality, memory, and performance. A major source of information has been from Wikipedia, specifically this pagehttp://en.wikipedia.org/wiki/Hash_table. Below I have outlined a few questions concerning boost::unordered_map: 1. From what I've read, the performance of a hash table is O(1), however it mentions that depending on specific situations it may be O(n). Is this true with unordered map? If so, what conditions would force it to maintain O(n) complexity? 2. What is the memory footprint of an unordered_map? From my research, hash tables seem to be a simple wrapper over an array in that the size of the "internal array" in the hash table will contain as many elements as the largest hash value generated by one of the key elements. So if I put two elements in my hash table and the generated keys are 200 and then 1,000,000, then the hash table would be 1,000,000 elements in size (which is at least a megabyte of memory in the most optimistic scenario). This logic is rather ridiculous and impractical, so I am pretty sure I'm misunderstanding how the memory mechanics work under the hood for an unordered_map. Those are the biggest questions I have now and if I think of more I'll most certainly be providing follow-up posts. I thank everyone in advance for reading and helping.

On Thu, Apr 3, 2008 at 3:26 PM, Robert Dailey
Hey guys,
Hash tables are new to me since they have never been part of the C++ standard template library, and having started C++ as my first language I have never been exposed to them. Ever since unordered_map appeared in the Boost library, I've been trying to understand how they function in terms of functionality, memory, and performance. A major source of information has been from Wikipedia, specifically this pagehttp://en.wikipedia.org/wiki/Hash_table. Below I have outlined a few questions concerning boost::unordered_map:
1. From what I've read, the performance of a hash table is O(1), however it mentions that depending on specific situations it may be O(n). Is this true with unordered map? If so, what conditions would force it to maintain O(n) complexity? 2. What is the memory footprint of an unordered_map? From my research, hash tables seem to be a simple wrapper over an array in that the size of the "internal array" in the hash table will contain as many elements as the largest hash value generated by one of the key elements. So if I put two elements in my hash table and the generated keys are 200 and then 1,000,000, then the hash table would be 1,000,000 elements in size (which is at least a megabyte of memory in the most optimistic scenario). This logic is rather ridiculous and impractical, so I am pretty sure I'm misunderstanding how the memory mechanics work under the hood for an unordered_map.
Those are the biggest questions I have now and if I think of more I'll most certainly be providing follow-up posts. I thank everyone in advance for reading and helping.
Also, the definition of 'bucket' eludes me. If someone could explain what buckets are in relation to unordered associative containers I'd appreciate it.

AMDG Robert Dailey wrote:
On Thu, Apr 3, 2008 at 3:26 PM, Robert Dailey
mailto:rcdailey@gmail.com> wrote: Hey guys,
Hash tables are new to me since they have never been part of the C++ standard template library, and having started C++ as my first language I have never been exposed to them. Ever since unordered_map appeared in the Boost library, I've been trying to understand how they function in terms of functionality, memory, and performance. A major source of information has been from Wikipedia, specifically this page http://en.wikipedia.org/wiki/Hash_table. Below I have outlined a few questions concerning boost::unordered_map:
1. From what I've read, the performance of a hash table is O(1), however it mentions that depending on specific situations it may be O(n). Is this true with unordered map? If so, what conditions would force it to maintain O(n) complexity?
lookup is O(n) if the hash function happens to return the same value for all the elements.
1.
2. What is the memory footprint of an unordered_map? From my research, hash tables seem to be a simple wrapper over an array in that the size of the "internal array" in the hash table will contain as many elements as the largest hash value generated by one of the key elements. So if I put two elements in my hash table and the generated keys are 200 and then 1,000,000, then the hash table would be 1,000,000 elements in size (which is at least a megabyte of memory in the most optimistic scenario). This logic is rather ridiculous and impractical, so I am pretty sure I'm misunderstanding how the memory mechanics work under the hood for an unordered_map.
unordered_map takes the user's hash function and reduces it mod the current number of buckets.
Those are the biggest questions I have now and if I think of more I'll most certainly be providing follow-up posts. I thank everyone in advance for reading and helping.
Also, the definition of 'bucket' eludes me. If someone could explain what buckets are in relation to unordered associative containers I'd appreciate it.
Does the following (very naive) example implementation give some idea of how a hash table works? const int num_buckets = 167; std::vector<int> buckets[numBuckets]; void insert(int x) { buckets[hash(x) % numBuckets].push_back(x); } In Christ, Steven Watanabe

On Thu, Apr 3, 2008 at 3:47 PM, Steven Watanabe
AMDG
Robert Dailey wrote:
On Thu, Apr 3, 2008 at 3:26 PM, Robert Dailey
mailto:rcdailey@gmail.com> wrote: Hey guys,
Hash tables are new to me since they have never been part of the C++ standard template library, and having started C++ as my first language I have never been exposed to them. Ever since unordered_map appeared in the Boost library, I've been trying to understand how they function in terms of functionality, memory, and performance. A major source of information has been from Wikipedia, specifically this page http://en.wikipedia.org/wiki/Hash_table. Below I have outlined a few questions concerning boost::unordered_map:
1. From what I've read, the performance of a hash table is O(1), however it mentions that depending on specific situations it may be O(n). Is this true with unordered map? If so, what conditions would force it to maintain O(n) complexity?
lookup is O(n) if the hash function happens to return the same value for all the elements.
1.
2. What is the memory footprint of an unordered_map? From my research, hash tables seem to be a simple wrapper over an array in that the size of the "internal array" in the hash table will contain as many elements as the largest hash value generated by one of the key elements. So if I put two elements in my hash table and the generated keys are 200 and then 1,000,000, then the hash table would be 1,000,000 elements in size (which is at least a megabyte of memory in the most optimistic scenario). This logic is rather ridiculous and impractical, so I am pretty sure I'm misunderstanding how the memory mechanics work under the hood for an unordered_map.
unordered_map takes the user's hash function and reduces it mod the current number of buckets.
Those are the biggest questions I have now and if I think of more I'll most certainly be providing follow-up posts. I thank everyone in advance for reading and helping.
Also, the definition of 'bucket' eludes me. If someone could explain what buckets are in relation to unordered associative containers I'd appreciate it.
Does the following (very naive) example implementation give some idea of how a hash table works?
const int num_buckets = 167; std::vector<int> buckets[numBuckets];
void insert(int x) { buckets[hash(x) % numBuckets].push_back(x); }
In Christ, Steven Watanabe
According to your example, what happens if I do: insert(167); insert(334); ???

On Thu, Apr 3, 2008 at 4:16 PM, Robert Dailey
On Thu, Apr 3, 2008 at 3:47 PM, Steven Watanabe
wrote: AMDG
Robert Dailey wrote:
On Thu, Apr 3, 2008 at 3:26 PM, Robert Dailey
mailto:rcdailey@gmail.com> wrote: Hey guys,
Hash tables are new to me since they have never been part of the C++ standard template library, and having started C++ as my first language I have never been exposed to them. Ever since unordered_map appeared in the Boost library, I've been trying to understand how they function in terms of functionality, memory, and performance. A major source of information has been from Wikipedia, specifically this page http://en.wikipedia.org/wiki/Hash_table. Below I have outlined a few questions concerning boost::unordered_map:
1. From what I've read, the performance of a hash table is O(1), however it mentions that depending on specific situations it may be O(n). Is this true with unordered map? If so, what conditions would force it to maintain O(n) complexity?
lookup is O(n) if the hash function happens to return the same value for all the elements.
1.
2. What is the memory footprint of an unordered_map? From my research, hash tables seem to be a simple wrapper over an array in that the size of the "internal array" in the hash table will contain as many elements as the largest hash value generated by one of the key elements. So if I put two elements in my hash table and the generated keys are 200 and then 1,000,000, then the hash table would be 1,000,000 elements in size (which is at least a megabyte of memory in the most optimistic scenario). This logic is rather ridiculous and impractical, so I am pretty sure I'm misunderstanding how the memory mechanics work under the hood for an unordered_map.
unordered_map takes the user's hash function and reduces it mod the current number of buckets.
Those are the biggest questions I have now and if I think of more I'll most certainly be providing follow-up posts. I thank everyone in advance for reading and helping.
Also, the definition of 'bucket' eludes me. If someone could explain what buckets are in relation to unordered associative containers I'd appreciate it.
Does the following (very naive) example implementation give some idea of how a hash table works?
const int num_buckets = 167; std::vector<int> buckets[numBuckets];
void insert(int x) { buckets[hash(x) % numBuckets].push_back(x); }
In Christ, Steven Watanabe
According to your example, what happens if I do:
insert(167); insert(334);
???
The above is assuming that the hash() function does: hash(x) { return x; } insert(167); insert(334); the above code (second insert) would cause the same element in the vector to be overwritten. The math is a little bit odd too. It doesn't really answer a lot of questions. The inserted values may also be strings as well, however in your example it appears as if the hash function only handles integral values.

Robert Dailey wrote:
On Thu, Apr 3, 2008 at 4:16 PM, Robert Dailey
mailto:rcdailey@gmail.com> wrote: On Thu, Apr 3, 2008 at 3:47 PM, Steven Watanabe
mailto:watanabesj@gmail.com> wrote:
[...snip...]
> Also, the definition of 'bucket' eludes me. If someone could explain > what buckets are in relation to unordered associative containers I'd > appreciate it.
Buckets containers for the elements with equal hash values.
Does the following (very naive) example implementation give some idea of how a hash table works?
const int num_buckets = 167; std::vector<int> buckets[numBuckets];
void insert(int x) { buckets[hash(x) % numBuckets].push_back(x); }
Note that insert doesn't overwrite bucket - it adds the new element to the bucket which is a vector.
In Christ, Steven Watanabe
According to your example, what happens if I do:
insert(167); insert(334);
???
The above is assuming that the hash() function does:
hash(x) { return x; }
The hash function maps an element key with a number, all equal keys must produce the same number, hash functions are a many to one mapping usually.
insert(167); insert(334);
the above code (second insert) would cause the same element in the vector to be overwritten. The math is a little bit odd too. It doesn't really answer a lot of questions. The inserted values may also be strings as well, however in your example it appears as if the hash function only handles integral values.
Elements are distributed across the buckets, if more than one element maps to the same bucket then that is termed a collision. If the hash function is perfect then each bucket will contain 1 element when the hash map is full and lookup is O(1), too many elements (or too few buckets) or an imperfect hash function cause multiple elements in one or more buckets; tending to O(n) lookup (like a vector) in the worst case where all elements are in one bucket. Choose a combination of hash function and number of buckets that gives the required performance. HTH -- Bill Somerville Class Design Limited

Steven,
I am probably not the first one who says this: You should write a C++ Book!
;)
Your explanations are crystal clear and directly to the point! Complements!
Sorry for small offtopic.
Ovanes.
On 4/3/08, Steven Watanabe
AMDG
Robert Dailey wrote:
On Thu, Apr 3, 2008 at 3:26 PM, Robert Dailey
mailto:rcdailey@gmail.com> wrote:
Hey guys,
Hash tables are new to me since they have never been part of the C++ standard template library, and having started C++ as my first language I have never been exposed to them. Ever since unordered_map appeared in the Boost library, I've been trying to understand how they function in terms of functionality, memory, and performance. A major source of information has been from Wikipedia, specifically this page
http://en.wikipedia.org/wiki/Hash_table. Below I have outlined a
few questions concerning boost::unordered_map:
1. From what I've read, the performance of a hash table is
O(1), however it mentions that depending on specific situations it may be O(n). Is this true with unordered map? If so, what conditions would force it to maintain O(n) complexity?
lookup is O(n) if the hash function happens to return the same value for all the elements.
1.
2. What is the memory footprint of an unordered_map? From my
research, hash tables seem to be a simple wrapper over an array in that the size of the "internal array" in the hash table will contain as many elements as the largest hash value generated by one of the key elements. So if I put two elements in my hash table and the generated keys are 200 and then 1,000,000, then the hash table would be 1,000,000 elements in size (which is at least a megabyte of memory in the most optimistic scenario). This logic is rather ridiculous and impractical, so I am pretty sure I'm misunderstanding how the memory mechanics work under the hood for an unordered_map.
unordered_map takes the user's hash function and reduces it mod the current number of buckets.
Those are the biggest questions I have now and if I think of more I'll most certainly be providing follow-up posts. I thank everyone in advance for reading and helping.
Also, the definition of 'bucket' eludes me. If someone could explain what buckets are in relation to unordered associative containers I'd appreciate it.
Does the following (very naive) example implementation give some idea of how a hash table works?
const int num_buckets = 167; std::vector<int> buckets[numBuckets];
void insert(int x) { buckets[hash(x) % numBuckets].push_back(x); }
In Christ, Steven Watanabe
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
participants (4)
-
Bill Somerville
-
Ovanes Markarian
-
Robert Dailey
-
Steven Watanabe