unordered_map memory footprint

newer
[GIL] Default constructing pixels??

older
[iostreams] newline_filter problem

Robert Dailey

3 Apr 2008 3 Apr '08

8:26 p.m.

Hey guys, Hash tables are new to me since they have never been part of the C++ standard template library, and having started C++ as my first language I have never been exposed to them. Ever since unordered_map appeared in the Boost library, I've been trying to understand how they function in terms of functionality, memory, and performance. A major source of information has been from Wikipedia, specifically this page<http://en.wikipedia.org/wiki/Hash_table>. Below I have outlined a few questions concerning boost::unordered_map: 1. From what I've read, the performance of a hash table is O(1), however it mentions that depending on specific situations it may be O(n). Is this true with unordered map? If so, what conditions would force it to maintain O(n) complexity? 2. What is the memory footprint of an unordered_map? From my research, hash tables seem to be a simple wrapper over an array in that the size of the "internal array" in the hash table will contain as many elements as the largest hash value generated by one of the key elements. So if I put two elements in my hash table and the generated keys are 200 and then 1,000,000, then the hash table would be 1,000,000 elements in size (which is at least a megabyte of memory in the most optimistic scenario). This logic is rather ridiculous and impractical, so I am pretty sure I'm misunderstanding how the memory mechanics work under the hood for an unordered_map. Those are the biggest questions I have now and if I think of more I'll most certainly be providing follow-up posts. I thank everyone in advance for reading and helping.

Attachments:

attachment.html (text/html — 1.6 KB)

Show replies by date

Robert Dailey

3 Apr 3 Apr

8:37 p.m.

On Thu, Apr 3, 2008 at 3:26 PM, Robert Dailey <rcdailey@gmail.com> wrote:

...

Hey guys,

Hash tables are new to me since they have never been part of the C++ standard template library, and having started C++ as my first language I have never been exposed to them. Ever since unordered_map appeared in the Boost library, I've been trying to understand how they function in terms of functionality, memory, and performance. A major source of information has been from Wikipedia, specifically this page<http://en.wikipedia.org/wiki/Hash_table>. Below I have outlined a few questions concerning boost::unordered_map:

1. From what I've read, the performance of a hash table is O(1), however it mentions that depending on specific situations it may be O(n). Is this true with unordered map? If so, what conditions would force it to maintain O(n) complexity? 2. What is the memory footprint of an unordered_map? From my research, hash tables seem to be a simple wrapper over an array in that the size of the "internal array" in the hash table will contain as many elements as the largest hash value generated by one of the key elements. So if I put two elements in my hash table and the generated keys are 200 and then 1,000,000, then the hash table would be 1,000,000 elements in size (which is at least a megabyte of memory in the most optimistic scenario). This logic is rather ridiculous and impractical, so I am pretty sure I'm misunderstanding how the memory mechanics work under the hood for an unordered_map.

Those are the biggest questions I have now and if I think of more I'll most certainly be providing follow-up posts. I thank everyone in advance for reading and helping.

Also, the definition of 'bucket' eludes me. If someone could explain what buckets are in relation to unordered associative containers I'd appreciate it.

Steven Watanabe

8:47 p.m.

AMDG Robert Dailey wrote:

...

On Thu, Apr 3, 2008 at 3:26 PM, Robert Dailey <rcdailey@gmail.com <mailto:rcdailey@gmail.com>> wrote:

Hey guys,

Hash tables are new to me since they have never been part of the C++ standard template library, and having started C++ as my first language I have never been exposed to them. Ever since unordered_map appeared in the Boost library, I've been trying to understand how they function in terms of functionality, memory, and performance. A major source of information has been from Wikipedia, specifically this page <http://en.wikipedia.org/wiki/Hash_table>. Below I have outlined a few questions concerning boost::unordered_map:

1. From what I've read, the performance of a hash table is O(1), however it mentions that depending on specific situations it may be O(n). Is this true with unordered map? If so, what conditions would force it to maintain O(n) complexity?

lookup is O(n) if the hash function happens to return the same value for all the elements.

...

1.

2. What is the memory footprint of an unordered_map? From my research, hash tables seem to be a simple wrapper over an array in that the size of the "internal array" in the hash table will contain as many elements as the largest hash value generated by one of the key elements. So if I put two elements in my hash table and the generated keys are 200 and then 1,000,000, then the hash table would be 1,000,000 elements in size (which is at least a megabyte of memory in the most optimistic scenario). This logic is rather ridiculous and impractical, so I am pretty sure I'm misunderstanding how the memory mechanics work under the hood for an unordered_map.

unordered_map takes the user's hash function and reduces it mod the current number of buckets.

...

Those are the biggest questions I have now and if I think of more I'll most certainly be providing follow-up posts. I thank everyone in advance for reading and helping.

Also, the definition of 'bucket' eludes me. If someone could explain what buckets are in relation to unordered associative containers I'd appreciate it.

Does the following (very naive) example implementation give some idea of how a hash table works? const int num_buckets = 167; std::vector<int> buckets[numBuckets]; void insert(int x) { buckets[hash(x) % numBuckets].push_back(x); } In Christ, Steven Watanabe

Robert Dailey

9:16 p.m.

On Thu, Apr 3, 2008 at 3:47 PM, Steven Watanabe <watanabesj@gmail.com> wrote:

...

AMDG

Robert Dailey wrote:

...
On Thu, Apr 3, 2008 at 3:26 PM, Robert Dailey <rcdailey@gmail.com <mailto:rcdailey@gmail.com>> wrote:

Hey guys,

Hash tables are new to me since they have never been part of the C++ standard template library, and having started C++ as my first language I have never been exposed to them. Ever since unordered_map appeared in the Boost library, I've been trying to understand how they function in terms of functionality, memory, and performance. A major source of information has been from Wikipedia, specifically this page <http://en.wikipedia.org/wiki/Hash_table>. Below I have outlined a few questions concerning boost::unordered_map:

1. From what I've read, the performance of a hash table is O(1), however it mentions that depending on specific situations it may be O(n). Is this true with unordered map? If so, what conditions would force it to maintain O(n) complexity?

lookup is O(n) if the hash function happens to return the same value for all the elements.

...
1.

2. What is the memory footprint of an unordered_map? From my research, hash tables seem to be a simple wrapper over an array in that the size of the "internal array" in the hash table will contain as many elements as the largest hash value generated by one of the key elements. So if I put two elements in my hash table and the generated keys are 200 and then 1,000,000, then the hash table would be 1,000,000 elements in size (which is at least a megabyte of memory in the most optimistic scenario). This logic is rather ridiculous and impractical, so I am pretty sure I'm misunderstanding how the memory mechanics work under the hood for an unordered_map.

unordered_map takes the user's hash function and reduces it mod the current number of buckets.

...
Those are the biggest questions I have now and if I think of more I'll most certainly be providing follow-up posts. I thank everyone in advance for reading and helping.

Also, the definition of 'bucket' eludes me. If someone could explain what buckets are in relation to unordered associative containers I'd appreciate it.

Does the following (very naive) example implementation give some idea of how a hash table works?

const int num_buckets = 167; std::vector<int> buckets[numBuckets];

void insert(int x) { buckets[hash(x) % numBuckets].push_back(x); }

In Christ, Steven Watanabe

According to your example, what happens if I do: insert(167); insert(334); ???

Robert Dailey

9:18 p.m.

On Thu, Apr 3, 2008 at 4:16 PM, Robert Dailey <rcdailey@gmail.com> wrote:

...

On Thu, Apr 3, 2008 at 3:47 PM, Steven Watanabe <watanabesj@gmail.com> wrote:

...
AMDG

Robert Dailey wrote:

...
On Thu, Apr 3, 2008 at 3:26 PM, Robert Dailey <rcdailey@gmail.com <mailto:rcdailey@gmail.com>> wrote:

Hey guys,

Hash tables are new to me since they have never been part of the C++ standard template library, and having started C++ as my first language I have never been exposed to them. Ever since unordered_map appeared in the Boost library, I've been trying to understand how they function in terms of functionality, memory, and performance. A major source of information has been from Wikipedia, specifically this page <http://en.wikipedia.org/wiki/Hash_table>. Below I have outlined a few questions concerning boost::unordered_map:

1. From what I've read, the performance of a hash table is O(1), however it mentions that depending on specific situations it may be O(n). Is this true with unordered map? If so, what conditions would force it to maintain O(n) complexity?

lookup is O(n) if the hash function happens to return the same value for all the elements.

...
1.

2. What is the memory footprint of an unordered_map? From my research, hash tables seem to be a simple wrapper over an array in that the size of the "internal array" in the hash table will contain as many elements as the largest hash value generated by one of the key elements. So if I put two elements in my hash table and the generated keys are 200 and then 1,000,000, then the hash table would be 1,000,000 elements in size (which is at least a megabyte of memory in the most optimistic scenario). This logic is rather ridiculous and impractical, so I am pretty sure I'm misunderstanding how the memory mechanics work under the hood for an unordered_map.

unordered_map takes the user's hash function and reduces it mod the current number of buckets.

...
Those are the biggest questions I have now and if I think of more I'll most certainly be providing follow-up posts. I thank everyone in advance for reading and helping.

Also, the definition of 'bucket' eludes me. If someone could explain what buckets are in relation to unordered associative containers I'd appreciate it.

Does the following (very naive) example implementation give some idea of how a hash table works?

const int num_buckets = 167; std::vector<int> buckets[numBuckets];

void insert(int x) { buckets[hash(x) % numBuckets].push_back(x); }

In Christ, Steven Watanabe

According to your example, what happens if I do:

insert(167); insert(334);

???

The above is assuming that the hash() function does: hash(x) { return x; } insert(167); insert(334); the above code (second insert) would cause the same element in the vector to be overwritten. The math is a little bit odd too. It doesn't really answer a lot of questions. The inserted values may also be strings as well, however in your example it appears as if the hash function only handles integral values.

Bill Somerville

10:16 p.m.

Robert Dailey wrote:

...

On Thu, Apr 3, 2008 at 4:16 PM, Robert Dailey <rcdailey@gmail.com <mailto:rcdailey@gmail.com>> wrote:

On Thu, Apr 3, 2008 at 3:47 PM, Steven Watanabe <watanabesj@gmail.com <mailto:watanabesj@gmail.com>> wrote:

[...snip...]

...

> Also, the definition of 'bucket' eludes me. If someone could explain > what buckets are in relation to unordered associative containers I'd > appreciate it.

Buckets containers for the elements with equal hash values.

...

Does the following (very naive) example implementation give some idea of how a hash table works?

const int num_buckets = 167; std::vector<int> buckets[numBuckets];

void insert(int x) { buckets[hash(x) % numBuckets].push_back(x); }

Note that insert doesn't overwrite bucket - it adds the new element to the bucket which is a vector.

...

In Christ, Steven Watanabe

According to your example, what happens if I do:

insert(167); insert(334);

???

The above is assuming that the hash() function does:

hash(x) { return x; }

The hash function maps an element key with a number, all equal keys must produce the same number, hash functions are a many to one mapping usually.

...

insert(167); insert(334);

the above code (second insert) would cause the same element in the vector to be overwritten. The math is a little bit odd too. It doesn't really answer a lot of questions. The inserted values may also be strings as well, however in your example it appears as if the hash function only handles integral values.

Elements are distributed across the buckets, if more than one element maps to the same bucket then that is termed a collision. If the hash function is perfect then each bucket will contain 1 element when the hash map is full and lookup is O(1), too many elements (or too few buckets) or an imperfect hash function cause multiple elements in one or more buckets; tending to O(n) lookup (like a vector) in the worst case where all elements are in one bucket. Choose a combination of hash function and number of buckets that gives the required performance. HTH -- Bill Somerville Class Design Limited

Ovanes Markarian

9:24 p.m.

Steven, I am probably not the first one who says this: You should write a C++ Book! ;) Your explanations are crystal clear and directly to the point! Complements! Sorry for small offtopic. Ovanes. On 4/3/08, Steven Watanabe <watanabesj@gmail.com> wrote:

...

AMDG

Robert Dailey wrote:

...
On Thu, Apr 3, 2008 at 3:26 PM, Robert Dailey <rcdailey@gmail.com

...
<mailto:rcdailey@gmail.com>> wrote:

Hey guys,

Hash tables are new to me since they have never been part of the C++ standard template library, and having started C++ as my first language I have never been exposed to them. Ever since unordered_map appeared in the Boost library, I've been trying to understand how they function in terms of functionality, memory, and performance. A major source of information has been from Wikipedia, specifically this page

...
<http://en.wikipedia.org/wiki/Hash_table>. Below I have outlined a

...
few questions concerning boost::unordered_map:

...
1. From what I've read, the performance of a hash table is

...
O(1), however it mentions that depending on specific situations it may be O(n). Is this true with unordered map? If so, what conditions would force it to maintain O(n) complexity?

lookup is O(n) if the hash function happens to return the same value for all the elements.

...
1.

2. What is the memory footprint of an unordered_map? From my

...
research, hash tables seem to be a simple wrapper over an array in that the size of the "internal array" in the hash table will contain as many elements as the largest hash value generated by one of the key elements. So if I put two elements in my hash table and the generated keys are 200 and then 1,000,000, then the hash table would be 1,000,000 elements in size (which is at least a megabyte of memory in the most optimistic scenario). This logic is rather ridiculous and impractical, so I am pretty sure I'm misunderstanding how the memory mechanics work under the hood for an unordered_map.

unordered_map takes the user's hash function and reduces it mod the current number of buckets.

...
Those are the biggest questions I have now and if I think of more I'll most certainly be providing follow-up posts. I thank everyone in advance for reading and helping.

Also, the definition of 'bucket' eludes me. If someone could explain what buckets are in relation to unordered associative containers I'd appreciate it.

Does the following (very naive) example implementation give some idea of how a hash table works?

const int num_buckets = 167; std::vector<int> buckets[numBuckets];

void insert(int x) { buckets[hash(x) % numBuckets].push_back(x); }

In Christ, Steven Watanabe

_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

6352

Age (days ago)

6352

Last active (days ago)

List overview

Download

6 comments

4 participants

participants (4)

Bill Somerville
Ovanes Markarian
Robert Dailey
Steven Watanabe