hash_combine vs hash_value

Michael Goldshteyn

25 Mar 2011 25 Mar '11

1:57 p.m.

After reading the explanation of how hash_combine should be used, I am left wondering. Why is the recommendation for hash_combine different from what one would intuitively expect? Namely, given: struct MyStruct { int a; int b; int c; }; The recommendation via examples and documentation is: size_t hash(const MyStruct &s) { size_t hashValue=0; // Combine hashes of all elements hash_combine(hashValue,s.a); hash_combine(hashValue,s.b); hash_combine(hashValue,s.c); return hashValue; } Rather than what I would think is the expected correct approach: size_t hash(const MyStruct &s) { // For a composite hash, start by hashing the first element size_t hashValue=hash_value(s.a); // Then, combine with hashes of the remaining elements hash_combine(hashValue,s.b); hash_combine(hashValue,s.c); return hashValue; } Thanks, Michael Goldshteyn

Show replies by date

Steven Watanabe

25 Mar 25 Mar

2:47 p.m.

AMDG On 03/25/2011 06:57 AM, Michael Goldshteyn wrote:

...

After reading the explanation of how hash_combine should be used, I am left wondering. Why is the recommendation for hash_combine different from what one would intuitively expect?

Either one is okay.

...

Namely, given:

struct MyStruct { int a; int b; int c; };

The recommendation via examples and documentation is:

size_t hash(const MyStruct &s) { size_t hashValue=0;

// Combine hashes of all elements hash_combine(hashValue,s.a); hash_combine(hashValue,s.b); hash_combine(hashValue,s.c);

return hashValue; }

Rather than what I would think is the expected correct approach:

size_t hash(const MyStruct &s) { // For a composite hash, start by hashing the first element size_t hashValue=hash_value(s.a);

// Then, combine with hashes of the remaining elements hash_combine(hashValue,s.b); hash_combine(hashValue,s.c);

return hashValue; }

In Christ, Steven Watanabe

Daniel James

2:49 p.m.

On 25 March 2011 13:57, Michael Goldshteyn <mgoldshteyn@comcast.net> wrote:

...

After reading the explanation of how hash_combine should be used, I am left wondering. Why is the recommendation for hash_combine different from what one would intuitively expect?

If you call hash_value directly you won't be taking advantage of the mechanism for finding the correct overload. Obviously, in this case that isn't a big deal but we'd rather encourage an implementation style that always works. There are also some portability workarounds that you'll be missing. Daniel

Michael Goldshteyn

3:32 p.m.

The thing that bothers me about the use "hash_combine" for even the first hashed value of a bunch of values is the fact that: size_t myHashVal=0; boost::hash_combine(myHashVal,firstVal); ... becomes size_t seed=0; seed ^= hash_value(myFirstHashVal) + 0x9e3779b9; // + (seed << 6) + (seed >> 2) does nothing for seed==0 ... The xor'ing of the first value doesn't bother me, but this displacement by 0x9e3779b9 of the first value does, since for 32-bit size_t values, we've just lost a bunch of info. It seems like the formula will result in less of the first value being applied to the overall hash, due to the "potential" unsigned 32-bit overflow caused by the addition. Maybe I'm misinterpreting the effect. If so, please correct me. Thanks, Michael Goldshteyn

Steven Watanabe

3:42 p.m.

AMDG On 03/25/2011 08:32 AM, Michael Goldshteyn wrote:

...

The xor'ing of the first value doesn't bother me, but this displacement by 0x9e3779b9 of the first value does, since for 32-bit size_t values, we've just lost a bunch of info. It seems like the formula will result in less of the first value being applied to the overall hash, due to the "potential" unsigned 32-bit overflow caused by the addition. Maybe I'm misinterpreting the effect. If so, please correct me.

You won't lose anything. Unsigned addition wraps around. In Christ, Steven Watanabe

Topher Cooper

6:01 p.m.

Your method implies a special status to the first element, which it should not have. Of course, because ordering makes a difference the first element is special because it is the first, but the second element is special because it is the second, and so on. Every element should be treated uniformly. Code should always be written, to the extent possible, to reflect concept and intent to the greatest extent possible. Is it really conceptually necessary that any combining algorithm should be just as good if the hash of the initial value can be treated as if it were the combination of a sequence of values? (My own preference would have been for a combiner that was a struct which would be initialized, called on each value to be included and then finalized when the final hash value is extracted -- a more general interface which could be implemented to do precisely what this one does. But I didn't speak up when this was being discussed, so I don't have anything to complain about). Topher On 3/25/2011 9:57 AM, Michael Goldshteyn wrote:

...

After reading the explanation of how hash_combine should be used, I am left wondering. Why is the recommendation for hash_combine different from what one would intuitively expect?

Namely, given:

struct MyStruct { int a; int b; int c; };

The recommendation via examples and documentation is:

size_t hash(const MyStruct &s) { size_t hashValue=0;

// Combine hashes of all elements hash_combine(hashValue,s.a); hash_combine(hashValue,s.b); hash_combine(hashValue,s.c);

return hashValue; }

Rather than what I would think is the expected correct approach:

size_t hash(const MyStruct &s) { // For a composite hash, start by hashing the first element size_t hashValue=hash_value(s.a);

// Then, combine with hashes of the remaining elements hash_combine(hashValue,s.b); hash_combine(hashValue,s.c);

return hashValue; }

Thanks,

Michael Goldshteyn

Scott McMurray

26 Mar 26 Mar

2:26 a.m.

On Fri, Mar 25, 2011 at 06:57, Michael Goldshteyn <mgoldshteyn@comcast.net> wrote:

...

After reading the explanation of how hash_combine should be used, I am left wondering. Why is the recommendation for hash_combine different from what one would intuitively expect?

With combine, you can also use a non-zero initial value, so Point(x,y) and BoundingBox(x,y) don't have the same hash, if you happen to mix types in your tables.

Daniel James

10:11 a.m.

On 26 March 2011 02:26, Scott McMurray <me22.ca+boost@gmail.com> wrote:

...

With combine, you can also use a non-zero initial value, so Point(x,y) and BoundingBox(x,y) don't have the same hash, if you happen to mix types in your tables.

That's a good point, but since STL containers aren't polymorphic it's usually not the case. The boost hash function should always match the equality operator, so that would be appropriate for writing 'hash_value' if you have a polymorphic equality operator that always returns false for comparing a Point and a BoundingBox. If you're using the library to write a hash function for another purpose, that isn't really something I considered when writing the documentation. I probably wasn't clear enough that I saw the library as fairly single minded. I think it can work well if you're careful. The design of your hash library could make a more appropriate base.

Daniel James

10:13 a.m.

On 26 March 2011 10:11, Daniel James <dnljms@gmail.com> wrote:

...

The design of your hash library could make a more appropriate base.

A few seconds after sending that I read the email where you said that it's focussed on more secure hash functions. So ignore me there.

5223

Age (days ago)

5224

Last active (days ago)

List overview

Download

8 comments

5 participants

participants (5)

Daniel James
Michael Goldshteyn
Scott McMurray
Steven Watanabe
Topher Cooper