Re: [boost] Re: [review] hash functions

In-Reply-To: <003501c52a54$a28c4b90$6601a8c0@pdimov> pdimov@mmltd.net (Peter Dimov) wrote (abridged):
size_t hash_value( int v ) { return size_t(v) + default_seed; }
I wonder what we gain from this.
As you pointed out when I first suggested adding the constant in hash_combine: this would mean N fixes for a sequence of length N, when one simple fix is enough. The initial value of the seed is arbitrary, so we can vary it at will.

hash_combine, on the other hand, is based on research and has apparently been selected as one of the best amongst the shift+xor family. This doesn't mean it's sacred, just that if we change it we'll no longer be able to simply point at a PDF as a rationale, but will need to do our own research (which might be a good idea anyway, though). :-)
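For concreteness, a minimal sketch of the "one simple fix" idea, using a combining step in the shift+xor form being discussed; the exact constants and the 0xdeadbeef starting value are illustrative, not the library's actual choices:

    #include <cstddef>

    // Illustrative shift+xor combining step; the exact constants are not
    // the point here.
    inline void hash_combine( std::size_t &seed, std::size_t value )
    {
        seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }

    // "One simple fix": start the seed at an arbitrary non-zero constant,
    // applied once per sequence rather than once per element.
    std::size_t hash_sequence( const int *first, const int *last )
    {
        std::size_t seed = 0xdeadbeef;   // arbitrary, illustrative value
        for ( ; first != last; ++first )
            hash_combine( seed, std::size_t( *first ) );
        return seed;
    }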
From the point of view of hash_combine the effect is the same
The difference shows when you have something like:

    struct A { int x, y; };
    struct B { A a, b, c; };

    size_t hash_value( const A &a ) {
        size_t hash = 0xbead1;
        hash_combine( hash, a.x );
        hash_combine( hash, a.y );
        return hash;
    }

    size_t hash_value( const B &b ) {
        size_t hash = 0xbead2;
        hash_combine( hash, b.a );
        hash_combine( hash, b.b );
        hash_combine( hash, b.c );
        return hash;
    }

Here hash_combine is called 9 times, but hash_value(int) is only called 6 times, so we get different results depending on where the constant is added.
and we now rely on the user overloads of hash_value to not produce a zero.
That's surely reasonable. The hash_value of any user-defined class should be defined in terms of the hash_values of primitives, and boost provides all of those.
This reflects their intended use. The two argument overload is used when one has a whole range and wants its hash value, as in the hash_value overload for std::vector, for example.
I actually think this is going to be fairly rare. This is mainly because there will usually be a constant thrown in to represent the type of the object. (I appreciate you won't do that for std containers, but I maintain it's a good idea for user-defined types.)

The 2-argument version is strictly redundant as it must be defined in terms of the 3-argument version. It's just a convenience function, used mainly by the std containers. Admittedly hash_combine is also (nearly) redundant, being definable as:

    void hash_combine( size_t &hash, const T &t ) { hash_range( hash, &t, &t+1 ); }

if T does not overload address-of.

Incidentally, do you agree we will sometimes want to pass a hash value as the second argument to hash_combine? Like:

    hash_combine( hash, hash_range( first, last ) );
    hash_combine( hash, obj.get_hash( some_arg ) );

If not, then maybe we should have a hash_combine that just combines hashes. At the moment the combining is mixed in with the getting; it's not a very orthogonal API.

-- Dave Harris, Nottingham, UK
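As a hypothetical sketch of such a combine-only primitive (the name hash_combine_raw and the mixing constants are illustrative, not an actual or proposed interface):

    #include <cstddef>

    // Hypothetical combine-only primitive: mixes in a value that is already
    // a hash, without calling hash_value() on it first.
    inline void hash_combine_raw( std::size_t &seed, std::size_t already_a_hash )
    {
        seed ^= already_a_hash + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }

    // Intended use, echoing the examples above (get_hash and some_arg are
    // placeholders from the message, not a real API):
    //
    //   size_t hash = 0xbead1;
    //   hash_combine_raw( hash, hash_range( first, last ) );
    //   hash_combine_raw( hash, obj.get_hash( some_arg ) );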

Dave Harris wrote:
In-Reply-To: <003501c52a54$a28c4b90$6601a8c0@pdimov> pdimov@mmltd.net (Peter Dimov) wrote (abridged):
This reflects their intended use. The two argument overload is used when one has a whole range and wants its hash value, as in the hash_value overload for std::vector, for example.
I actually think this is going to be fairly rare. This is mainly because there will usually be a constant thrown in to represent the type of the object. (I appreciate you won't do that for std containers, but I maintain it's a good idea for user-defined types.)
I disagree (I won't go into the reasons, as they've already been stated enough times). But I'll add a note to the tutorial that this can be done, and explain why you might want to do it.
The 2-argument version is strictly redundant as it must be defined in terms of the 3-argument version. It's just a convenience function, used mainly by the std containers.
Admittedly hash_combine is also (nearly) redundant, being definable as:
void hash_combine( size_t &hash, const T &t ) { hash_range( hash, &t, &t+1 ); }
if T does not overload address-of.
Since nested vectors hash differently from their flattened form, I don't think a single-element vector should have the same hash value as its element. We don't want [[1], 2] (using square brackets as a representation of a sequence, not in the normal C++ sense) to be considered equivalent to [1, 2], so [1] is not equivalent to 1.

Now, I suppose the response to that is: if membership of a sequence affects the hash value, why doesn't type? I think the hash function should consider its values in the same manner as the STL does. std::equal_to considers 1u == (char) 1, but not [1] == 1. Does that make sense? If it does, I'll write it in a more verbose form in the documentation.
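Spelled out as code, a small self-contained sketch (the combining formula and the zero starting seed are assumptions for illustration, not the library's actual definitions) of why a one-element vector does not hash to the same value as its element:

    #include <cstddef>
    #include <vector>

    // Illustrative combining step; constants are not significant here.
    inline void hash_combine( std::size_t &seed, std::size_t value )
    {
        seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }

    inline std::size_t hash_value( int v ) { return std::size_t( v ); }

    std::size_t hash_value( const std::vector<int> &v )
    {
        std::size_t seed = 0;   // illustrative starting seed
        for ( std::size_t i = 0; i < v.size(); ++i )
            hash_combine( seed, hash_value( v[i] ) );
        return seed;
    }

    // hash_value( std::vector<int>( 1, 1 ) ) is hash_combine( 0, hash_value(1) ),
    // which is not hash_value( 1 ) itself, so [1] does not collide with 1.  It
    // follows that a sequence containing [1] hashes differently from a sequence
    // containing a plain 1 in the same position, i.e. [[1], 2] != [1, 2].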
Incidentally, do you agree we will sometimes want to pass a hash value as the second argument to hash_combine? Like:
    hash_combine( hash, hash_range( first, last ) );
    hash_combine( hash, obj.get_hash( some_arg ) );
If not, then maybe we should have a hash_combine that just combines hashes. At the moment the combining is mixed in with the getting; it's not a very orthogonal API.
No, the way to do this is to overload hash_value for the type of the object being hashed. Once I've added support for Boost.Range (or vice versa), a boost::iterator_range could be used as the value representing a range (for the first example). For the second one, you'll have to overload hash_value yourself. So they become:

    hash_combine(seed, boost::make_iterator_range(first, last));
    hash_combine(seed, obj);

Daniel
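For the second case, a sketch of what overloading hash_value yourself might look like; the widget type and its members are invented for illustration, and the example relies on Boost.Hash finding the overload through argument-dependent lookup:

    #include <cstddef>
    #include <boost/functional/hash.hpp>   // boost::hash_combine

    namespace user
    {
        struct widget      // made-up example type
        {
            int id;
            double weight;
        };

        // Found by hash_combine via argument-dependent lookup.
        std::size_t hash_value( const widget &w )
        {
            std::size_t seed = 0;
            boost::hash_combine( seed, w.id );
            boost::hash_combine( seed, w.weight );
            return seed;
        }
    }

    int main()
    {
        user::widget w = { 42, 1.5 };
        std::size_t seed = 0;
        boost::hash_combine( seed, w );   // uses user::hash_value
    }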

Dave Harris wrote:
and we now rely on the user overloads of hash_value to not produce a zero.
That's surely reasonable. The hash_value of any user-defined class should be defined in terms of the hash_values of primitives, and boost provides all of those.
I'm reluctant to elevate hash_combine and friends to the status of mandatory hash primitives. I think of them as utility components that help people write reasonable hash functions.

As I said elsewhere, I think that it is reasonable for the only requirement on hash_value(x) to be that equivalent arguments yield the same result, and that distinct arguments "usually" produce different results. This is currently the case in TR1, too. So, in my opinion, your original suggestion is still the best one, even though I did not seem to agree at first. ;-)
Incidentally, do you agree we will sometimes want to pass a hash value as the second argument to hash_combine? Like:
    hash_combine( hash, hash_range( first, last ) );
    hash_combine( hash, obj.get_hash( some_arg ) );
The first line should be

    hash_range( hash, first, last );

It's not the same thing, technically, but it has the same effect. The second should probably be

    hash_combine( hash, obj );

but...
If not, then maybe we should have a hash_combine that just combines hashes. At the moment the combining is mixed in with the getting; it's not a very orthogonal API.
... yes, hash_value( size_t ) is the identity function partly for this reason. The default in hash_combine is to invoke hash_value because these use cases outnumber the use cases where you already have a suitable size_t hash value.
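To illustrate that dispatch, a rough self-contained sketch with an assumed mixing formula (the template and constants are illustrative, not the library's actual code): because hash_value on a size_t hands the value back unchanged, an already-computed hash passed to hash_combine is mixed in as-is.

    #include <cstddef>

    inline std::size_t hash_value( std::size_t v ) { return v; }  // identity

    // Generic combine: always routes the second argument through hash_value.
    // When that argument is already a size_t hash, the identity overload
    // means nothing extra happens to it before mixing.
    template <class T>
    inline void hash_combine( std::size_t &seed, const T &v )
    {
        seed ^= hash_value( v ) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }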

Peter Dimov wrote:
Incidentally, do you agree we will sometimes want to pass a hash value as the second argument to hash_combine? Like:
    hash_combine( hash, hash_range( first, last ) );
    hash_combine( hash, obj.get_hash( some_arg ) );
The first line should be
hash_range( hash, first, last );
It's not the same thing, technically, but it has the same effect.
It doesn't; it concatenates the sequences, whereas the original doesn't.
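To make the distinction concrete, a self-contained sketch with an assumed combining formula (not the library's actual code): folding two ranges into the same seed via hash_range gives the same result as hashing their concatenation, while combining each range's finished hash value does not.

    #include <cstddef>

    inline void hash_combine( std::size_t &seed, std::size_t v )
    {
        seed ^= v + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }

    template <class It>
    void hash_range( std::size_t &seed, It first, It last )
    {
        for ( ; first != last; ++first )
            hash_combine( seed, std::size_t( *first ) );
    }

    template <class It>
    std::size_t hash_range( It first, It last )
    {
        std::size_t seed = 0;
        hash_range( seed, first, last );
        return seed;
    }

    int main()
    {
        int a[]  = { 1, 2 };
        int b[]  = { 3 };
        int ab[] = { 1, 2, 3 };

        std::size_t s1 = 0;
        hash_range( s1, a, a + 2 );                  // hash [1, 2], then...
        hash_range( s1, b, b + 1 );                  // ...append [3]: concatenation

        std::size_t s2 = 0;
        hash_range( s2, ab, ab + 3 );                // s2 == s1

        std::size_t s3 = 0;
        hash_combine( s3, hash_range( a, a + 2 ) );  // reduce [1, 2] to one value first
        hash_combine( s3, hash_range( b, b + 1 ) );  // s3 != s1 in general
    }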
participants (3):
- brangdon@cix.compulink.co.uk
- Daniel James
- Peter Dimov