Re: [boost] Re: [review] hash functions

In-Reply-To: <003501c52a54$a28c4b90$6601a8c0@pdimov> pdimov@mmltd.net (Peter Dimov) wrote (abridged):
size_t hash_value( int v ) { return size_t(v) + default_seed; }
I wonder what we gain from this.
As you pointed out when I first suggested adding the constant in hash_combine: this would mean N fixes for a sequence of length N, when one simple fix is enough. The initial value of the seed is arbitrary, so we can vary it at will.

hash_combine, on the other hand, is based on research and has apparently been selected as one of the best amongst the shift+xor family. This doesn't mean it's sacred, just that if we change it we'll no longer be able to simply point at a PDF as a rationale, but will need to do our own research (which might be a good idea anyway, though). :-)
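For concreteness, a minimal sketch of the "one simple fix" idea, using a combining step in the shift+xor form being discussed; the exact constants and the 0xdeadbeef starting value are illustrative, not the library's actual choices:

    #include <cstddef>

    // Illustrative shift+xor combining step; the exact constants are not
    // the point here.
    inline void hash_combine( std::size_t &seed, std::size_t value )
    {
        seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }

    // "One simple fix": start the seed at an arbitrary non-zero constant,
    // applied once per sequence rather than once per element.
    std::size_t hash_sequence( const int *first, const int *last )
    {
        std::size_t seed = 0xdeadbeef;   // arbitrary, illustrative value
        for ( ; first != last; ++first )
            hash_combine( seed, std::size_t( *first ) );
        return seed;
    }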
From the point of view of hash_combine the effect is the same
The difference shows when you have something like:

    struct A { int x, y; };
    struct B { A a, b, c; };

    size_t hash_value( const A &a ) {
        size_t hash = 0xbead1;
        hash_combine( hash, a.x );
        hash_combine( hash, a.y );
        return hash;
    }

    size_t hash_value( const B &b ) {
        size_t hash = 0xbead2;
        hash_combine( hash, b.a );
        hash_combine( hash, b.b );
        hash_combine( hash, b.c );
        return hash;
    }

Here hash_combine is called 9 times, but hash_value(int) is only called 6 times, so we get different results depending on where the constant is added.
and we now rely on the user overloads of hash_value to not produce a zero.
That's surely reasonable. The hash_value of any user-defined class should be defined in terms of the hash_values of primitives, and boost provides all of those.
This reflects their intended use. The two argument overload is used when one has a whole range and wants its hash value, as in the hash_value overload for std::vector, for example.
I actually think this is going to be fairly rare. This is mainly because there will usually be a constant thrown in to represent the type of the object. (I appreciate you won't do that for std containers, but I maintain it's a good idea for user-defined types.)

The 2-argument version is strictly redundant as it must be defined in terms of the 3-argument version. It's just a convenience function, used mainly by the std containers. Admittedly hash_combine is also (nearly) redundant, being definable as:

    void hash_combine( size_t &hash, const T &t ) { hash_range( hash, &t, &t+1 ); }

if T does not overload address-of.

Incidentally, do you agree we will sometimes want to pass a hash value as the second argument to hash_combine? Like:

    hash_combine( hash, hash_range( first, last ) );
    hash_combine( hash, obj.get_hash( some_arg ) );

If not, then maybe we should have a hash_combine that just combines hashes. At the moment the combining is mixed in with the getting; it's not a very orthogonal API.

-- Dave Harris, Nottingham, UK
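As a hypothetical sketch of such a combine-only primitive (the name hash_combine_raw and the mixing constants are illustrative, not an actual or proposed interface):

    #include <cstddef>

    // Hypothetical combine-only primitive: mixes in a value that is already
    // a hash, without calling hash_value() on it first.
    inline void hash_combine_raw( std::size_t &seed, std::size_t already_a_hash )
    {
        seed ^= already_a_hash + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }

    // Intended use, echoing the examples above (get_hash and some_arg are
    // placeholders from the message, not a real API):
    //
    //   size_t hash = 0xbead1;
    //   hash_combine_raw( hash, hash_range( first, last ) );
    //   hash_combine_raw( hash, obj.get_hash( some_arg ) );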

Dave Harris wrote:
In-Reply-To: <003501c52a54$a28c4b90$6601a8c0@pdimov> pdimov@mmltd.net (Peter Dimov) wrote (abridged):
This reflects their intended use. The two argument overload is used when one has a whole range and wants its hash value, as in the hash_value overload for std::vector, for example.
I actually think this is going to be fairly rare. This is mainly because there will usually be a constant thrown in to represent the type of the object. (I appreciate you won't do that for std containers, but I maintain it's a good idea for user-defined types.)
I disagree (I won't go into the reasons, as they've already been stated enough times). But I'll add a note to the tutorial that this can be done, and explain why you might want to do it.
The 2-argument version is strictly redundant as it must be defined in terms of the 3-argument version. It's just a convenience function, used mainly by the std containers.
Admittedly hash_combine is also (nearly) redundant, being definable as:
void hash_combine( size_t &hash, const T &t ) { hash_range( hash, &t, &t+1 ); }
if T does not overload address-of.
Since nested vectors hash differently from their flattened form, I don't think a single-element vector should have the same hash value as its element. We don't want [[1], 2] (using square brackets as a representation of a sequence, not in the normal C++ sense) to be considered equivalent to [1, 2], so [1] is not equivalent to 1.

Now, I suppose the response to that is: if membership of a sequence affects the hash value, why doesn't type? I think the hash function should consider its values in the same manner as the STL does. std::equal_to considers 1u == (char) 1, but not [1] == 1. Does that make sense? If it does, I'll write it in a more verbose form in the documentation.
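Spelled out as code, a small self-contained sketch (the combining formula and the zero starting seed are assumptions for illustration, not the library's actual definitions) of why a one-element vector does not hash to the same value as its element:

    #include <cstddef>
    #include <vector>

    // Illustrative combining step; constants are not significant here.
    inline void hash_combine( std::size_t &seed, std::size_t value )
    {
        seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }

    inline std::size_t hash_value( int v ) { return std::size_t( v ); }

    std::size_t hash_value( const std::vector<int> &v )
    {
        std::size_t seed = 0;   // illustrative starting seed
        for ( std::size_t i = 0; i < v.size(); ++i )
            hash_combine( seed, hash_value( v[i] ) );
        return seed;
    }

    // hash_value( std::vector<int>( 1, 1 ) ) is hash_combine( 0, hash_value(1) ),
    // which is not hash_value( 1 ) itself, so [1] does not collide with 1.  It
    // follows that a sequence containing [1] hashes differently from a sequence
    // containing a plain 1 in the same position, i.e. [[1], 2] != [1, 2].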
Incidentally, do you agree we will sometimes want to pass a hash value as the second argument to hash_combine? Like:
    hash_combine( hash, hash_range( first, last ) );
    hash_combine( hash, obj.get_hash( some_arg ) );
If not, then maybe we should have a hash_combine that just combines hashes. At the moment the combining is mixed in with the getting; it's not a very orthogonal API.
No, the way to do this is to overload hash_value for the type of the object being hashed. Once I've added support for Boost.Range (or vice versa), a boost::iterator_range could be used as the value representing a range (for the first example). For the second one, you'll have to overload hash_value yourself. So they become:

    hash_combine(seed, boost::make_iterator_range(first, last));
    hash_combine(seed, obj);

Daniel
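For the second case, a sketch of what overloading hash_value yourself might look like; the widget type and its members are invented for illustration, and the example relies on Boost.Hash finding the overload through argument-dependent lookup:

    #include <cstddef>
    #include <boost/functional/hash.hpp>   // boost::hash_combine

    namespace user
    {
        struct widget      // made-up example type
        {
            int id;
            double weight;
        };

        // Found by hash_combine via argument-dependent lookup.
        std::size_t hash_value( const widget &w )
        {
            std::size_t seed = 0;
            boost::hash_combine( seed, w.id );
            boost::hash_combine( seed, w.weight );
            return seed;
        }
    }

    int main()
    {
        user::widget w = { 42, 1.5 };
        std::size_t seed = 0;
        boost::hash_combine( seed, w );   // uses user::hash_value
    }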

Dave Harris wrote:
and we now rely on the user overloads of hash_value to not produce a zero.
That's surely reasonable. The hash_value of any user-defined class should be defined in terms of the hash_values of primitives, and boost provides all of those.
I'm reluctant to elevate hash_combine and friends to the status of mandatory hash primitives. I think of them as utility components that help people write reasonable hash functions.

As I said elsewhere, I think that it is reasonable for the only requirement on hash_value(x) to be that equivalent arguments yield the same result, and that distinct arguments "usually" produce different results. This is currently the case in TR1, too. So, in my opinion, your original suggestion is still the best one, even though I did not seem to agree at first. ;-)
Incidentally, do you agree we will sometimes want to pass a hash value as the second argument to hash_combine? Like:
    hash_combine( hash, hash_range( first, last ) );
    hash_combine( hash, obj.get_hash( some_arg ) );
The first line should be

    hash_range( hash, first, last );

It's not the same thing, technically, but it has the same effect. The second should probably be

    hash_combine( hash, obj );

but...
If not, then maybe we should have a hash_combine that just combines hashes. At the moment the combining is mixed in with the getting; it's not a very orthogonal API.
... yes, hash_value( size_t ) is the identity function partly for this reason. The default in hash_combine is to invoke hash_value because these use cases outnumber the use cases where you already have a suitable size_t hash value.
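To illustrate that dispatch, a rough self-contained sketch with an assumed mixing formula (the template and constants are illustrative, not the library's actual code): because hash_value on a size_t hands the value back unchanged, an already-computed hash passed to hash_combine is mixed in as-is.

    #include <cstddef>

    inline std::size_t hash_value( std::size_t v ) { return v; }  // identity

    // Generic combine: always routes the second argument through hash_value.
    // When that argument is already a size_t hash, the identity overload
    // means nothing extra happens to it before mixing.
    template <class T>
    inline void hash_combine( std::size_t &seed, const T &v )
    {
        seed ^= hash_value( v ) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }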

Peter Dimov wrote:
Incidentally, do you agree we will sometimes want to pass a hash value as the second argument to hash_combine? Like:
    hash_combine( hash, hash_range( first, last ) );
    hash_combine( hash, obj.get_hash( some_arg ) );
The first line should be
hash_range( hash, first, last );
It's not the same thing, technically, but it has the same effect.
It doesn't; it concatenates the sequences, whereas the original doesn't.
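To make the distinction concrete, a self-contained sketch with an assumed combining formula (not the library's actual code): folding two ranges into the same seed via hash_range gives the same result as hashing their concatenation, while combining each range's finished hash value does not.

    #include <cstddef>

    inline void hash_combine( std::size_t &seed, std::size_t v )
    {
        seed ^= v + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }

    template <class It>
    void hash_range( std::size_t &seed, It first, It last )
    {
        for ( ; first != last; ++first )
            hash_combine( seed, std::size_t( *first ) );
    }

    template <class It>
    std::size_t hash_range( It first, It last )
    {
        std::size_t seed = 0;
        hash_range( seed, first, last );
        return seed;
    }

    int main()
    {
        int a[]  = { 1, 2 };
        int b[]  = { 3 };
        int ab[] = { 1, 2, 3 };

        std::size_t s1 = 0;
        hash_range( s1, a, a + 2 );                  // hash [1, 2], then...
        hash_range( s1, b, b + 1 );                  // ...append [3]: concatenation

        std::size_t s2 = 0;
        hash_range( s2, ab, ab + 3 );                // s2 == s1

        std::size_t s3 = 0;
        hash_combine( s3, hash_range( a, a + 2 ) );  // reduce [1, 2] to one value first
        hash_combine( s3, hash_range( b, b + 1 ) );  // s3 != s1 in general
    }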
participants (3):
- brangdon@cix.compulink.co.uk
- Daniel James
- Peter Dimov