New subject: [review] hash functions

15 Mar 2005

      In-Reply-To: <001201c527e4$7405efa0$6501a8c0@pdimov2>
pdimov@mmltd.net (Peter Dimov) wrote (abridged):
...
...
In my experience, if you want two different things to hash the same,
you need to write their hash functions with that in mind. It is a 
special case which justifies not using the default hash function.
Maybe, but this is not an argument for making the hash values different.
True. Arguments for that have been given elsewhere. One argument is to 
maximise variation (and for types like bool it is desperately needed).

Another is to allow treating of 32-bit values as bytes. For example,

    size_t hash_range( const int *first, const int *last ) {
        return hash_range(
            static_cast<const unsigned char *>(first),
            static_cast<const unsigned char *>(last) );
    }

may be a good implementation on some platforms. It does 4 times as many 
mixing steps (assuming sizeof(int) == 4) as the proposed algorithm. It is, 
I think, how the designer of the proposed hash_combine algorithm expected 
it to be used.

However, it yields hash values for int * different to the values for char 
*, so with the current design aims we can't use it.
...
...
But doesn't this mean that hash_value( pair<int,int> ) also needs a
seed argument, if you want it to match hash_range( int[2] ) ?
No. The seed version of hash_range is used when you want to compute the 
hash function of a sequence, but you don't have access to the whole 
sequence at once. It's goal is not to provide a way to change the 
initial seed, but to make it possible to continue from an intermediate 
seed.
But suppose I have pair< pair<int,int>, pair<int,int> > ? Shouldn't that 
yield the same hash value as int[4] ?

-- Dave Harris, Nottingham, UK

Re: [boost] [review] hash functions

brangdon＠cix.compulink.co.uk

Caleb Epstein

Peter Dimov

Chris Jefferson

Peter Dimov

Daniel James

tags

participants (5)