Re: [boost] [review] hash functions

In-Reply-To: <018901c52569$ea842610$6601a8c0@pdimov> pdimov@mmltd.net (Peter Dimov) wrote (abridged):
That would be bad because the hash value of the pair (x, y) would no longer match the hash value of the sequence (x, y).
Why is it necessary for pairs and sequences to have the same hash value? They are different things. If it is necessary, we can avoid the problem by fixing hash_range: template <typename It> size_t hash_range( It first, It last ) { if (first == last) return 0x87654321; // Or some other value; size_t seed = hash_value(*first); for (++first; first != last; ++first) seed = hash_combine( seed, *first ); return seed; } This also gives hash_range( &a, &a+1 ) the same result as hash_value(a), which your approach does not.
I had in mind something like: template <typename T> size_t hash_combine( size_t seed, const T &v ) { const size_t c = 0x12345678; // Or some other value. return seed ^ (hash_value(v) + c + (seed<<6) + (seed>>2)); } This costs one extra addition, which can probably be scheduled in parallel with other operations anyway. -- Dave Harris, Nottingham, UK

Dave Harris wrote:
pair<int, int>, int[2], struct { int a; int b; } are different representations of the same thing; it is not strictly necessary for them to use the same algorithm, but it's desirable and will eliminate a source of confusion and mistakes.
The simpler "initialize seed, modify seed with each element" approach is much cleaner. It can also be expressed in terms of void hash_range( size_t & seed, It first, It last ); should this function be added. You are welcome to spell the above as size_t hash_range( size_t seed, It first, It last ); and I understand your reasons for doing so, I just don't see side effects as so evil. Anyway, the good thing in this formulation is that you can hash a range in stages, if you only have access to a subrange at a time. The non-uniform treatment of empty sequences and the first element of the sequence would force you to keep state. State is even more evil than side effects. ;-)
This also gives hash_range( &a, &a+1 ) the same result as hash_value(a), which your approach does not.
Yes, you are right, starting with a nonzero seed makes sequences of one element produce a different hash value than the element itself.
This may be good enough.
participants (2)
-
brangdon@cix.compulink.co.uk
-
Peter Dimov