Re: [boost] [review] hash functions

In-Reply-To: <00cd01c524e3$bf3cb4d0$6501a8c0@pdimov2> pdimov@mmltd.net (Peter Dimov) wrote (abridged):
A pure function would entice you to define hash_value for (x, y) as hash_combine(x, y).
And that would be bad because...? template <typename T1, typename T2> size_t hash_value( const std::pair<T1,T2> &pair ) { return hash_combine( hash_value(pair.first), hash_value(pair.second) ); } looks good to me. 1 line of code instead of 4, no spurious variable, no side effects, half as many calls to hash_combine(), easier to exploit instruction-level parallelism.
We can fix this by initializing the seed in hash_range with something other than zero. It isn't a problem for strings, where \0 is rare, but may be a problem for arbitrary sequences.
That would help. How about fixing it in hash_combine instead? At the moment hash_combine(0,0) == 0. If you fix it in hash_range, you also need to fix it in every other place a seed is used, eg for pairs. -- Dave Harris, Nottingham, UK

Dave Harris wrote:
In-Reply-To: <00cd01c524e3$bf3cb4d0$6501a8c0@pdimov2> pdimov@mmltd.net (Peter Dimov) wrote (abridged):
A pure function would entice you to define hash_value for (x, y) as hash_combine(x, y).
And that would be bad because...?
That would be bad because the hash value of the pair (x, y) would no longer match the hash value of the sequence (x, y).
We can fix this by initializing the seed in hash_range with something other than zero. It isn't a problem for strings, where \0 is rare, but may be a problem for arbitrary sequences.
That would help.
How about fixing it in hash_combine instead?
This would mean N fixes for a sequence of length N, when one simple fix is enough. The initial value of the seed is arbitrary so we can vary it at will. hash_combine, on the other hand, is based on research and has apparently been selected as one of the best amongst the shift+xor family. This doesn't mean that it's sacred, just that if we change it we'll no longer be able to just point at a PDF as a rationale but will need to do our own research (which might be a good idea anyway, though.) Do you have a different hash_combine in mind that doesn't suffer from this problem and is better than the proposed implementation? ;-) (Keeping in mind that even the current hash_combine is perceived as slower than necessary, of course.)

Dave Harris wrote:
In-Reply-To: <00cd01c524e3$bf3cb4d0$6501a8c0@pdimov2> pdimov@mmltd.net (Peter Dimov) wrote (abridged):
A pure function would entice you to define hash_value for (x, y) as hash_combine(x, y).
And that would be bad because...?
template <typename T1, typename T2> size_t hash_value( const std::pair<T1,T2> &pair ) { return hash_combine( hash_value(pair.first), hash_value(pair.second) ); }
looks good to me. 1 line of code instead of 4, no spurious variable, no side effects, half as many calls to hash_combine(), easier to exploit instruction-level parallelism.
But it will be easy to make mistakes for more complicated (possibly recursive) types. Especially since hash_combine is not associative or transitive. I could see that leading to mistakes. And it would be quite easy to forget to call hash_value (because many types are implicitly convertible to std::size_t), or to call it incorrectly (what namespace is you function in?). Consider: size_t my_hash_function(int x[4]) { using namespace boost; return hash_combine( hash_combine(hash_value(x[0]), hash_value(x[1])), hash_combine(hash_value(x[2]), hash_value(x[3]))); } Which look reasonable, but is for most cases wrong (I'm just using an array to keep the example compact, this becomes more important with more complicated generic types).
We can fix this by initializing the seed in hash_range with something other than zero. It isn't a problem for strings, where \0 is rare, but may be a problem for arbitrary sequences.
That would help.
How about fixing it in hash_combine instead? At the moment hash_combine(0,0) == 0. If you fix it in hash_range, you also need to fix it in every other place a seed is used, eg for pairs.
An alternative would be to implement hash_combine as an object. Something like: struct hash_seed { hash_seed(); template <class T> void combine(T const&); std::size_t value() const; }; Then it could initialise itself to whatever it wants. Daniel
participants (3)
-
brangdon@cix.compulink.co.uk
-
Daniel James
-
Peter Dimov