
Dave Harris wrote:
In-Reply-To: <00cd01c524e3$bf3cb4d0$6501a8c0@pdimov2> pdimov@mmltd.net (Peter Dimov) wrote (abridged):
A pure function would entice you to define hash_value for (x, y) as hash_combine(x, y).
And that would be bad because...?
template <typename T1, typename T2> size_t hash_value( const std::pair<T1,T2> &pair ) { return hash_combine( hash_value(pair.first), hash_value(pair.second) ); }
looks good to me. 1 line of code instead of 4, no spurious variable, no side effects, half as many calls to hash_combine(), easier to exploit instruction-level parallelism.
But it will be easy to make mistakes for more complicated (possibly recursive) types. Especially since hash_combine is not associative or transitive. I could see that leading to mistakes. And it would be quite easy to forget to call hash_value (because many types are implicitly convertible to std::size_t), or to call it incorrectly (what namespace is you function in?). Consider: size_t my_hash_function(int x[4]) { using namespace boost; return hash_combine( hash_combine(hash_value(x[0]), hash_value(x[1])), hash_combine(hash_value(x[2]), hash_value(x[3]))); } Which look reasonable, but is for most cases wrong (I'm just using an array to keep the example compact, this becomes more important with more complicated generic types).
We can fix this by initializing the seed in hash_range with something other than zero. It isn't a problem for strings, where \0 is rare, but may be a problem for arbitrary sequences.
That would help.
How about fixing it in hash_combine instead? At the moment hash_combine(0,0) == 0. If you fix it in hash_range, you also need to fix it in every other place a seed is used, eg for pairs.
An alternative would be to implement hash_combine as an object. Something like: struct hash_seed { hash_seed(); template <class T> void combine(T const&); std::size_t value() const; }; Then it could initialise itself to whatever it wants. Daniel