
Mathias Gaunard wrote:
On 10/04/2011 10:03 PM, Tim Blechmann wrote:
i was quite amazed that the default hash function for integers seems to be the identity function:
from boost/functional/hash/hash.hpp: inline std::size_t hash_value(int v) { return static_cast<std::size_t>(v); }
while this is probably the most efficient hash function, it is also the worst possible. is this done intentionally?
It's only bad if you know your possible values are not equally distributed over the whole range of "int".
The distribution of the input is irrelevant. It's always mirrored in the distribution of the output (regardless of the hash function, as long as there are no collisions) because there is 1:1 correspondence between the input and output values. The problem arises when output bits are discarded. In the pathological case, if the input is 0..31 and one discards the bottom 5 bits, the result is always 0. But the default hash function can't know that you're going to discard bits, or how to fix that for you.