
On Tuesday, January 31, 2012 18:55:21 Daniel James wrote:
The requirements for the hash function are:
For two different values t1 and t2, the probability that h(t1) and h(t2) compare equal should be very small, approaching 1.0 / numeric_limits<size_t>::max().
There's no requirement about how h(t1) and h(t2) are distributed
Right.
so it must be the container's responsibility, since the container must work well with any hash function that meets these requirements.
Wrong, there's no such requirement. It must work, not necessarily well.
How else can you interpret it?
Simple: it is possible to design a hash function that will cause suboptimal performance with the container. No matter how you implement the container, with bit mixing or not, this statement still holds.

I understand your reasoning. You're trying to reduce the probability of an unordered container being inefficient because of an inefficient hash function. This approach also has the benefit of simplifying the definition of hash functions. What I'm saying is that this pessimization is the wrong thing to do, because you make users pay for what they don't need.

Most of the time, when you deal with hash containers, you have a pretty good idea of the nature of the keys and the better ways to hash the key values. There are times when the key values already have a good distribution (e.g. the keys are GUIDs) and there is no need for additional bit mixing. For other cases there are known good hashing algorithms, so you often just use the one that best fits your needs. And for standard types I would expect std::hash to do the right thing. I know, the standard doesn't spell that out explicitly (which is a bad thing), but this is how it works in practice (from my experience).
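To make the GUID case concrete, here is a minimal sketch (not from the thread; the Guid type and GuidHash functor are hypothetical) of a key whose bits are assumed to be uniformly distributed already, so the hash just folds the value into size_t with no extra avalanche/mixing step:

    #include <cstddef>
    #include <cstdint>
    #include <unordered_set>

    // Hypothetical 128-bit GUID-like key.
    struct Guid {
        std::uint64_t hi = 0;
        std::uint64_t lo = 0;
        bool operator==(const Guid& other) const {
            return hi == other.hi && lo == other.lo;
        }
    };

    // Assumes the GUID bits are already well distributed, so XOR-folding
    // the two halves into size_t is enough; no additional bit mixing.
    struct GuidHash {
        std::size_t operator()(const Guid& g) const noexcept {
            return static_cast<std::size_t>(g.hi ^ g.lo);
        }
    };

    int main() {
        std::unordered_set<Guid, GuidHash> seen;
        Guid g{0x0123456789abcdefULL, 0xfedcba9876543210ULL};
        seen.insert(g);
        return seen.count(g) ? 0 : 1;
    }

If the container layered its own bit mixing on top of this, users with keys like these would pay for work they don't need; that is the cost being argued about above.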