Re: [boost] [review] hash functions

13 Mar 2005

      On Mar 13, 2005, at 11:13 AM, Peter Dimov wrote:
...
implementation to which you port your program. Your original tests are 
still valid.
The other side of predictability - that the hash function is not 
affected by types or the particular representation of a sequence of 
values - means that you don't necessarily have to re-run the 
performance benchmark of your unordered_maps when you refactor the key 
to use int[3] instead of struct { short, short, short }, or a wstring 
instead of a string.
Fundamentally, what makes for a good hash function is that things that 
are equal (however you are defining equality) always hash to the same 
value, while any small difference between two things should cause them 
to hash to completely different values.  In an arbitrary program would 
I expect an int[3] to be equal to a struct { short, short, short } when 
the values happened to be equal?  No.  What if the change in 
representation you spoke of was to rotate the coordinate axes for my 
vector, would I expect values with the same coordinates in the two 
different coordinate-systems, represented by different structs to be 
equal?  No.  If I have two structs representing complex numbers, one 
using a Cartesian representation and the other polar would I expect 
them to be treated as equal because their coordinates are equal?  No -- 
I would be appalled if they were.

A general purpose hash function that behaves like this is a weak one at 
best (and in some situations it can be a horrendously bad one).  Your 
argument is that one should allow those developers who are making one 
specific change in representation out of the hundreds of possible 
reasonable ones should be allowed to be able to get away with being 
slipshod (and yes, not re-running the performance benchmark even with 
this deliberate weakening of the benchmark would be slipshod, though 
justifiably under many conditions).  I don't buy it.

Now if you made an argument that the benefits of strengthening it were, 
in practice, too minor to consider and the costs (to, e.g., seed the 
hash combination sequence with a class dependent value) was too high, 
then I might agree.  Doing something that is theoretically "wrong" 
because it is pragmatically the "right" thing to do, is good 
engineering.  Arguing that something that is theoretically "wrong" is 
in some real sense the right thing to do is bad engineering, bad math 
and bad science.

Topher

Re: [boost] [review] hash functions

Topher Cooper