
On Mar 13, 2005, at 2:04 PM, Peter Dimov wrote:
Topher Cooper wrote:
Fundamentally, what makes for a good hash function is that things that are equal (however you are defining equality) always hash to the same value, while any small difference between two things should cause them to hash to completely different values. In an arbitrary program would I expect an int[3] to be equal to a struct { short, short, short } when the values happened to be equal? No. What if [...]
Since the keys in an unordered_map are of the same type, it is usually not possible for one of them to contain int[3] and for another to hold struct { short, short, short }.
A discriminated union type (boost::variant) that is able to hold either type can easily hash_combine the index of the currently active type to avoid these accidental collisions. Similarly, a boost::any-alike can hash_combine the address of a structure that uniquely identifies the type, or typeid().name(). This mirrors their implementation of operator==.
I agree that the current design may not be well suited for a general purpose hash function that needs to be able to sort objects of arbitrary types into buckets. I'm not sure how often this occurs in practice. Do you have examples?
Actually, I agree that if the issue is only vectors, pairs and sequences that the pragmatics argues to this approach. This behavior, though it is contrary to the spirit of object oriented programming (at least as I've interpreted that concept for 20+ years) is harmless. Different types (kinds, classes) of objects are different except as specified by is-a inheritance. I would still argue that it is not behavior to be sought, but it is harmless if it is the most convenient way to code things. I am made very uncomfortable, however when we put structs (and therefore, by a primary identity in the C++ language, classes) into the mix. Here we may have inheritance relationship (not a union) where two different types with different meanings to the quantities can end up introducing a tendency for rather different things to hash to the same values. I presented the example of two different classes representing polar and Cartesian representation of complex numbers (and, yes, I have written packages where these two representations co-exist and intermix to allow optimizations). Of course for this example, you would clearly want a custom hash, but the point is, subclassing does not make this situation that bizarre. What about a table containing structs for Resources, which could represent either employees or pieces of office equipment where the characteristics of interest in each happen to all be represented as a bunch of booleans, that happen to be the same length? I've perhaps ended up over-stating things. Let me summarize the nitty-gritty: I see no good reason to promote this as *desirable* behavior, and I worry about it occasionally producing some odd results. Good hashing errs on the side of not building in a bias towards inappropriate collisions IF it can be conveniently avoided. Topher