----------------------------------------
Date: Wed, 27 Nov 2013 20:10:04 +0000 From: daniel@calamity.org.uk To: boost-users@lists.boost.org Subject: Re: [Boost-users] Hash collision for pairs of int
On 27 November 2013 18:48, Igor R
wrote: I'm experiencing a lot of collisions using the hash_combine function in functional/hash/hash.hpp for integer-pairs-as-keys when there are few unique ints in each of the pair's positions (put together the keys themselves are unique of course).
I've uploaded a simple enough test case here: http://www.speedyshare.com/Wcem6/hash.cc.gz. If there's a more preferred way to upload, let me know and I'll reupload. With 400K pairs, this returns only 86K unique indices and has a max of 18 collisions in size_t.
Am I doing something really wrong here?
Perhaps your case is similar to this one: http://stackoverflow.com/questions/19966041/getting-too-many-collisions-with...
I really need to reconsider the implementation of hash_combine. It was based on the one in Peter Dimov's paper which hadn't really been tested with general values. Part of the idea was the hash would behave similarly on different machines, but I don't think that's really a desirable goal. It probably should use a specialised implementation for common sizes of std::size_t (i.e. 32, 64 and 128 bit).
It might even be worth changing to an interface that's more amenable to more sophisticated hash functions, e.g. one with more state, but I don't really want to introduce a breaking change at the moment. I suppose a clever implementation could allow the two to coexist.
Thank you Igor and Daniel. Igor was correct, that's exactly what I am seeing. I am now using a cityhash-like implementation succcessfully. I look forward to updates to Boost's hash_combine. saurabh