
On Thu, Apr 7, 2011 at 10:08, Erik Scorelle
We have been using boost hash to hash filenames, but have found with some of our user data that certain strings will produce the same hash code ( "0012g6" and "0012fu" for example). Is there a recommended way to predict or resolve these sorts of conflicts?
Having some collisions is the expected behaviour of a (non-cryptographic) hash function. With a birthday search you can easily find thousands of examples. I'm not sure what you mean by "resolve". Normally, code using hashes expects collisions, and uses a full equality operator to resolve them. If you cannot accept any collisions, you could use perfect hashing (libraries like http://cmph.sourceforge.net/ or http://www.gnu.org/software/gperf/gperf.html) or use a cryptographic hash (I have a usable, but WIP library at http://svn.boost.org/svn/boost/sandbox/hash/). ~ Scott