
Thanks for the quick responses!
The solution to our particular problem ended up being very simple: just a
few tweaks to the boost::unordered_map functors. I just wasn't quite
thinking about it clearly.
Also, that Dr. Dobbs article was a helpful read.
Cheers
On Thu, Apr 7, 2011 at 2:08 PM, Ovanes Markarian
On Thu, Apr 7, 2011 at 7:56 PM, Scott McMurray
wrote: We have been using boost hash to hash filenames, but have found with some of our user data that certain strings will produce the same hash code ( "0012g6" and "0012fu" for example). Is there a recommended way to
On Thu, Apr 7, 2011 at 10:08, Erik Scorelle
wrote: predict or resolve these sorts of conflicts?
Having some collisions is the expected behaviour of a (non-cryptographic) hash function. With a birthday search you can easily find thousands of examples.
I'm not sure what you mean by "resolve". Normally, code using hashes expects collisions, and uses a full equality operator to resolve them.
If you cannot accept any collisions, you could use perfect hashing (libraries like http://cmph.sourceforge.net/ or http://www.gnu.org/software/gperf/gperf.html) or use a cryptographic hash (I have a usable, but WIP library at http://svn.boost.org/svn/boost/sandbox/hash/).
~ Scott
Scott,
AFAIK perfect hashing works only with the known set of keys, otherwise it is not possible. Just to make a note: crypto-hashes are much slower as normal ones and use much more memory. Erik, how many file names are you going to hash? There was an article on Dr.Dobbs which compared unordered_map (hash table) implementation with std::map. And it turned out that until some certain amount of items it makes no sence to use a hash table, because a normal map is much faster.
Here is the link: http://drdobbs.com/cpp/198800559?pgno=1
With Kind Regards, Ovanes
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users