
Hurd, Matthew wrote:
Seems like both could have a place in Boost.
Yup. And perhaps a third that maps longer "key" sequence types into unique ids suitable for perfect hashing in an efficient way. For example, finding a minimum set of bytes to rotate and xor into a std::size_t that is unique for all keys via a simple decision tree. The size_t is then used for perfect hashing via a generator such as the MPH. This would efficiently meet the needs of a user with a fixed pool of strings or UUIDs, mentioned elsewhere, looking to map to a suitable index.
Agreed.
Statistically reasonable hashing of particular sets of strings has been a need that I've often run into.
Please feel free to take my hash.hpp code into the BSL as the basis for a real implementation that meets your needs w.r.t. a statistically robust but non-perfect hash. It is based on the solution outlined in Java in the articles previously cited.
I've tried to get my hands on the cited papers, but was able to find only one of them. Could you provide a concrete pointer where to get the Tong paper from? I'm really interested in complementing the MPH with additional hashing techniques. Thanks! Regards Hartmut