RE: [boost] [Gauge for interest] Minimal perfect hash functiongenerator

Behalf Of Beman Dawes
At 04:51 AM 1/17/2005, Matt Hurd wrote:
Perfect hashing and statistically reasonable hashing are quite different but practically the same ;-)
Isn't that stretching it a bit? Collision handling code can be eliminated if you know the hashing is perfect, for example. Although I do understand your point that for many practical applications, perfect hashing and statistically reasonable hashing would yield very similar performance.
My apologies if I caused confusion. I was being a little facetious and assumed the points you make.
Seems like both could have a place in Boost.
Yup. And perhaps a third that maps longer "key" sequence types into unique ids suitable for perfect hashing in an efficient way. For example, finding a minimum set of bytes to rotate and xor into a std::size_t that is unique for all keys via a simple decision tree. The size_t is then used for perfect hashing via a generator such as the MPH. This would efficiently meet the needs of a user with a fixed pool of strings or UUIDs, mentioned elsewhere, looking to map to a suitable index.
Statistically reasonable hashing of particular sets of strings has been a need that I've often run into.
Please feel free to take my hash.hpp code into the BSL as the basis for a real implementation that meets your needs w.r.t. a statistically robust but non-perfect hash. It is based on the solution outlined in Java in the articles previously cited. Regards, Matt Hurd matthurd@acm.org IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.

Hurd, Matthew wrote:
Seems like both could have a place in Boost.
Yup. And perhaps a third that maps longer "key" sequence types into unique ids suitable for perfect hashing in an efficient way. For example, finding a minimum set of bytes to rotate and xor into a std::size_t that is unique for all keys via a simple decision tree. The size_t is then used for perfect hashing via a generator such as the MPH. This would efficiently meet the needs of a user with a fixed pool of strings or UUIDs, mentioned elsewhere, looking to map to a suitable index.
Agreed.
Statistically reasonable hashing of particular sets of strings has been a need that I've often run into.
Please feel free to take my hash.hpp code into the BSL as the basis for a real implementation that meets your needs w.r.t. a statistically robust but non-perfect hash. It is based on the solution outlined in Java in the articles previously cited.
I've tried to get my hands on the cited papers, but was able to find only one of them. Could you provide a concrete pointer where to get the Tong paper from? I'm really interested in complementing the MPH with additional hashing techniques. Thanks! Regards Hartmut

I'm really interested in complementing the MPH with additional hashing techniques.
This is perhaps a little off-thread, I have a particular interest in integer->integer hash functions with good performance and "bit scrambling"/"de-correlation" properties. When I search the literature I have some trouble filtering out the string-oriented hashing techniques. I'm also interested in being able to transition from correlated (f(x) = x) to decorrelated in a systematic way. If you have any good citations handy, I'd be interested! Regards, Nigel Stewart
participants (3)
-
Hartmut Kaiser
-
Hurd, Matthew
-
Nigel Stewart