
Hi all, recently I've written a minimal perfect hash function generator, which generates very efficient hashing functions for arbitrary (reasonable large) key sets containing strictly positive integers to achieve a minimal and perfect mapping of these keys to hash values in the range [0..N), where N is the size of the key set (minimality requirement) and the hash values are unique (perfection requirement). IOW, given a set of N arbitrary positve integers (xi > 0) this hash function generator allows to map the integers to another set of integers, the values of which are in between and including 0 and N-1. No duplicate mappings are generated. Generally the developed MPH (minimal perfect hash) generates the coefficients for the following hash functions: h(x) = (C / (D*x + E) mod N or h(x) = (C / x) mod N (D == 1, E == 0) where the second one is choosen almost all the time (generally C, D and E are integers). Especially the second one requires only a couple of instructions to be executed, which makes it a very effective selection alternative for table oriented algorithms. The requirement, that the input keys should be strictly positive isn't a big limitation, because it is always possible to accordingly transform key sets containing possibly negative numbers, or to derive the keys from strings to match. Sounds really perfect! But there are some drawbacks: - the keys (obviously) must be known in advance, so a corresponding hash function may be generated. - some initial computation is required to find the coefficients C, D and E (which is obvious as well :-). The initial overhead is tolerable, though, here are some timings on my machine - Pentium M, 1.6GHz, 1GByte: 20 keys: 0.005 [s], 200 keys: 0.03 [s], 2000 keys: 0.15 [s], 20000 keys: 4.45[s] where the time is raising slightly more worse than linear (actually it has a linear and an exponential component, but the exponential component kicks in for larger key sets only), depending on the number of keys. The memory requirement for the MPH generator is raising linearily with the number of keys: roughly N/3 * sizeof(int), where N is the number of keys. This hash function generator is usable for a wide range of applications, where efficient data lookup is a major goal and where sparse data sets have to be treated efficiently. Is there anybody interested in such a thing? Regards Hartmut