
On Mon, 17 Jan 2005 13:19:34 +0100, Hartmut Kaiser <hartmutkaiser@t-online.de> wrote:
Christopher D. Russell wrote:
The requirement, that the input keys should be strictly positive isn't a big limitation, because it is always possible to accordingly transform key sets containing possibly negative numbers, ...
? By addition of a constant positive offset?
Sure.
or to derive the keys from strings to match.
? You lost me here. Would you explain this point in a little more detail please.
You're certainly right, that there is no generic and simple way to map strings to integers besides reinterpreting the bit pattern of the string as a (probably long) integer. What I've meant is, that sometimes you may use some simple mapping, for instance adding all the character values of the string given that these give no collisions for your input key set.
The code I posted as part of this thread does this in a statistically optimal way, it may be argued, for most types, including strings and is quite efficient. It works for contiguous types as well, more or less, including doubles and such things as UUIDs. Where it differs is that the MPH generator generates a wonderful collision free set for a fixed universe which leads to a nice mapping from a code to an index suitable for contiguous storage or enumeration. I will interested to have a look at the MPH as see how it can adapt to arbitrarily long keys, especially strings, fixed and variable length. I can also imagine an "optimal" implementation touching the minimum number of bytes and doing the minimum number of operations to derive an optimal hash for a given set of strings... I'd be able to thwack such a clever thing into many systems I have running and gain very tangible benefits if such a thing were possible. $0.02 matt.