
Matthias Troyer wrote:
The initial overhead is tolerable, though. Here are some timings on my machine (Pentium M, 1.6 GHz, 1 GByte RAM):
20 keys: 0.005 [s], 200 keys: 0.03 [s], 2000 keys: 0.15 [s], 20000 keys: 4.45 [s]
where the time grows slightly worse than linearly with the number of keys (it actually has a linear and an exponential component, but the exponential component only kicks in for larger key sets). The memory requirement of the MPH generator grows linearly with the number of keys: roughly N/3 * sizeof(int), where N is the number of keys.
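[Editorial aside: to make the lookup side of this concrete, here is a rough C++ sketch of how a generated minimal perfect hash is often evaluated, in the style of CHM-like schemes: the generator fills a table g of integers, and at hashing time two cheap hashes index into g, their sum modulo N giving the key's unique slot. All names and the construction are illustrative assumptions and need not match the implementation discussed in this thread; the point is only to show why the space grows linearly with N while the lookup itself stays a handful of operations.]

    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <vector>

    // Illustrative CHM-style lookup; not the actual code under discussion.
    struct minimal_perfect_hash
    {
        std::vector<std::uint32_t> g;   // one entry per key, filled by the generator
        std::uint32_t seed1, seed2;     // seeds chosen during generation

        static std::uint32_t mix(std::string const& key, std::uint32_t seed)
        {
            std::uint32_t h = seed;
            for (unsigned char c : key)
                h = h * 31u + c;        // simple multiplicative hash
            return h;
        }

        std::size_t operator()(std::string const& key) const
        {
            std::size_t n = g.size();
            std::size_t i = mix(key, seed1) % n;
            std::size_t j = mix(key, seed2) % n;
            return (g[i] + g[j]) % n;   // unique index in [0, n) for every key
        }
    };

[With a representation of this shape, a lookup costs two short hash computations and a couple of array reads, which matches the stated goal of a very simple and efficient hashing function.]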
I'm very interested in whether this scales well for a larger number of keys, of the order of 10^6 - 10^8 keys. I guess that for those sizes the scaling will already be exponential. Can you tell us what the maximum number of keys is that can be treated efficiently?
Honestly, I've never tried it for such a large set of keys, but I expect the times to be significantly larger. For a random key set of 200,000 keys I got 88.3 [s] on my machine... I hadn't expected anybody to need such large hash tables yet, but I'm willing to think about how to optimise it further.

OTOH, I think the time to generate an MPH for larger key sets could be improved significantly if you're ready to accept a more complex hashing function (used at hashing time) consisting of several chained calculations. But currently I'm not able to give any prediction wrt this. One of the main goals for this implementation was to get a very simple and efficient hashing function.

I've tried to get an account on the new Boost vault, but I'm still waiting for the activation mail... As soon as I get it I'll upload the files, so you'll have the possibility to play around with it.

Regards
Hartmut
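[Editorial aside: Hartmut gives no details of what "several chained calculations" would look like, so the sketch below is purely an illustration in the spirit of two-level schemes: a first hash selects a bucket, the bucket carries a seed chosen during generation, and a second hash with that seed yields the final slot. Splitting the work this way typically speeds up generation at the price of two dependent hash evaluations per lookup. All names and the construction here are assumptions, not his design.]

    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <vector>

    // Illustrative two-level ("chained") lookup; not the design from this thread.
    struct chained_mph
    {
        std::vector<std::uint32_t> bucket_seed;  // one seed per bucket, chosen by the generator
        std::size_t num_keys;

        static std::uint32_t mix(std::string const& key, std::uint32_t seed)
        {
            std::uint32_t h = seed;
            for (unsigned char c : key)
                h = (h ^ c) * 0x01000193u;       // FNV-1a style mixing step
            return h;
        }

        std::size_t operator()(std::string const& key) const
        {
            // first calculation: pick a bucket
            std::size_t b = mix(key, 0x9e3779b9u) % bucket_seed.size();
            // chained second calculation: hash again with the bucket's seed
            return mix(key, bucket_seed[b]) % num_keys;
        }
    };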