
Matthias Troyer wrote:
The initial overhead is tolerable, though. Here are some timings on my machine (Pentium M, 1.6 GHz, 1 GByte RAM):
20 keys: 0.005 [s], 200 keys: 0.03 [s], 2000 keys: 0.15 [s], 20000 keys: 4.45 [s]
where the time grows slightly worse than linearly with the number of keys (it actually has a linear and an exponential component, but the exponential component only kicks in for larger key sets). The memory requirement of the MPH generator grows linearly with the number of keys: roughly N/3 * sizeof(int), where N is the number of keys.
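[Editorial aside: to make the lookup side of this concrete, here is a rough C++ sketch of how a generated minimal perfect hash is often evaluated, in the style of CHM-like schemes: the generator fills a table g of integers, and at hashing time two cheap hashes index into g, their sum modulo N giving the key's unique slot. All names and the construction are illustrative assumptions and need not match the implementation discussed in this thread; the point is only to show why the space grows linearly with N while the lookup itself stays a handful of operations.]

    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <vector>

    // Illustrative CHM-style lookup; not the actual code under discussion.
    struct minimal_perfect_hash
    {
        std::vector<std::uint32_t> g;   // one entry per key, filled by the generator
        std::uint32_t seed1, seed2;     // seeds chosen during generation

        static std::uint32_t mix(std::string const& key, std::uint32_t seed)
        {
            std::uint32_t h = seed;
            for (unsigned char c : key)
                h = h * 31u + c;        // simple multiplicative hash
            return h;
        }

        std::size_t operator()(std::string const& key) const
        {
            std::size_t n = g.size();
            std::size_t i = mix(key, seed1) % n;
            std::size_t j = mix(key, seed2) % n;
            return (g[i] + g[j]) % n;   // unique index in [0, n) for every key
        }
    };

[With a representation of this shape, a lookup costs two short hash computations and a couple of array reads, which matches the stated goal of a very simple and efficient hashing function.]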
I'm very interested in whether this scales well for a larger number of keys, of the order of 10^6 - 10^8 keys. I guess that for those sizes the scaling will already be exponential. Can you tell us what the maximum number of keys is that can be treated efficiently?
Honestly, I've never tried it for such a large set of keys, but I expect the times to be significantly larger. For a random key set of 200,000 keys I got 88.3 [s] on my machine... I hadn't expected anybody to need such large hash tables yet, but I'm willing to think about how to optimise it further.

OTOH, I think the time to generate an MPH for larger key sets could be improved significantly if you're ready to accept a more complex hashing function (used at hashing time) consisting of several chained calculations. But currently I'm not able to give any prediction wrt this. One of the main goals for this implementation was to get a very simple and efficient hashing function.

I've tried to get an account on the new Boost vault, but I'm still waiting for the activation mail... As soon as I get it I'll upload the files, so you'll have the possibility to play around with it.

Regards
Hartmut
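[Editorial aside: Hartmut gives no details of what "several chained calculations" would look like, so the sketch below is purely an illustration in the spirit of two-level schemes: a first hash selects a bucket, the bucket carries a seed chosen during generation, and a second hash with that seed yields the final slot. Splitting the work this way typically speeds up generation at the price of two dependent hash evaluations per lookup. All names and the construction here are assumptions, not his design.]

    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <vector>

    // Illustrative two-level ("chained") lookup; not the design from this thread.
    struct chained_mph
    {
        std::vector<std::uint32_t> bucket_seed;  // one seed per bucket, chosen by the generator
        std::size_t num_keys;

        static std::uint32_t mix(std::string const& key, std::uint32_t seed)
        {
            std::uint32_t h = seed;
            for (unsigned char c : key)
                h = (h ^ c) * 0x01000193u;       // FNV-1a style mixing step
            return h;
        }

        std::size_t operator()(std::string const& key) const
        {
            // first calculation: pick a bucket
            std::size_t b = mix(key, 0x9e3779b9u) % bucket_seed.size();
            // chained second calculation: hash again with the bucket's seed
            return mix(key, bucket_seed[b]) % num_keys;
        }
    };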