
On Mon, May 12, 2025 at 1:48 PM Joaquin M López Muñoz via Boost < boost@lists.boost.org> wrote:
El 12/05/2025 a las 12:06, Ivan Matek escribió: You're right. We'd have to ask the author, but I guess the "| 1" in "hash_mult = uniform_dist(rnd) | 1" is there to avoid the (very unlikely) case hash_mult = 0. In any case, this small adjustment doesn't change the big picture
I think reason people pick odd number is because if your multiplier is power of 2 last bits are guaranteed to be 0. Not sure that matters a lot here since we only use few top bits. https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that... * the number 11400714819323198486 is closer but we don’t want multiples of two because that would throw away one bit)*
statistically --the output distribution is largely dominated by C/B, where C is the multiplier and B is 2^(64-m), so the least significant bit of C is mostly swallowed along the way.
This case of 231 type_ids scattered in a range of 3000 positions is lower-bounded by (has better chances than) the baseline case with 3000 adjacent positions, which has a probability 0.14 of finding a perfect hash multiplier for 4096 buckets. Note that a "hard" case would involve having your type_ids distributed across the entire 64-bit space of possible numerical values, and in the example you wrote the values are confined to a range that is around 10^14 times smaller than 2^64
Thank you for explanation. For me it is interesting even if they are not consecutive, but bounded within "small"( compared to 64bit range) interval perfect hashing still works great. That was surprising to me, but I guess even noneven distribution among small range of values can be nicely hashed. To think a bit more broadly, not just in terms of openmethods... this is quite interesting to me, e.g. does that mean that randomly picked short(16bit) numbers can be hashed much better(assuming "proper" 64 bit hash, even though that is not what many STL implementations do, i.e. they use identity <https://godbolt.org/z/bYYT3M5j1>) than same number of randomly picked int(32 bit) numbers... What about randomly picked int vs randomly picked positive int(not unsigned int, but int whose value is positive)...