Re: [boost] [openmethod] On the unreasonable efficiency of fast_perfect_hash

12 May 2025


      El 12/05/2025 a las 12:06, Ivan Matek escribió:
...
On Sun, May 11, 2025 at 9:01 PM Joaquin M López Muñoz via Boost 
<boost@lists.boost.org> wrote:
https://github.com/joaquintides/perfect_range_hash
Thank you for this. Unfortunately even this basic math is a bit too 
complex for me
so few small comments:
1. stylistically I would link to particular hash of source code, 
although documentation
link is nicer. Reason is that documentation link can die
I can relink to a more stable doc if the lib is accepted.
...
2. afaik C is not randomly chosen between 0<= C <= 2^64, i.e. it is 
always an odd number.
You're right. We'd have to ask the author, but I guess the "| 1" in
"hash_mult = uniform_dist(rnd) | 1" is there to avoid the (very 
unlikely) case
hash_mult = 0. In any case, this small adjustment doesn't change the big 
picture
statistically --the output distribution is largely dominated by C/B, 
where C is
the multiplier and B is 2^(64-m), so the least significant bit of C is 
mostly
swallowed along the way.
...
3. in my brief experiments I tried to add large gap, and randomly pick 
subset of types,
and performance was still amazing.
Example for 3.: [...]
This case of 231 type_ids scattered in a range of 3000 positions is 
lower-bounded
by (has better chances than) the baseline case with 3000 adjacent 
positions, which
has a probability 0.14 of finding a perfect hash multiplier for 4096 
buckets. Note
that a "hard" case would involve having your type_ids distributed across the
entire 64-bit space of possible numerical values, and in the example you 
wrote
the values are confined to a range that is around 10^14 times smaller 
than 2^64
--this is the fact that the algorithm takes an implicit advantage of.

Joaquin M Lopez Munoz