
On 02/05/2025 at 3:05, Ivan Matek via Boost wrote:
On Fri, May 2, 2025 at 2:10 AM Jean-Louis Leroy via Boost <boost@lists.boost.org> wrote:
Maybe Joaquín wants to jump in; I don't want to put words in his mouth ;-)
Would be nice. From what I got from his talk, some behavior I have seen in my experiments looks implausible: e.g., finding a perfect hash for 368 items in 512 buckets seems almost impossible when we apply the formula from the talk.
Ok, boost::openmethod::policies::fast_perfect_hash::hash_initialize(first, last) takes a range of N class info objects, calculates M as the lowest shift such that 1<<M >= N, and then randomly tries multipliers C until one of them is such that (type_id*C)>>(64-M) produces no collisions, that is, until it acts as a perfect hash for the type_ids in [first, last) into 2^M buckets. 100,000 random multipliers are tried, and if all of them fail then M is bumped by 1, 2 and 3; if nothing works, the function bails out.

For N = 368, the values of M tried are 9, 10, 11 and 12, which correspond to 512, 1024, 2048 and 4096 buckets, respectively. Now, according to the formulae from the talk you mention, the average number of random tries necessary to find a perfect multiplier (assuming the input is random) is, for each number of buckets:

     512 buckets: 1.63840297782e+80 tries
    1024 buckets: 7.17526892837e+32 tries
    2048 buckets: 1.83078061406e+15 tries
    4096 buckets: 24222060 tries

all of them much greater than the 100,000 limit imposed by hash_initialize for each value of M. So, as you point out, registration of 368 classes should fail with extremely high probability, right?

I wrote this small program that mocks up the registration of 368 classes and calls fast_perfect_hash::hash_initialize:

    #include <array>
    #include <climits>
    #include <iostream>
    #include <random>
    #include <vector>
    #include <boost/openmethod.hpp>
    #include <boost/unordered/unordered_flat_set.hpp>

    using namespace boost::openmethod;

    struct fake_class
    {
        fake_class(type_id id) { type_ids[0] = id; }

        auto type_id_begin() { return &type_ids[0]; }
        auto type_id_end() { return &type_ids[1]; }

        type_id type_ids[1];
    };

    static constexpr std::size_t num_classes = 368;

    int main()
    {
        std::vector<fake_class> data;
        boost::unordered_flat_set<type_id> unique;
        std::default_random_engine rnd(13081963);
        std::uniform_int_distribution<type_id> d;

        // generate num_classes distinct random type_ids
        for (std::size_t i = 0; i < num_classes; ++i) {
            for (;;) {
                auto id = d(rnd);
                if (unique.insert(id).second) {
                    data.push_back(id);
                    break;
                }
            }
        }

        using fast_perfect_hash =
            policies::fast_perfect_hash<BOOST_OPENMETHOD_DEFAULT_POLICY>;

        fast_perfect_hash::hash_initialize(data.begin(), data.end());

        // locally tweaked fast_perfect_hash so that these members are public
        std::cout << "hash_mult: " << fast_perfect_hash::hash_mult << "\n";
        std::cout << "M: "
                  << (sizeof(std::size_t) * CHAR_BIT -
                      fast_perfect_hash::hash_shift)
                  << "\n";
    }

As type_ids we simply use random numbers, and, sure enough, the program fails:

    could not find hash factors after 400000s using 8192 buckets
    D:\openmethods\x64\Debug\openmethods.exe (process 24820) exited with code 3.

Now, let's change the uniform distribution (d) as follows:

    std::uniform_int_distribution<type_id> d(1, 1000);

and, with this change, registration succeeds for 1024 buckets!

    hash_mult: 16501117525188094459
    M: 10
    D:\openmethods\x64\Debug\openmethods.exe (process 24332) exited with code 0.

What's happening? In the second run, all the type_ids are contained in a range that is much smaller than the entire space of 64-bit numbers, so it is *much much* more likely that a random multiplier will be perfect. In real life, type_ids are just the addresses of typeid(X) for each registered class X, and as it happens these usually occupy a contiguous region of memory instead of being scattered across the 2^64 bytes of the memory space.

To sum up, fast_perfect_hash::hash_initialize works against theoretical expectation because the type_info objects in a program occupy a small, contiguous range of addresses within the memory space.

Joaquin M Lopez Munoz
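
P.S. For anyone who wants to double-check the four figures above, here is a minimal program that reproduces them, assuming the estimate from the talk is the usual birthday-problem one: the probability that a random multiplier is perfect for N random keys in 2^M buckets is prod_{i=0}^{N-1} (1 - i/2^M), and the expected number of tries is its reciprocal.

    // Expected number of random multipliers to try before finding a
    // perfect hash for N random keys into 2^M buckets, assuming
    // P(perfect) = prod_{i=0}^{N-1} (1 - i/2^M).
    #include <cmath>
    #include <cstdio>

    int main()
    {
        const int N = 368;
        for (int M = 9; M <= 12; ++M) {
            const double buckets = std::ldexp(1.0, M); // 2^M
            double log_p = 0.0;                        // log P(perfect)
            for (int i = 0; i < N; ++i)
                log_p += std::log1p(-i / buckets);
            std::printf("%5.0f buckets: %.11e expected tries\n",
                        buckets, std::exp(-log_p));
        }
    }

This prints values matching the figures listed above, up to floating-point rounding.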
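
P.P.S. And here is roughly what the search itself looks like. This is an illustrative sketch, not the actual Boost.OpenMethod code; the function name, the odd-multiplier trick and the fake addresses are my own choices for the example.

    #include <cstdint>
    #include <cstdio>
    #include <random>
    #include <vector>

    // Returns a multiplier C such that (id * C) >> (64 - M) maps the
    // given ids to distinct buckets, or 0 if none is found in budget.
    std::uint64_t find_hash_mult(const std::vector<std::uint64_t>& ids, int M)
    {
        std::default_random_engine rnd;
        std::uniform_int_distribution<std::uint64_t> d;
        std::vector<bool> used;

        for (int attempt = 0; attempt < 100000; ++attempt) {
            // force an odd C, a common choice in multiplicative hashing
            const std::uint64_t C = d(rnd) | 1;
            used.assign(std::size_t(1) << M, false);
            bool collision = false;
            for (auto id : ids) {
                const auto bucket = (id * C) >> (64 - M);
                if (used[bucket]) { collision = true; break; }
                used[bucket] = true;
            }
            if (!collision) return C; // perfect for these ids
        }
        return 0; // the real thing bumps M before giving up
    }

    int main()
    {
        // ids in a small contiguous range, like the addresses of
        // type_info objects in practice (base address is made up)
        std::vector<std::uint64_t> ids;
        for (std::uint64_t i = 0; i < 368; ++i)
            ids.push_back(0x7ff654320000 + 16 * i);

        const auto C = find_hash_mult(ids, 10); // 1024 buckets
        std::printf("hash_mult: %llu\n", (unsigned long long)C);
    }

With clustered ids like these, the sketch typically finds a perfect multiplier within a handful of attempts, mirroring the second run of the program above.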