[openmethod] On the unreasonable efficiency of fast_perfect_hash

Some days ago we discussed the fact that the perfect hashing machinery of candidate Boost.OpenMethod is apparently at odds with the theoretical probability of finding a viable function by chance (i.e. using brute force). I've taken a closer look at this issue and wrote a small article explaining why Boost.OpenMethod's fast_perfect_hash is so exceptionally successful: https://github.com/joaquintides/perfect_range_hash I hope the analysis is of interest to those puzzling over this. Joaquin M Lopez Munoz

Thank you for this. Unfortunately even this basic math is a bit too complex for me so few small comments: 1. stylistically I would link to particular hash of source code, although documentation link is nicer. Reason is that documentation link can die 2. afaik C is not randomly chosen between 0<= C <= 2^64, i.e. it is always an odd number. 3. in my brief experiments I tried to add large gap, and randomly pick subset of types, and performance was still amazing. Example for 3.: BOOST_OPENMETHOD_CLASSES(Event, Event2, Event3, Event5, Event7, Event10, Event11, Event13, Event15, Event17, Event19, Event20, Event23, Event25, Event29, Event30, Event31, Event35, Event37, Event40, Event41, Event43, Event45, Event47, Event50, Event53, Event55, Event59, Event60, Event61, Event65, Event67, Event70, Event71, Event73, Event75, Event79, Event80, Event83, Event85, Event89, Event90, Event95, Event97, Event100, Event101, Event103, Event105, Event107, Event109, Event110, Event113, Event115, Event120, Event125, Event127, Event130, Event131, Event135, Event137, Event139, Event140, Event145, Event149, Event150, Event151, Event155, Event157, Event160, Event163, Event165, Event167, Event170, Event173, Event175, Event179, Event180, Event181, Event185, Event190, Event191, Event193, Event195, Event197, Event199, Event200, Event205, Event210, Event211, Event215, Event220, Event223, Event225, Event227, Event229, Event230, Event233, Event235, Event239, Event240, Event241, Event245, Event250, Event251, Event255, Event257, Event260, Event263, Event265, Event269, Event270, Event271, Event275, Event277, Event280, Event281, Event283, Event285, Event290, Event293, Event295, Event300, Event305, Event307, Event310, Event311, Event313, Event315, Event317, Event320, Event325, Event330, Event331, Event335, Event337, Event340, Event345, Event347, Event349, Event350, Event353, Event355, Event359, Event360, Event365, Event367, Event370, Event373, Event375, Event379, Event380, Event383, Event385, Event389, Event390, Event395, Event397, Event400, Event401, Event405, Event409, Event410, Event415, Event419, Event420, Event421, Event425, Event430, Event431, Event433, Event435, Event439, Event440, Event443, Event445, Event449, Event450, Event455, Event457, Event460, Event461, Event463, Event465, Event467, Event470, Event475, Event479, Event480, Event485, Event487, Event490, Event491, Event495, Event499, Event500, Event503, Event505, Event695, Event700, Event895, Event900, Event1000, Event2000, Event2925, Event2927, Event2930, Event2935, Event2939, Event2940, Event2945, Event2950, Event2953, Event2955, Event2957, Event2960, Event2963, Event2965, Event2969, Event2970, Event2971, Event2975, Event2980, Event2985, Event2990, Event2995, Event2999, Event3000); I have also added NonEvent types in the binary, but I presume maybe compiler groups the typeids based on the inheritance but it did not seem to harm the performance). As for how I selected Events: no particular logic beside I tried to get large gaps, clusters, in general just a lame human intuition for what is "hard". Still seems to be a non issue for finding perfect hash. Finding hash factor for 231 types trying with M = 9, 512 buckets found 47091734271558211 after 182 attempts; span = [0, 511] I have attached my events.h and main.cpp to save you time if you want to replicate measurement. On Sun, May 11, 2025 at 9:01 PM Joaquin M López Muñoz via Boost < boost@lists.boost.org> wrote:
Some days ago we discussed the fact that the perfect hashing machinery of candidate Boost.OpenMethod is apparently at odds with the theoretical probability of finding a viable function by chance (i.e. using brute force).
I've taken a closer look at this issue and wrote a small article explaining why Boost.OpenMethod's fast_perfect_hash is so exceptionally successful:
https://github.com/joaquintides/perfect_range_hash
I hope the analysis is of interest to those puzzling over this.
Joaquin M Lopez Munoz
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
participants (2)
-
Ivan Matek
-
Joaquin M López Muñoz