[OpenMethod] Few questions about unreasonable effectiveness of fast_perfect_hash and other minor things

Hi everyone, a few questions:

1. fast_perfect_hash achieves a performance (load factor) I would assume is impossible, based on the perfect hashing talk <https://www.youtube.com/watch?v=yOo6GnbKzp8>. I presume the reason it works so well is that the compiler places type info data clustered in the same region of addresses... Is that correct?
2. Regarding static_list: would a Boost.Intrusive list work here also, and if so, was this considered?
3. basic_error_output does not seem to need the Policy template parameter; is this left for consistency, or am I wrong that it is not needed?
4. The dynamic_cast_ref docs <https://github.com/jll63/Boost.OpenMethod/blob/ea7dbe86b511797f0281ef07b3ada645fe569328/doc/custom_rtti.adoc> mention that D must match the reference category of B. Would it be possible to static_assert that?

1. fast_perfect_hash achieves a performance (load factor) I would assume is impossible, based on the perfect hashing talk <https://www.youtube.com/watch?v=yOo6GnbKzp8>. I presume the reason it works so well is that the compiler places type info data clustered in the same region of addresses... Is that correct?
I don't think it is a factor. And counting on clustering would be a rabbit hole all the way to compiler-specific behavior hell. Joaquín and I discussed this in Madrid - we were both speakers at the "using std::cpp" conference. I think it boils down to:

- I make many more attempts with the same number of buckets.
- My hash function is very simple.
- For my purpose, speed matters, not space.

Maybe Joaquín wants to jump in, I don't want to put words in his mouth ;-)
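To make those two points concrete, here is a minimal sketch of the kind of multiply-shift search involved (names and the retry policy are illustrative, not the actual implementation):

    #include <algorithm>
    #include <cstdint>
    #include <random>
    #include <vector>

    // Illustrative multiply-shift hash: C is the random multiplier,
    // M the number of bucket bits (2^M buckets).
    inline std::uint64_t hash(std::uint64_t key, std::uint64_t C, int M) {
        return (key * C) >> (64 - M);
    }

    // Try random multipliers until one is collision-free for all keys;
    // return 0 if max_tries attempts all fail.
    std::uint64_t find_multiplier(const std::vector<std::uint64_t>& keys,
                                  int M, int max_tries = 100000) {
        std::mt19937_64 rnd;
        std::vector<bool> used(std::size_t(1) << M);
        for (int t = 0; t < max_tries; ++t) {
            std::uint64_t C = rnd() | 1; // odd multipliers mix better
            std::fill(used.begin(), used.end(), false);
            bool ok = true;
            for (auto k : keys) {
                auto b = hash(k, C, M);
                if (used[b]) { ok = false; break; }
                used[b] = true;
            }
            if (ok) return C;
        }
        return 0; // caller bumps M and retries
    }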
2. Regarding static_list: would a Boost.Intrusive list work here also, and if so, was this considered?
Yes. static_list needs to work in a special context: static construction time, i.e., no guaranteed construction order across translation units. Its default constructor does nothing; it relies on zero-initialization. I remember that Boost.Intrusive lists had customization points, but not what I needed. Unless I missed something...
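For the curious, a minimal sketch of the zero-initialization trick (hypothetical types; the real static_list is more elaborate):

    // A static-storage object with no constructor is zero-initialized
    // before any dynamic initialization runs, so nodes in any
    // translation unit can safely link themselves in, in whatever
    // order the implementation runs the constructors.
    struct node {
        node* next; // nullptr thanks to zero-initialization
    };

    struct static_list_sketch {
        node* head; // likewise zero-initialized in static storage

        void push_front(node& n) {
            n.next = head;
            head = &n;
        }
    };

    static_list_sketch registry; // usable from any TU's static constructors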
3. basic_error_output does not seem to need the Policy template parameter; is this left for consistency, or am I wrong that it is not needed?
It is needed because it has static state:

    template<class Policy, typename Stream = detail::ostderr>
    struct basic_error_output : virtual error_output {
        static Stream error_stream;
    };

My policy's error_stream is not your policy's error_stream ;-)
4. The dynamic_cast_ref docs <https://github.com/jll63/Boost.OpenMethod/blob/ea7dbe86b511797f0281ef07b3ada645fe569328/doc/custom_rtti.adoc> mention that D must match the reference category of B. Would it be possible to static_assert that?
Probably this is a better place, in core.hpp:

    template<class Policy, class D, class B>
    auto optimal_cast(B&& obj) -> decltype(auto) {
        if constexpr (requires_dynamic_cast<B, D>) {
            return Policy::template dynamic_cast_ref<D>(std::forward<B>(obj));
        } else {
            return static_cast<D>(obj);
        }
    }

Enforce the contract at the source. Good suggestion though...
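For illustration, the check could look something like this (the static_assert lines are a sketch, not committed code):

    #include <type_traits>
    #include <utility>

    // Same optimal_cast as above, with the contract enforced at the
    // source; the assertion wording is illustrative.
    template<class Policy, class D, class B>
    auto optimal_cast(B&& obj) -> decltype(auto) {
        static_assert(std::is_reference_v<D>,
            "D must be a reference type");
        static_assert(std::is_lvalue_reference_v<D> ==
            std::is_lvalue_reference_v<B&&>,
            "D must match the reference category of B");
        if constexpr (requires_dynamic_cast<B, D>) {
            return Policy::template dynamic_cast_ref<D>(std::forward<B>(obj));
        } else {
            return static_cast<D>(obj);
        }
    }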

On Fri, May 2, 2025 at 2:10 AM Jean-Louis Leroy via Boost <boost@lists.boost.org> wrote:
Maybe Joaquín wants to jump in, I don't want to put words in his mouth ;-)
That would be nice. From what I got from his talk, some of the performance I have seen in my experiments seems almost impossible when we apply the formula from the talk, e.g. finding a perfect hash for 368 items in 512 buckets.
2. Regarding static_list: would a Boost.Intrusive list work here also, and if so, was this considered?
Yes. static_list needs to work in a special context: static construction time, i.e., no guaranteed construction order across translation units. Its default constructor does nothing; it relies on zero-initialization.
I remember that Boost.Intrusive lists had customization points, but not what I needed. Unless I missed something...
I see, thank you for explaining. If somebody knows more about Boost.Intrusive maybe they can provide more details.
3. basic_error_output does not seem to need the Policy template parameter; is this left for consistency, or am I wrong that it is not needed?
It is needed because it has static state:
    template<class Policy, typename Stream = detail::ostderr>
    struct basic_error_output : virtual error_output {
        static Stream error_stream;
    };
My policy's error_stream is not your policy's error_stream ;-)
So this is to enable, for example, my_policy1 to log to cerr while my_policy2 logs to a file?
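I.e., something like this (illustrative policy names; assuming the relevant Boost.OpenMethod headers are included):

    #include <fstream>

    // Any two distinct types will do as policy tags.
    struct my_policy1 {};
    struct my_policy2 {};

    // Each instantiation of basic_error_output carries its own static
    // error_stream, because the Policy parameter makes them different
    // types:
    using stderr_output =
        boost::openmethod::policies::basic_error_output<my_policy1>;
    using file_output =
        boost::openmethod::policies::basic_error_output<my_policy2,
                                                        std::ofstream>;

    // stderr_output::error_stream and file_output::error_stream are
    // unrelated objects; redirecting one does not affect the other.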
Enforce the contract at the source. Good suggestion though...
Thank you for adding https://github.com/jll63/Boost.OpenMethod/issues/4 I presume it will be easier for people to follow the review, although this is a tiny item.

One more question: why not use an inline variable here? Afaik they are C++17, so it should work.

    namespace boost::openmethod::policies {

    template<class Policy, typename Stream = detail::ostderr>
    struct basic_error_output : virtual error_output {
        static Stream error_stream;
    };

    template<class Policy, typename Stream>
    Stream basic_error_output<Policy, Stream>::error_stream;

    } // namespace boost::openmethod::policies
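I.e., something like this (my sketch of the suggestion, not the library's code):

    namespace boost::openmethod::policies {

    template<class Policy, typename Stream = detail::ostderr>
    struct basic_error_output : virtual error_output {
        // C++17 inline variable: the definition lives here, no
        // separate out-of-class definition needed.
        inline static Stream error_stream;
    };

    } // namespace boost::openmethod::policies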

On 02/05/2025 at 3:05, Ivan Matek via Boost wrote:
On Fri, May 2, 2025 at 2:10 AM Jean-Louis Leroy via Boost <boost@lists.boost.org> wrote:
Maybe Joaquín wants to jump in, I don't want to put words in his mouth ;-)
That would be nice. From what I got from his talk, some of the performance I have seen in my experiments seems almost impossible when we apply the formula from the talk, e.g. finding a perfect hash for 368 items in 512 buckets.
Ok, boost::openmethod::policies::fast_perfect_hash::hash_initialize(first, last) takes a range of N class info objects, calculates M as the lowest shift so that 1<<M >= N, and then randomly tries multipliers C until one of them is such that (type_id*C)>>(64-M) produces no collisions, that is, it acts as a perfect hash for the type_ids in [first, last) into 2^M buckets. 100,000 random multipliers are tried, and if all fail then M is bumped by 1, 2, 3, and if nothing works the function bails out. For N = 368, the values of M tried are 9, 10, 11 and 12, which correspond to 512, 1024, 2048 and 4096 buckets, respectively.

Now, according to the formulae from the talk you mention, the average number of random tries necessary to find a perfect multiplier (assuming the input is random) is, for each number of buckets:

    512 buckets:  1.63840297782e+80 tries
    1024 buckets: 7.17526892837e+32 tries
    2048 buckets: 1.83078061406e+15 tries
    4096 buckets: 24222060 tries

all of them much greater than the 100,000 limit for each value of M imposed by hash_initialize. So, as you point out, registration of 368 classes should fail with extremely high probability, right?

I wrote this small program that mocks up the registration of 368 classes and calls fast_perfect_hash::hash_initialize:

    #include <boost/openmethod.hpp>
    #include <boost/unordered/unordered_flat_set.hpp>
    #include <climits>
    #include <iostream>
    #include <random>
    #include <vector>

    using namespace boost::openmethod;

    struct fake_class {
        fake_class(type_id id) { type_ids[0] = id; }
        auto type_id_begin() { return &type_ids[0]; }
        auto type_id_end() { return &type_ids[1]; }
        type_id type_ids[1];
    };

    static constexpr std::size_t num_classes = 368;

    int main() {
        std::vector<fake_class> data;
        boost::unordered_flat_set<type_id> unique;
        std::default_random_engine rnd(13081963);
        std::uniform_int_distribution<type_id> d;

        // generate num_classes distinct random type_ids
        for(int i = 0; i < num_classes; ++i) {
            for(;;) {
                auto id = d(rnd);
                if(unique.insert(id).second) {
                    data.push_back(id);
                    break;
                }
            }
        }

        using fast_perfect_hash =
            policies::fast_perfect_hash<BOOST_OPENMETHOD_DEFAULT_POLICY>;
        fast_perfect_hash::hash_initialize(data.begin(), data.end());

        // locally tweaked fast_perfect_hash so that these members are public
        std::cout << "hash_mult: " << fast_perfect_hash::hash_mult << "\n";
        std::cout << "M: "
                  << (sizeof(std::size_t) * CHAR_BIT - fast_perfect_hash::hash_shift)
                  << "\n";
    }

As type_ids we simply use random numbers, and, sure enough, the program fails:

    could not find hash factors after 400000s using 8192 buckets
    D:\openmethods\x64\Debug\openmethods.exe (process 24820) exited with code 3.

Now, let's change the uniform distribution (d) as follows:

    std::uniform_int_distribution<type_id> d(1, 1000);

and, with this change, registration succeeds for 1024 buckets!

    hash_mult: 16501117525188094459
    M: 10
    D:\openmethods\x64\Debug\openmethods.exe (process 24332) exited with code 0.

What's happening? In the second run, all the type_ids are contained in a range that is much smaller than the entire space of 64-bit numbers, so it is *much much* more likely that a random multiplier will be perfect. In real life, type_ids are just the addresses of typeid(X) for each registered class X, and as it happens these usually occupy a contiguous region of memory instead of being scattered across the 2^64 bytes of the memory space.

To sum up, fast_perfect_hash::hash_initialize works against theoretical expectation because type_info objects in a program occupy a small, contiguous range of addresses within the memory space.

Joaquin M Lopez Munoz
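P.S. For reference, the expected-tries figures above can be reproduced from the birthday-problem estimate: assuming each try scatters the keys uniformly and independently, a random multiplier is perfect for n keys in b buckets with probability prod_{i=0}^{n-1} (b-i)/b, and the expected number of tries is the reciprocal. A small sketch:

    #include <cmath>
    #include <iostream>

    // Expected number of random multipliers tried before one perfectly
    // hashes n keys into b buckets, assuming each try maps keys
    // uniformly and independently: 1 / prod_{i=0}^{n-1} (b - i) / b.
    double expected_tries(int n, int b) {
        double log_p = 0; // log probability that a single try is perfect
        for (int i = 0; i < n; ++i)
            log_p += std::log((b - i) / double(b));
        return std::exp(-log_p);
    }

    int main() {
        for (int m = 9; m <= 12; ++m)
            std::cout << (1 << m) << " buckets: "
                      << expected_tries(368, 1 << m) << " tries\n";
    }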

On Fri, May 2, 2025 at 6:31 PM Joaquin M López Muñoz via Boost <boost@lists.boost.org> wrote:
On 02/05/2025 at 3:05, Ivan Matek via Boost wrote:
On Fri, May 2, 2025 at 2:10 AM Jean-Louis Leroy via Boost <boost@lists.boost.org> wrote:
Maybe Joaquín wants to jump in, I don't want to put words in his mouth ;-)
That would be nice. From what I got from his talk, some of the performance I have seen in my experiments seems almost impossible when we apply the formula from the talk, e.g. finding a perfect hash for 368 items in 512 buckets.
To sum up, fast_perfect_hash::hash_initialize works against theoretical expectation because type_info objects in a program occupy a small, contiguous range of addresses within the memory space.
Joaquin M Lopez Munoz
Thank you, this matches what I have observed, but my knowledge of math was not good enough to go from the fact that those values are similar and evenly spaced, e.g. one run on my machine:

    type_id: 0x445878
    type_id: 0x4458a0
    type_id: 0x4458c8
    type_id: 0x4458f0
    type_id: 0x445918
    type_id: 0x445940
    type_id: 0x445968
    type_id: 0x445990
    type_id: 0x4459b8
    type_id: 0x4459e0
    type_id: 0x445a08
    type_id: 0x445a30
    type_id: 0x445a58
    ...

to the fact that this makes them easier to hash perfectly. Intuitively it makes sense, but I could not find anything definitive. Thank you for explaining this and confirming my assumption.

Thank you for the analysis, Joaquin. I am beginning to remember... I wrote that part in 2018... My reasoning was that there would be relatively few type_infos. Also, they would be confined to the initialized data segment. I could see that they looked consecutive, but I didn't want to count on that. Also, in the presence of shared libraries, I would expect multiple disjoint ranges of type_infos.

Now that I have customization points, I can revisit the idea, e.g. a facet that uses P(X) = (X - min(Xi)) / sizeof(type_info).

One of the motivations for the customization points was my concern that, some day, someone would come up with a set of values for which the factors could not be found in a reasonable amount of time. Then we could fall back on vptr_map. Used in conjunction with virtual_ptr, the cost of the lookup can be amortized over many calls. I have an RDTSC-based benchmark that shows that the performance hit would be small, if benchmarks can be believed. It compares the cost of a method call with one virtual argument with a virtual function call, using different vptr acquisition strategies: https://gist.github.com/jll63/c06e33b4dba3702839c2454edc09a958
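A hypothetical sketch of that P(X) facet idea, under the (optimistic) assumption that all registered type_info objects are equal-sized and live in one contiguous range:

    #include <cstdint>
    #include <typeinfo>

    // Hypothetical: map a type_info address to an index, assuming all
    // registered type_info objects sit in one contiguous, equal-stride
    // range starting at min_addr. Perfect by construction under that
    // assumption; shared libraries would need one range per module.
    struct range_index_hash {
        std::uintptr_t min_addr; // min(Xi) over all registered type ids

        std::size_t operator()(const std::type_info& ti) const {
            auto x = reinterpret_cast<std::uintptr_t>(&ti);
            return (x - min_addr) / sizeof(std::type_info);
        }
    };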