
On 20/05/2025 at 21:07, Ivan Matek wrote:
[...]
One more question: I have some handcrafted tests (where the bloom filter is so small it fits in L1/L2 cache, and the hit rate of lookups is 0%, besides false positives) and the SIMD one is a bit slower than the non-SIMD one for certain values of K:

    constexpr size_t num_inserted = 10'000;
    constexpr double fpr = 1e-5;
    constexpr size_t K = 5;
    using vanilla_filter = boost::bloom::filter<
        uint64_t, 1, boost::bloom::multiblock<uint64_t, K>, 1>;
    using simd_filter = boost::bloom::filter<
        uint64_t, 1, boost::bloom::fast_multiblock64<K>, 1>;

I presume that is expected, since it is hard to make sure SIMD is always faster, but I just wanted to double check with you that this is not an unexpected result. So to recap my question: if a bloom filter fits in L1 or L2 cache, is it best practice to check whether the SIMD or normal version is faster instead of assuming SIMD always wins?
Benchmarks at https://github.com/joaquintides/boost_bloom_benchmarks show that the advantage of fast_multiblock64<K> with respect to multiblock<uint64_t, K> is small for some compilers (Clang, VS) and low values of K, and occasionally multiblock wins (though these measurements come with a fair degree of noise). So, yes, I'd profile to make sure.

In the case of fast_multiblock32<K> vs. multiblock<uint64_t, K>, the advantage of the former is much more clear. (Note that multiblock<uint32_t, K> is not included in the benchmarks because it does not get us anything with respect to multiblock<uint64_t, K>.)

Joaquin M Lopez Munoz
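The profiling suggested above can be sketched with a small timing harness. This is a hedged sketch, not Boost.Bloom's own benchmark code: to keep it self-contained it uses a hypothetical stand_in_filter with the same insert/may_contain shape as boost::bloom::filter; in a real measurement you would substitute the vanilla_filter and simd_filter typedefs from the question and compare the reported times.

    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <random>
    #include <unordered_set>
    #include <vector>

    // Hypothetical stand-in exposing the insert/may_contain interface of
    // boost::bloom::filter. Replace with the real typedefs when profiling,
    // e.g.:
    //   using vanilla_filter = boost::bloom::filter<
    //       uint64_t, 1, boost::bloom::multiblock<uint64_t, K>, 1>;
    struct stand_in_filter {
      std::unordered_set<std::uint64_t> s;
      void insert(std::uint64_t x) { s.insert(x); }
      bool may_contain(std::uint64_t x) const { return s.count(x) != 0; }
    };

    // Insert `inserted`, then time `probes` lookups; returns nanoseconds.
    template <typename Filter>
    long long time_lookups(const std::vector<std::uint64_t>& inserted,
                           const std::vector<std::uint64_t>& probes) {
      Filter f;
      for (auto x : inserted) f.insert(x);
      std::size_t hits = 0;
      auto t0 = std::chrono::steady_clock::now();
      for (auto x : probes) hits += f.may_contain(x);
      auto t1 = std::chrono::steady_clock::now();
      // Print hits so the lookup loop cannot be optimized away; with
      // disjoint random 64-bit keys this is almost surely 0.
      std::printf("hits=%zu\n", hits);
      return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0)
          .count();
    }

    int main() {
      std::mt19937_64 rng(42);
      std::vector<std::uint64_t> inserted(10'000), probes(100'000);
      for (auto& x : inserted) x = rng();
      for (auto& x : probes) x = rng();  // ~0% hit rate, as in the question
      long long ns = time_lookups<stand_in_filter>(inserted, probes);
      std::printf("lookups took %lld ns\n", ns);
      // Run once per filter type (vanilla vs. SIMD) and compare the times
      // on the target hardware before choosing a block policy.
      return 0;
    }

Timings at this scale are noisy, so repeating each run several times and taking the minimum (or median) gives a fairer comparison.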