Re: [boost] [Bloom] Some questions

20 May 2025

On Tue, May 20, 2025 at 6:31 PM Joaquin M López Muñoz via Boost
<boost@lists.boost.org> wrote:
...
...
A user might think: my CPU supports AVX2, so surely it will use the SIMD
algorithms. But "available" here refers to compiler options (and,
obviously, CPU support when the binary is run), not just to CPU
support. I know I am not telling you anything you do not know; I just
think a large percentage of users might misunderstand what "available"
means.
Yes, you're right, I can rewrite "are available" as "are enabled at compile
time".
Thank you, I believe that is a big improvement.
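
To illustrate the "enabled at compile time" distinction, here is a minimal
sketch that reports which SIMD instruction sets the compiler was told to
target. It relies only on the compilers' predefined macros (__AVX2__,
__SSE2__, __ARM_NEON and the MSVC _M_* equivalents). The function name is
hypothetical, and this is not a copy of Boost.Bloom's internal detection
logic.

```cpp
#include <string>

// Lists the SIMD instruction sets enabled for THIS translation unit.
// The result depends on compiler flags (e.g. -mavx2, -march=native,
// /arch:AVX2), not on the CPU the binary later runs on.
// Hypothetical helper, for illustration only.
std::string simd_enabled_at_compile_time()
{
    std::string out;
#if defined(__AVX2__)
    out += "AVX2 ";
#endif
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    out += "SSE2 ";
#endif
#if defined(__ARM_NEON)
    out += "NEON ";
#endif
    if (out.empty()) out = "none";
    return out;
}
```

Built for x86-64 without -mavx2 (or an equivalent flag), this would not
list AVX2 even on a CPU that supports it, which is exactly the
misunderstanding described above.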
...
Umm, yes, maybe. Anyway, scratch what I said about compilers
not really caring about const vs. static const: adding static to your
snippet severely pessimizes the codegen, with static initialization
guards and all. So there goes your explanation of why static
was not used :-)
Yes, I have noticed static messes it up, although for ints
<https://godbolt.org/z/oYM15zYoW> the compiler is smart enough not to emit
that guard. That is one of the reasons why I am so paranoid this
optimization might stop working with some future compiler.
SIMD intrinsics may be harder for the compiler to reason about than "just"
ints.
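
The guard behaviour discussed above can be sketched with plain integers.
This is not the snippet from the thread; the function names are
hypothetical. A function-local static with a constant initializer is
constant-initialized and needs no guard, while one whose initializer is
only known at run time is initialized on first call, which typically
requires a thread-safe guard. An initializer built from SIMD intrinsics
may fall into the second category if the compiler cannot treat it as a
constant.

```cpp
#include <cstdint>

// Plain local const: no static storage, so no initialization guard.
std::uint64_t with_local_const(std::uint64_t x)
{
    const std::uint64_t mask = 0x00FF00FF00FF00FFull;
    return x & mask;
}

// Static with a constant initializer: constant-initialized before the
// function ever runs, so no guard is needed.
std::uint64_t with_static_const(std::uint64_t x)
{
    static const std::uint64_t mask = 0x00FF00FF00FF00FFull;
    return x & mask;
}

// Static whose initializer depends on a run-time value: initialized
// once, on the first call, typically behind a thread-safe guard.
// Later calls ignore `seed` because the mask is already set.
std::uint64_t with_dynamic_static(std::uint64_t x, std::uint64_t seed)
{
    static const std::uint64_t mask = seed | 1u;
    return x & mask;
}
```

Comparing the generated assembly of the three functions (for example on
Compiler Explorer) is one way to see whether a given compiler emits the
guard.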
...
For the record, during development I examined
the codegen for all fast_multiblockXX classes with the three
major compilers, Intel and ARM to check that nothing looked bad.
I agree there is a 99% chance it will never break, since I presume
compilers will rarely regress in this manner... but I still think there is
a tiny chance they might. :)

One more question:
I have some handcrafted tests (where the Bloom filter is so small it fits
in L1/L2 cache, and the hit rate of lookups is 0%, besides false
positives), and the SIMD one is a bit slower than the non-SIMD one for
certain values of K.

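#include <cstddef>
#include <cstdint>
#include <boost/bloom.hpp>

constexpr size_t num_inserted = 10'000; // elements inserted per filter
constexpr double fpr = 1e-5;            // target false positive rate
constexpr size_t K = 5;                 // bits set per element in the subfilter
using vanilla_filter = boost::bloom::filter<
    uint64_t, 1, boost::bloom::multiblock<uint64_t, K>, 1>;
using simd_filter = boost::bloom::filter<
    uint64_t, 1, boost::bloom::fast_multiblock64<K>, 1>;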

I presume that is expected, since it is hard to make sure SIMD is always
faster, but I just wanted to double check with you that this is not an
unexpected result.
So to recap my question: if a Bloom filter fits in L1 or L2 cache, is it
best practice to check whether the SIMD or the normal version is faster
instead of assuming SIMD always wins?
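
A comparison of this kind can be sketched with a small timing helper.
This is not the test code referred to above. ns_per_lookup and
stand_in_filter are hypothetical names; the only thing assumed of a
filter is a const may_contain(key) member, which is what Boost.Bloom
filters expose for lookups.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Average nanoseconds per may_contain() call over `keys`, with the
// whole key set looked up `reps` times. Hypothetical helper.
template<typename Filter>
double ns_per_lookup(const Filter& f,
                     const std::vector<std::uint64_t>& keys,
                     int reps = 100)
{
    if (keys.empty() || reps <= 0) return 0.0;
    using clock = std::chrono::steady_clock;
    std::size_t hits = 0;
    const auto t0 = clock::now();
    for (int r = 0; r < reps; ++r)
        for (std::uint64_t k : keys)
            hits += f.may_contain(k) ? 1 : 0;
    const auto t1 = clock::now();
    // Store the hit count through a volatile so the lookups cannot be
    // discarded as dead code.
    volatile std::size_t sink = hits;
    (void)sink;
    const double total_ns =
        std::chrono::duration<double, std::nano>(t1 - t0).count();
    return total_ns / (static_cast<double>(keys.size()) * reps);
}

// Hypothetical stand-in so the sketch compiles without Boost:
// a 64-bit "filter" that tests one bit selected by the key.
struct stand_in_filter
{
    std::uint64_t bits;
    bool may_contain(std::uint64_t k) const
    {
        return ((bits >> (k & 63)) & 1u) != 0;
    }
};
```

With Boost.Bloom, one would populate a vanilla_filter and a simd_filter
with the same elements and call ns_per_lookup on each with the same
never-inserted keys. Results from a loop this simple are sensitive to
compiler flags, warm-up and the chosen K, so they are best treated as a
rough indication.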

Ivan Matek