
On 18/05/2025 at 23:38, Ivan Matek wrote:
[...] Few more questions I thought of recently, apologies in advance if I misunderstood something.
1. There is some SIMD code whose cost in K seems proportional to the next multiple of 8 (rounded up). I wonder whether that affects the recommendation for optimal K or not? E.g. if the table we discussed before tells me to use K=14, should I bump it up to 16 if using fast_multiblock64?

    static BOOST_FORCEINLINE bool check(const value_type& x,boost::uint64_t hash)
    {
      for(int i=0;i<k/8;++i){
        if(!check_m256ix2(x[i],hash,8))return false;
        hash=detail::mulx64(hash);
      }
      if(k%8){
        if(!check_m256ix2(x[k/8],hash,k%8))return false;
      }
      return true;
    }
This k%8 check doesn't really happen at run time, as k is a compile-time value, so k=14 should be (slightly) faster than k=16. So, I don't particularly recommend bumping k up to a multiple of 8 if that's not optimal in terms of achieved FPR. That said, you can see at https://github.com/joaquintides/boost_bloom_benchmarks that execution times for fast_multiblockXX&lt;K&gt;, although roughly proportional to K, occasionally seem to favor k%8==0. So, in the end, I'm afraid you'll have to measure to come up with the best configuration for your environment.
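To illustrate why the k%8 branch costs nothing at run time, here is a minimal sketch (my own illustration of the structure, not the library's actual code): with K a template parameter, the number of 8-wide group checks is a compile-time constant, so K=14 and K=16 need the same number of group operations, with K=14 merely testing a shorter tail.

```cpp
#include <cstddef>

// Sketch only: K is a compile-time constant, so K/8 and K%8 are folded
// by the compiler and no run-time modulo is ever executed.
template<std::size_t K>
constexpr std::size_t simd_group_checks()
{
    return K / 8 + (K % 8 ? 1 : 0); // full 8-wide groups plus optional tail
}

// K=14 and K=16 both need two group checks, but the K=14 tail only
// compares 6 lanes, which is why K=14 can be slightly faster.
static_assert(simd_group_checks<14>() == 2, "two group checks for K=14");
static_assert(simd_group_checks<16>() == 2, "two group checks for K=16");
```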
2. The fact that a Bloom filter lookup can terminate early (you can break as soon as one bit is not set; let's ignore SIMD for now) makes me wonder whether performance depends on hit rate. E.g. if I know my hit rate will be ~10% or ~90%, should that affect my picking of K?
block and multiblock do not do early termination on lookup, while fast_multiblockXX do; in each case, that is simply what proved fastest overall. As for the hit rate: if you know most lookups are going to be negative, then you can maybe use a higher k. But, for a given capacity and number of inserted elements, there's no point in increasing k beyond its optimal value (the one yielding the smallest possible FPR).
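For reference, the standard estimates behind that "optimal value" can be sketched as follows (textbook Bloom filter formulas, not Boost.Bloom API): for m bits of capacity and n inserted elements, the FPR-minimizing k is (m/n)·ln 2, and pushing k past it makes the FPR worse again.

```cpp
#include <cmath>

// Textbook Bloom filter estimates (an illustration, not library code).
// m: capacity in bits, n: number of inserted elements.
double optimal_k(double m, double n)
{
    return (m / n) * std::log(2.0); // k minimizing the false positive rate
}

double fpr(double m, double n, double k)
{
    // Approximate false positive rate for k hash functions.
    return std::pow(1.0 - std::exp(-k * n / m), k);
}
```

With 12 bits per element, for example, optimal_k(12, 1) is about 8.3, and both fpr(12, 1, 4) and fpr(12, 1, 16) come out worse than fpr(12, 1, 8).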
3. I have just noticed I wrongly remembered that g++ defaults to -march=native; it does not. Why is this related to this library? Well, I am not sure the fallback in the fast_ headers should exist. In other words, when I want to use fast_ and AVX2 is not activated, I would prefer the code to fail to compile. So I would remove this fallback:

    template<std::size_t K> using fast_multiblock64=multiblock<boost::uint64_t,K>;

Now, there is a reasonable argument that people do not want their code to break when they change compiler flags or target a new architecture (e.g. one for which you did not add SIMD), but I believe the SIMD speedup is so huge that this should not be a silent change. If users really want the fallback behavior, they can write their own preprocessor + using logic.
That's a matter of opinion, I guess, but I'd rather have people not wanting the fallback write the compile-time check instead of the other way around. Sometimes you're not writing a final application but a library (say, on top of candidate Boost.Bloom), and you don't control compilation flags or target architecture.
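The user-side compile-time check can be quite small. A sketch, assuming the compiler-defined __AVX2__ macro (which GCC and Clang set under -mavx2 or a suitable -march, and MSVC sets under /arch:AVX2); the actual macro the library keys off internally may differ:

```cpp
// Sketch of a user-side guard (my assumption, not library code): detect
// AVX2 at compile time and fail loudly, instead of silently getting the
// scalar fallback for fast_multiblock64.
#if defined(__AVX2__)
inline constexpr bool avx2_enabled = true;
#else
inline constexpr bool avx2_enabled = false;
#endif

// Uncomment to turn the silent fallback into a hard build error:
// static_assert(avx2_enabled,
//               "fast_multiblock64 needs AVX2: build with -mavx2");
```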
4. Another thing related to SIMD is the question of the ones arrays used in make_m256ix2/make_m256i. Initially I assumed they are not static const (guaranteed construction just once, but with thread-safe init overhead) because the compiler is always smart enough to optimize this into read-only memory initialized at compile time, and everything is great. I looked into the asm of a simple test program I wrote on my machine: initialization is done at compile time, great. But :) then I went to godbolt and made a small reproduction of the code, and there I cannot get the compilers to avoid populating that array at runtime: https://godbolt.org/z/sYfc7rffa

Now, I do not expect you to go over the godbolt link and try to nudge the compilers into optimizing this by changing flags or example code. My question is more: if this is fragile, is that maybe a reason to use some ugly trick to initialize those values at compile time (e.g. as a constexpr array of properly aligned uint32 that, when interpreted as ones, has the correct values)? If anybody reading this has any idea why this happens, please do tell. My first guess is that I messed up when porting the code to the small godbolt example. :) My second guess would be that in bloom the code I extracted for godbolt is used in a way that hints the compiler to do the initialization at compile time (e.g. the kp values are known at compile time), but this is just a guess.

So, to be clear, there is no problem I can detect when using the bloom library. I am just worried that, with the huge combination of compilers, standards, optimization levels (e.g. O2 vs O3) and architectures, there may be a subset of users that will hit a performance issue if bloom does not force compile-time init of those values.
To be more specific, for this source:
    int main()
    {
      using simd_filter = boost::bloom::filter<
        std::string_view, 1, boost::bloom::fast_multiblock64<2>, 1>;
      simd_filter sf;
      char str[2] = {char('a'+(rand()%26)), '\0'};
      sf.insert(str);
      str[0] = char('a'+(rand()%26));
      return sf.may_contain(str);
    }

on my machine clang generates beautiful asm, in the sense that there is no initialization of ones at runtime. When I copy the godbolt example to my machine, initialization of ones at runtime exists, so it is probably not just a godbolt/my-machine difference.
Your godbolt snippet behaves differently from candidate Boost.Bloom because the values of kp are not known at compile time (whereas they are in candidate Boost.Bloom). If you instead do std::array vals = {5,6,7}; ( https://godbolt.org/z/M7W5Tnbsq ) then the code becomes noticeably simpler, and in one compiler (Clang) the ones table goes away completely. In general, having the table marked static or not makes no difference; compilers are smart enough to handle that regardless. In fact, I fail to see any run-time table initialization in your original snippet at https://godbolt.org/z/sYfc7rffa .

Joaquin M Lopez Munoz
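If one did want to force compile-time initialization along the lines Ivan suggests, a minimal sketch of the general technique could look like this (an illustration only; the names and the 64-bit element type are my assumptions, not the library's actual ones table):

```cpp
#include <cstdint>

// A constexpr, 32-byte-aligned table: guaranteed constant-initialized,
// so there is no run-time population and no thread-safe-static overhead,
// and the storage is aligned suitably for an AVX2 _mm256_load_si256.
struct alignas(32) ones_table
{
    std::uint64_t v[8];
};

inline constexpr ones_table ones = {{1, 1, 1, 1, 1, 1, 1, 1}};

static_assert(alignof(ones_table) == 32, "fits aligned 256-bit loads");
static_assert(ones.v[0] == 1 && ones.v[7] == 1, "compile-time values");
```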