
Had a bit more time to think :) so here are my replies and a few more questions.

On Sun, May 18, 2025 at 11:56 AM Joaquin M López Muñoz via Boost <boost@lists.boost.org> wrote:
> El 17/05/2025 a las 20:36, Ivan Matek via Boost escribió:
>> Hi everybody,
>> I was reading the documentation/code, and I have some questions/comments.
>> 1. Why is the unit of measurement bits? I understand that conceptually bloom
>> filters (maybe others, I did not check) are bit arrays, but to me this seems
>> very error prone. Would a verbose alternative of using boost::bloom::bits(1000)
>> or boost::bloom::bytes(125) be preferred? I did not suggest a UDL, as I feel
>> for such a rarely used class it may be overkill.
> I'm happy to do whatever the community decides is best here.
I am a strong fan of strong types (e.g. std::chrono), so a bits/bytes strong type has my vote; I am not sure what the community in general prefers.
>> 5. Why is BOOST_ASSERT(fpr>=0.0&&fpr<=1.0); not BOOST_ASSERT(fpr>0.0&&fpr<=1.0);,
>> i.e. is there a benefit to allowing calls with an impossible fpr argument?
> fpr==0.0 is a legitimate (if uninteresting) argument value for capacity_for:
>
>     capacity_for(0, 0.0) --> 24
>     capacity_for(1, 0.0) --> 18446744073709549592
>
> The formal reason why fpr==0.0 is supported is because of symmetry: some calls
> to fpr_for actually return 0.0 (for instance, fpr_for(0, 100)).
This is a bit philosophical, but I actually do not think this is correct. First of all, is (0, 0.0) the only use case where an fpr of 0.0 makes sense? I.e. any time n > 0, an fpr of 0.0 is impossible (or I have misunderstood something). So the assert could be an implication (it is funny because we had a discussion about implies on the ML a few months ago), something like:

    BOOST_IMPLICATION(fpr == 0.0, n == 0);

Similarly, for (1, 0.0) I do not believe the result should be the size_t max value, as this is not a correct value. Now, we both know you will OOM before noticing this in reality, but even if we imagine a magical computer that can allocate that much memory, the fpr is still not 0.0. If this library were not C++11 I would suggest std::optional as the return type, but as it is, boost::optional seems like the best option. I know people might think I am making the API ugly for users, but I really do not think a bit more typing is such a big problem when the alternative is people messing up (probably not when using constants in code, more likely when they dynamically compute something). Bloom filters are important, but they are not like JSON or protobuf, which are everywhere in a codebase; users needing a slightly uglier API in 10 LOC of a 1M LOC codebase does not seem like a big deal to me. So, in the examples above:

    capacity_for(0, 0.0) -> min_possible_capacity
    capacity_for(1, 0.0) -> nullopt

A few more questions I thought of recently; apologies in advance if I misunderstood something.

1. There is some SIMD code whose cost in k seems proportional to the nearest greater-or-equal multiple of 8. I wonder, does that affect the recommendation for optimal k? E.g. if the table we discussed before tells me to use k == 14, should I bump it up to 16 if using fast_multiblock64?

    static BOOST_FORCEINLINE bool check(const value_type& x, boost::uint64_t hash)
    {
      for (int i = 0; i < k/8; ++i) {
        if (!check_m256ix2(x[i], hash, 8)) return false;
        hash = detail::mulx64(hash);
      }
      if (k%8) {
        if (!check_m256ix2(x[k/8], hash, k%8)) return false;
      }
      return true;
    }

2. The fact that a bloom filter lookup can break early (you can stop as soon as one bit is not set; let's ignore SIMD for now) makes me wonder whether performance depends on the hit rate. E.g. if I know my hit rate will be ~10% or ~90%, should that affect my choice of k?

3. I have just noticed that I wrongly remembered that g++ defaults to -march=native; it does not. Why is this relevant to this library? Well, I am not sure the fallback in the fast_ headers should exist. In other words, when I want to use fast_ and AVX2 is not enabled, I would prefer the code to fail to compile. So I would remove this fallback:

    template<std::size_t K>
    using fast_multiblock64 = multiblock<boost::uint64_t, K>;

Now, there is a reasonable argument that people do not want their code to break when they change compiler flags or target a new architecture (e.g. one for which you did not add SIMD), but I believe the SIMD speedup is so huge that this should not be a silent change. If users really want the fallback behavior, they can write their own preprocessor + using logic.

4. Another thing related to SIMD is the question of the ones arrays used in make_m256ix2/make_m256i. Initially I assumed they are not static const (which would guarantee construction just once, at the cost of thread-safe init overhead) because the compiler is always smart enough to optimize this into read-only memory initialized at compile time, and everything is great. I looked into the asm of a simple test program I wrote on my machine: initialization is done at compile time, great. But :) then I went to godbolt and made a small reproduction of the code, and there I cannot get the compilers to avoid populating that array at runtime (<https://godbolt.org/z/sYfc7rffa>). Now, I do not expect you to go over the godbolt link and try to nudge the compilers to optimize this by changing flags or example code; my question is more: if this is fragile, is that maybe a reason to use some ugly trick to initialize those values at compile time (e.g. as a constexpr array of properly aligned uint32 that, when interpreted as ones, has the correct values)? If anybody reading this has any idea why this happens, please do tell. My first guess is that I messed up when porting the code to the small godbolt example. :) My second guess would be that in the bloom code, what I extracted for godbolt is used in a way that hints to the compiler to do the initialization at compile time (e.g. the kp values are known at compile time), but this is just a guess. So, to be clear, there is no problem I can detect when using the bloom library; I am just worried that with the huge number of combinations of compilers, standards, optimization levels (e.g. O2 vs O3) and architectures, there may be a subset of users that hit a performance issue if bloom does not force compile-time init of those values. To be more specific, for this source:

    int main() {
        using simd_filter = boost::bloom::filter<std::string_view, 1,
            boost::bloom::fast_multiblock64<2>, 1>;
        simd_filter sf;
        char str[2] = {char('a' + (rand() % 26)), '\0'};
        sf.insert(str);
        str[0] = char('a' + (rand() % 26));
        return sf.may_contain(str);
    }

on my machine clang generates beautiful asm, in the sense that there is no initialization of ones at runtime. When I copy the godbolt example to my machine, the runtime initialization of ones exists, so it is probably not just a godbolt/my-machine difference.