[boost] [bloom] Andrzej's feedback

18 May 2025

      Hi Everyone,
The following can hardly be called a library review, but I still wanted to
give this feedback.
The documentation of this library meets what I call the highest standards
of Boost. What attracted me to Boost originally was the way the libraries
were documented. Studying each new library was learning something new:
about programming, about structuring your program, about structuring your
thoughts. A lot of libraries, back then, first documented the problem
domain, then gave you some theory behind it, then listed possible
approaches, and then finally also presented to you the chosen solution.

Boost.Bloom represents the same character. I did not know what Bloom
Filters were before reading the docs. Now I feel I do. I learned about the
problem domain. I was convinced that the problem domain is worth
addressing. I know the technical challenges. I know what to expect of the
library interface (without having looked at the reference section)!
I can already answer one of the review questions. Is the design of the
library sound? If the author was able to explain the problem domain and the
technical challenges/trade-offs so clearly and succinctly, then I am
confident that the library design must be sound.

Some interface suggestions:

1. Use the term "bit_capacity" rather than "capacity" for representing
capacity in bits. This is not to accidently mislead the users.

2. It is a good move to use the name "may_contain" rather than "contains",
but "may_contain" actually also doesn't do justice to the semantics either.
The user is asking, "do you contain the value", and the library is
replying, "maybe I do, and maybe I don't". Actually, I am unable to suggest
anything better. If the function name was `f.definately_misses()` then a
true/false answer would be meaningful. and the opposite of
"definately_misses" is "either 'contains', or 'misses' incorrectly
classified as 'contains'". But I realize that it is longer and contains
negatives. So I think your choice of the name is the best given the set of
alternatives.

3. When describing the |= operator, use the name "set union", which is the
industry standard. The current choice of words "containing both the
elements of f and f2" can be interpreted both as a union and as an
intersection.

4. I am uneasy about the functionality under &= operator. It implies
symmetry with operator |=. But the symmetry is broken for at least two
reasons:
* If I put some elements to filter f1 and some other elements to filter f2,
and then I apply |=, then the effect is the same as if I put all the
elements in f1 to begin with. The same property, if I understand correctly,
does not apply to operator &=.
* The compromising of the FPR value that you are describing.
I am not sure if anyone ever needs this intersection behavior. But if it is
useful, maybe reflect this lack of symmetry by using a named function
rather than an operator?

5. Does the suffix "_for" in the name "fpr_for" bring any value?

6. Not sure if it is a good idea to have two overloads for `reset()` doing
slightly different things. For constructors this is unavoidable, but for
member functions, you can have `reset` and `reset_for`.

7. I think that you are missing a precondition for operators &= and |=.
They work under the assumptions that for both filters f1 and f2 inserting
the same element renders the same bit pattern in the hash. But this is not
the case when the hashers that I provide to f1 and f2 have the same type
but different initial state:

filter<T, MyHash> f1(1000, MyHash{1});
filter<T, MyHash> f2(1000, MyHash{2});

The above feedback is based on me having read the documentation. I haven't
tried the library, and it is unlikely that I will be able to do this within
the review period, so I am not sure if I am qualified to give a
recommendation for accepting the library into Boost.

Joaquín, thank you for writing and sharing this library!

Regards,
&rzej;

[boost] [bloom] Andrzej's feedback

Andrzej Krzemienski