
Hi Everyone, The following can hardly be called a library review, but I still wanted to give this feedback. The documentation of this library meets what I call the highest standards of Boost. What attracted me to Boost originally was the way the libraries were documented. Studying each new library was learning something new: about programming, about structuring your program, about structuring your thoughts. A lot of libraries, back then, first documented the problem domain, then gave you some theory behind it, then listed possible approaches, and then finally also presented to you the chosen solution. Boost.Bloom represents the same character. I did not know what Bloom Filters were before reading the docs. Now I feel I do. I learned about the problem domain. I was convinced that the problem domain is worth addressing. I know the technical challenges. I know what to expect of the library interface (without having looked at the reference section)! I can already answer one of the review questions. Is the design of the library sound? If the author was able to explain the problem domain and the technical challenges/trade-offs so clearly and succinctly, then I am confident that the library design must be sound. Some interface suggestions: 1. Use the term "bit_capacity" rather than "capacity" for representing capacity in bits. This is not to accidently mislead the users. 2. It is a good move to use the name "may_contain" rather than "contains", but "may_contain" actually also doesn't do justice to the semantics either. The user is asking, "do you contain the value", and the library is replying, "maybe I do, and maybe I don't". Actually, I am unable to suggest anything better. If the function name was `f.definately_misses()` then a true/false answer would be meaningful. and the opposite of "definately_misses" is "either 'contains', or 'misses' incorrectly classified as 'contains'". But I realize that it is longer and contains negatives. So I think your choice of the name is the best given the set of alternatives. 3. When describing the |= operator, use the name "set union", which is the industry standard. The current choice of words "containing both the elements of f and f2" can be interpreted both as a union and as an intersection. 4. I am uneasy about the functionality under &= operator. It implies symmetry with operator |=. But the symmetry is broken for at least two reasons: * If I put some elements to filter f1 and some other elements to filter f2, and then I apply |=, then the effect is the same as if I put all the elements in f1 to begin with. The same property, if I understand correctly, does not apply to operator &=. * The compromising of the FPR value that you are describing. I am not sure if anyone ever needs this intersection behavior. But if it is useful, maybe reflect this lack of symmetry by using a named function rather than an operator? 5. Does the suffix "_for" in the name "fpr_for" bring any value? 6. Not sure if it is a good idea to have two overloads for `reset()` doing slightly different things. For constructors this is unavoidable, but for member functions, you can have `reset` and `reset_for`. 7. I think that you are missing a precondition for operators &= and |=. They work under the assumptions that for both filters f1 and f2 inserting the same element renders the same bit pattern in the hash. But this is not the case when the hashers that I provide to f1 and f2 have the same type but different initial state: filter<T, MyHash> f1(1000, MyHash{1}); filter<T, MyHash> f2(1000, MyHash{2}); The above feedback is based on me having read the documentation. I haven't tried the library, and it is unlikely that I will be able to do this within the review period, so I am not sure if I am qualified to give a recommendation for accepting the library into Boost. JoaquĆn, thank you for writing and sharing this library! Regards, &rzej;