
El 18/05/2025 a las 13:29, Andrzej Krzemienski via Boost escribió:
Hi Everyone, The following can hardly be called a library review, but I still wanted to give this feedback.
Hi Andrzej, thanks for this feedback! It'd be great if you can find the time to write a full review.
[...].
Some interface suggestions:
1. Use the term "bit_capacity" rather than "capacity" for representing capacity in bits. This is not to accidently mislead the users.
I find the name ugly but it certainly dispels any potential ambiguity, so, yes, I can adopt that.
[...] 3. When describing the |= operator, use the name "set union", which is the industry standard. The current choice of words "containing both the elements of f and f2" can be interpreted both as a union and as an intersection.
Noted, will do.
4. I am uneasy about the functionality under &= operator. It implies symmetry with operator |=. But the symmetry is broken for at least two reasons: * If I put some elements to filter f1 and some other elements to filter f2, and then I apply |=, then the effect is the same as if I put all the elements in f1 to begin with. The same property, if I understand correctly, does not apply to operator &=.
The following, dual property holds: if I put some elements to f1 and some other elements to f2 and then I apply &=, those elements that were both in f1 and f2 will also be in the result. The symmetry breaks in that this operation is _not_ the same as constructing a filter from scratch with the elements in both f1 and f2, as the latter won't have stray, unnecessary bits set to one that may potentially be left over in the former.
* The compromising of the FPR value that you are describing. I am not sure if anyone ever needs this intersection behavior. But if it is useful, maybe reflect this lack of symmetry by using a named function rather than an operator?
I don't like this much. Admittedly there's a lack of simmetry due to FPR degradation, but &= is so descriptive of what's going on that I'd rather keep it.
5. Does the suffix "_for" in the name "fpr_for" bring any value?
This suffix has been added for consistency reasons. We have capacity() (non-static, capacity of the filter under consideration) and capacity_for(...) (static, capacity estimation, not dependent on the state of any filter). And fpr_for(...) is sort of the inverse function of capacity_for(...), so it makes sense (in my mind at least) that it have the same suffix.
6. Not sure if it is a good idea to have two overloads for `reset()` doing slightly different things. For constructors this is unavoidable, but for member functions, you can have `reset` and `reset_for`.
This usage of "_for()" to unload the reset overload is not consistent with how it's used for capacity() and capacity_for(). I mean, reset() is NOT to reset_for() as capacity() is to capacity_for(); rather both overloads of reset do the same thing with different input parameters.
7. I think that you are missing a precondition for operators &= and |=. They work under the assumptions that for both filters f1 and f2 inserting the same element renders the same bit pattern in the hash. But this is not the case when the hashers that I provide to f1 and f2 have the same type but different initial state:
filter<T, MyHash> f1(1000, MyHash{1}); filter<T, MyHash> f2(1000, MyHash{2});
Well, technically the behavior is well defined even if the hashers are not equivalent: "[...] changes the value of each bit in the internal array with the result of doing a logical AND operation of that bit and the corresponding one in x." Admittedly, doing this with non-equivalent hashers will produce nothing but garbage, so... well, I don't know what I should do here :-)
[...] Joaquín, thank you for writing and sharing this library!
Thank _you_ for taking the time to look into my proposal. Joaquin M Lopez Munoz