
On 18/05/2025 at 23:38, Ivan Matek wrote:
Had a bit more time to think :) so here are my replies and a few more questions.
> 5. Why is
> BOOST_ASSERT(fpr>=0.0&&fpr<=1.0);
> not
> BOOST_ASSERT(fpr>0.0&&fpr<=1.0);
> , i.e. is there benefit of allowing calls with impossible fpr argument?

fpr==0.0 is a legitimate (if uninteresting) argument value for capacity_for:

capacity_for(0, 0.0) --> 24
capacity_for(1, 0.0) --> 18446744073709549592
The formal reason why fpr==0.0 is supported is because of symmetry: some calls to fpr_for actually return 0.0 (for instance, fpr_for(0, 100)).
This is a bit philosophical, but I actually do not feel this is correct. First of all, is (0, 0.0) the only use case where an fpr of 0.0 makes sense? I.e., any time n>0, an fpr of 0.0 is impossible (or I misunderstood something).
Yes, it is impossible: the capacity would have to be infinite. The maximum attainable value is returned instead, though this is of little value as OOM would ensue (as you point out below).
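To illustrate why the capacity diverges as fpr approaches 0, here is a minimal sketch. It is not Boost.Bloom's implementation: it assumes the textbook false positive rate for a filter with a fixed number k of hash functions, fpr = (1 - exp(-k*n/m))^k, solved for m and saturated at the largest size_t. The function name and the parameter k are hypothetical, and the sketch returns 0 for n == 0 rather than a library-specific minimum such as the 24 shown above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical helper, NOT the library's capacity_for: inverts the textbook
// formula fpr = (1 - exp(-k*n/m))^k for m, with k fixed (an assumption).
inline std::size_t sketch_capacity_bits(std::size_t n, double fpr, unsigned k)
{
  assert(fpr >= 0.0 && fpr <= 1.0);
  assert(k > 0);
  const std::size_t max_bits = std::numeric_limits<std::size_t>::max();
  if (n == 0) return 0;              // nothing to insert: any size satisfies fpr
  if (fpr == 0.0) return max_bits;   // m would have to be infinite

  // m = -k*n / ln(1 - fpr^(1/k)); log1p keeps precision when fpr^(1/k) is tiny.
  const double x = std::pow(fpr, 1.0 / k);
  const double m = -static_cast<double>(k) * static_cast<double>(n)
                   / std::log1p(-x);

  // Saturate instead of overflowing when m does not fit in size_t.
  if (m >= static_cast<double>(max_bits)) return max_bits;
  return static_cast<std::size_t>(std::ceil(m));
}
```

Under these assumptions, with k = 8, sketch_capacity_bits(1, 1e-200, 8) saturates just like sketch_capacity_bits(1, 0.0, 8) does, so a tiny positive fpr is indistinguishable from 0.0 at the return value.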
So the assert could be an implication (funny, because we had a discussion about implies on the ML a few months ago), something like:
BOOST_IMPLICATION(fpr == 0.0, n == 0);
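For concreteness, the implication can be written with plain boolean operators, since "p implies q" is equivalent to !p || q. The sketch below is only an illustration: implies and valid_capacity_args are hypothetical names, and BOOST_IMPLICATION above is a suggested macro, not one assumed to exist.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: logical implication, "p implies q" == !p || q.
constexpr bool implies(bool p, bool q) { return !p || q; }

// Hypothetical predicate combining the existing range check with the
// proposed rule that fpr == 0.0 is only acceptable when n == 0.
constexpr bool valid_capacity_args(std::size_t n, double fpr)
{
  return fpr >= 0.0 && fpr <= 1.0 && implies(fpr == 0.0, n == 0);
}
```

A precondition check would then read assert(valid_capacity_args(n, fpr)), which accepts (0, 0.0) and rejects (1, 0.0).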
Similarly, for (1, 0.0) I do not believe the result should be the size_t max value, as this is not a correct value. Now we both know you will OOM before noticing this in reality, but even if we imagine a magical computer that can allocate that much memory, the fpr is not 0.0.
I understand your point and can relate to it, but consider this:

capacity_for(1, 1.E-200)

Is this legit? OOM will happen here, too. Where do we put the limit?
If this library were not C++11 I would suggest std::optional as the return type, but as it is, boost::optional seems like the best option. Now I know people might think I am making the API ugly for users, but I really do not think a bit more typing is such a big problem when the alternative is people messing up (probably not when using constants in code, more likely when they dynamically compute something). Bloom filters are important, but they are not like JSON or protobuf, which are everywhere in a codebase; users needing to use a slightly uglier API in 10 LOC of a 1M LOC codebase does not seem like a big deal to me.
So in the examples above:
capacity_for(0, 0.0) -> min_possible_capacity
capacity_for(1, 0.0) -> nullopt
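A rough sketch of what such an optional-returning wrapper might look like follows. Several things here are assumptions: it uses std::optional (C++17) for brevity, whereas a C++11 library would use boost::optional in the same way; checked_capacity_for is a hypothetical name; the value 24 for min_possible_capacity is simply taken from the capacity_for(0, 0.0) example earlier in the thread; and the real computation is passed in as a callable so the sketch does not depend on the library.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Assumed stand-in for the library's minimum capacity (value from this thread).
constexpr std::size_t min_possible_capacity = 24;

// Hypothetical wrapper: returns nullopt when the requested fpr is unattainable
// for the given n, otherwise defers to `compute` (the unchecked calculation).
template<typename F>
std::optional<std::size_t>
checked_capacity_for(std::size_t n, double fpr, F compute)
{
  assert(fpr >= 0.0 && fpr <= 1.0);
  if (fpr == 0.0) {
    if (n == 0) return min_possible_capacity; // an empty filter never errs
    return std::nullopt;                      // no finite capacity gives fpr 0
  }
  return compute(n, fpr);
}
```

Callers would then have to test the result before using it, which is the extra typing discussed above. Note that this sketch does nothing about tiny positive values such as 1.E-200.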
A few more questions I thought of recently; apologies in advance if I misunderstood something. [...]
I will address these comments tomorrow (out of time today), but I felt like replying to the first part of your post now.

Joaquin M Lopez Munoz