
On Thu, May 22, 2025 at 10:09 AM Joaquin M López Muñoz via Boost < boost@lists.boost.org> wrote:
In fact I'm somewhat surprised most reviewers haven't asked for variations of/alternatives to Bloom filters. This can be added to the roadmap if the library gains traction, I guess.
Ideally, variations would have motivating use-cases so their performance could be studied in a real-world setting.
The example in the README could be improved by declaring the filter thusly:
using filter = boost::bloom::filter<boost::core::string_view, 5>;
I don't think it is a good idea to give users an example which performs unnecessary copying or allocation.
This is not needed cause the filter (with the exception of emplace) does not create elements on its own --you comment on this later in the review.
It does create the element temporarily. Here, we are constructing a `std::string` from `char const*`: https://github.com/joaquintides/bloom/blob/fabc8698c1053efdfe5336de2d01a4fb3... This is only because the filter type uses `std::string`. I realize in the example that these probably qualify for std::string's small buffer optimization, and users might be surprised when "inserting" larger strings that dynamic allocations take place.
I'm puzzled about the terminology. "insert" and "emplace" imply that new nodes or array elements of type value_type are being stored. This is not the case though, as these objects are hashed and the resulting digest is used to algorithmically adjust the filter which is in essence an array of bool. `emplace` documentation says that it constructs elements.
The interface mimics that of a container as a way to lower the entry barrier for new users learning it. But filters are _not_ containers, as you've found out. Claudio also mentioned this. I'll stress the point in some prominent place in the documentation.
I have the opposite opinion. That the interface should distance itself from a typical container interface, otherwise new users may mistakenly believe that they behave similarly in terms of resource usage. At least when it comes to strings, proper selection of the template parameter type seems to be a foot-gun. Maybe there is room for a type alias "string_filter" which is tuned for character strings. BTW, emplace() does construct the element, calculates its hash, and then
throws it away.
Yeah, this is weird. Why even have the element type? `insert` can in theory accept any object which is hashable. And emplace doesn't seem as valuable for the bloom filter container as it is for regular containers, since there are no nodes. That is, rather than `emplace(args...)` a user could just as easily write `insert(T(args...))`
I think that the array() function(s) need to guarantee that bits beyond the capacity but present in the storage will be zeroed out. This way they can be serialized deterministically.
Not sure I'm getting you, but the extent of the span returned by array() is exactly as wide as capacity() indicates.
What if I construct the filter with say, a prime number of bits like 137? What happens to the last 7 bits (to bring it up to 144)? Thanks