
Joaquín, Arnaud, Thanks for submitting and managing the review of this library. Here is my review of the documentation. Good to see plenty of links, tables and structure to the reference, diagrams and equations, and performance benchmarks. I think the doc could use a much friendlier introduction that explains the purpose of these useful filters to all C++ developers. I suggest something like this: *Introduction* Bloom filters provide a pattern-matching mechanism to rapidly determine if a given item - an email, URL, genome, username, digital fingerprint etc. - *might *be a member of a long and important list. A key feature of Bloom filters is that there are no false negatives - if the result of the filter is *negative *then the item does not occur in the list with a mathematical certainty of 100%. But there can be false positives - if the result of the filter is *positive *it means that the item *may *occur in the list but the filter is not certain. The certainty of avoiding false negatives makes Bloom filters particularly useful in fraud detection. This is best explained by an over-simplified example. Say you have a list, a database, of known bad actors, that needs to be checked before access to your financial website is approved. And, for the sake of simplicity the bad actors list contains only: 1. mr-jekyll@mail.com 2. mr-hyde@mail.com 3. frankenstein@mail.com 4. al-capone@mail.com 5. lord-voldemort@mail.com 6. darth-vader@mail.com Now, an attempt is made to access the financial website that is protected by a pattern-matching Bloom filter. In our example, the Bloom filter could simply be: If (request-name contains "e") then return positive, else return negative; If access is requested by T <Tony-Montana@mail.com>ony-Montana@mail.com, the filter will return *negative *- there is no "e" in the name so Tony Montana could not possibly be in the bad actors list and he should be granted access to the financial website without further ado. If access is requested by John-Dillinger@mail.com, then the filter returns *positive *- it contains an "e" so the name *might *be in the bad actors list. The only way to be sure is now to perform a full-text search - time consuming but will provide certainty. The value of Bloom filters is that they can eliminate full-text searches for many access requests. In reality, the list of bad actors is likely to be millions long. The algorithms used in the Bloom filter use mathematical patterns (determined by hash functions) that are present in the list. You can imagine that a long list might require half-a-dozen or more patterns to ensure that ALL entries in the bad actors list are represented by one or more patterns. It does not matter if a bad actor is covered by multiple patterns. It is essential that all bad actors are represented by at least one pattern. The Bloom filter will check an access request against each pattern it is provided with, and return *immediately* when a check against a pattern returns positive. This result initiates a full-text search, though no doubt this can be optimized with clever indexing and organization. Whereas fraud detection is the top use of Bloom filters, they have proved useful in such situations as checking genome strings against a database of DNA sequences. These checks can be time intensive due to the length of DNA strings. Similarly for high-frequency trading, to check for recent or duplicate orders, as speed is of the essence. For the record, the Bloom filter is named after its inventor - Burton Howard Bloom - who described its purpose in a 1970 paper - *Space/Time Trade-offs in Hash Coding with Allowable Errors*. There is no connection between the filters and "blooming" in the floral sense! If an application you are building contains a large database that needs to be searched regularly, this library provides you with the components necessary to create a Bloom filter of the complexity required to fine tune the search performance of your application. Follow the intro with a *Getting Started* section: *Getting Started* - describe the *Requirements*, *Installation *and a trivial "hello world" example of the use of the library. *Tutorial* - this section is "tutorial" in name only - the section should be a numbered step-by-step procedure for the reader to follow - that results in a small but meaningful example working, with the expected output. Currently it is not certain what to do. Following the Tutorial, your *Bloom Filter Primer, Choosing a Filter Combination, Benchmarks *- are all appropriate. *Reference* - I found it difficult to follow the Reference in the sense of "what am I looking at?". Best to be specific as to what is a class, template, method, property, field, macro etc. AND critically explain the use-case of each of these constructs in the intro or description of the construct - what is the scenario that this construct applies to? Some of the reference I didn't really understand - such as: A compile-time std::size_t value indicating the number of (not necessarily distinct) bits set/checked per operation. - what does *not necessarily distinct* mean? *Appendix A* The equations are somewhat daunting and difficult to determine what is good practice and what is not. Perhaps add text to describe best practices in certain situations. Perhaps add an *Acknowledgements *section at the end - authors, testers, motivation, etc. *Summary* I would most like to see a friendly introduction, a step-by-step tutorial, and a clearer reference. The case for Bloom filters seems strong, and good luck with the library. I am the technical writer for the C++ Alliance. - best Peter Turcan On Tue, May 13, 2025 at 2:47 AM Arnaud Becheler via Boost < boost@lists.boost.org> wrote:
Dear Boost community,
The review of the Boost.Bloom library begins today May 13th, 2025, and will run through May 22nd, 2025.
Boost.Bloom is a header-only C++ library written by Joaquín M López Muñoz, providing fast and flexible Bloom filters. These are space-efficient probabilistic data structures used to test set membership with a controllable false positive rate and minimal memory footprint. The library supports a variety of configurations and hash policies and is designed to integrate well with modern C++.
You can find the library here: - Repo: https://github.com/joaquintides/bloom - Documentation: https://master.bloom.cpp.al/
Bloom filters are especially useful when: - You need to test if something is not in a large dataset, and can tolerate a small false positive rate. (e.g., avoiding duplicate URL crawls in a web crawler) - You need a memory-efficient structure to represent large sets (e.g., checking for the presence of DNA subsequences across massive genomic datasets) - You want to avoid expensive lookups for items that are definitely not present (e.g., filtering nonexistent records before querying a remote database).
As always, we welcome all reviews — from quick impressions to detailed analysis. Your feedback helps ensure that the library meets Boost's high standards in terms of correctness, performance, documentation, and design. If you’re interested in contributing a review, please post to the Boost mailing list during the review period.
In your review please state whether you recommend to reject or accept the library into Boost, and whether you suggest any conditions for acceptance. Other questions you might want to answer in your review are: - What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? - Did you try to use the library? With what compiler? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problems tackled by the library?
Note: All links to other parts of Boost in the Boost.Bloom documentation are currently broken, as they were written with the assumption that the library was already integrated into Boost - please do not account for those in your review.
Thank you in advance for your time and insights!
Best regards, Arnaud Becheler Review Manager, Boost.Bloom
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost