
El 22/05/2025 a las 16:29, Vinnie Falco via Boost escribió:
[...]
Documentation is a mixed bag. I can't seem to find very important things like the minimum version of C++ required, or which compilers it supports, like I can find here:
https://www.boost.org/doc/libs/latest/libs/json/doc/html/json/overview.html#...
Peter Turcan also asked for a getting started section along these lines. I will add it as a subsection to the introduction.
The content of the documentation is high quality but the layout is in my opinion terrible. I find it hard to read these class synopses and the styling is an impediment to legibility. This is not a problem specific to Boost.Bloom. Hopefully some of the work the C++ Alliance is doing in the area of documentation will empower authors to do better.
My plan is, if the library is accepted, to ask for help to port this to Antora (aka Boost.Unordered style).
The inclusion of the Visual Studio visualizer ("natvis file") is a very nice touch and an indicator of quality. Hopefully we will see the corresponding gdb pretty printer. Or perhaps someone might contribute it.
I already have in mind who I'll ask to do this :-)
Heads up, I am quite familiar with Bloom filters in terms of implementation experience although Joaquin is way ahead of me on the math and optimization. Bloom filters are very common in decentralized network programs such as Gnutella, BearShare, LimeWire, and now bitcoind. For example, thin bitcoind nodes can provide their directly connected peers with a compressed bloom filter, to reduce broadcast message volume to the subset of interested transactions:
https://bitcoinops.org/en/topics/transaction-bloom-filtering
Before the review period started, I had mentioned at least twice that it would be nice to see a demo of Boost.Bloom data structures patched into some other popular software which already uses these filters. A quick review of the repository turns up these leads which could provide for interesting study:
https://github.com/bitcoin/bitcoin/blob/35bf3f88398da0211cece0a29da4de001e34... https://github.com/bitcoin/bitcoin/blob/35bf3f88398da0211cece0a29da4de001e34... https://github.com/bitcoin/bitcoin/blob/35bf3f88398da0211cece0a29da4de001e34...
The "rolling bloom filter" referenced above is a slightly different data structure from what Boost.Bloom offers, and I can't help but wonder if this "CRollingBloomFilter" can be implemented in terms of Boost.Bloom. If the Boost.Bloom powered version of bitcoind proves more efficient than its current implementation, it will be adopted by that project as such efficiencies contribute to increased decentralization.
In fact I'm somewhat surprised most reviewers haven't asked for variations of/alternatives to Bloom filters. This can be added to the roadmap if the library gains traction, I guess.
The example in the README could be improved by declaring the filter thusly:
using filter = boost::bloom::filter<boost::core::string_view, 5>;
I don't think it is a good idea to give users an example which performs unnecessary copying or allocation.
This is not needed cause the filter (with the exception of emplace) does not create elements on its own --you comment on this later in the review.
I'm puzzled about the terminology. "insert" and "emplace" imply that new nodes or array elements of type value_type are being stored. This is not the case though, as these objects are hashed and the resulting digest is used to algorithmically adjust the filter which is in essence an array of bool. `emplace` documentation says that it constructs elements.
The interface mimics that of a container as a way to lower the entry barrier for new users learning it. But filters are _not_ containers, as you've found out. Claudio also mentioned this. I'll stress the point in some prominent place in the documentation. BTW, emplace() does construct the element, calculates its hash, and then throws it away.
This talk of subfilters is confusing and sounds like an implementation detail. And I trust that whatever is there is necessary, as Joaquin is not prone to frivolities.
I'll expand on this in the docs. Subfilters are the way the library allows users to configure the filter according to different FPR/speed/size tradeoffs. Ruben Perez and Ivan Matek also requested more clarity in this area.
I think that the array() function(s) need to guarantee that bits beyond the capacity but present in the storage will be zeroed out. This way they can be serialized deterministically.
Not sure I'm getting you, but the extent of the span returned by array() is exactly as wide as capacity() indicates.
I looked through the implementation, the optimizations are a bit beyond me although it appears to be well thought out and the benchmarks are nice. I ran the examples and I set some breakpoints and stepped into the code, where I understood that insert and emplace do not store the elements yet we are constructing them anyway which is disappointing for a string_view aficionado such as myself.
I suppose this review can mostly be considered feedback and not any meaningful opposition to its inclusion in Boost, therefore my vote is to ACCEPT this library. Joaquin has a track record of perpetual improvements to the libraries he works on so even if there is an identified shortcoming, I doubt it will last long. I appreciate the work that went into the library and I think it will do well in its domain.
Thanks for your review Vinnie! Joaquin M Lopez Munoz