Re: [boost] [Bloom] review starts on 13th of May

22 May 2025

      El 22/05/2025 a las 16:29, Vinnie Falco via Boost escribió:
...
[...]
Documentation is a mixed bag. I can't seem to find very important things
like the minimum version of C++ required, or which compilers it supports,
like I can find here:
https://www.boost.org/doc/libs/latest/libs/json/doc/html/json/overview.html#...
Peter Turcan also asked for a getting started section along these lines. 
I will add it
as a subsection to the introduction.
...
The content of the documentation is high quality but the layout is in my
opinion terrible. I find it hard to read these class synopses and the
styling is an impediment to legibility.  This is not a problem specific to
Boost.Bloom. Hopefully some of the work the C++ Alliance is doing in the
area of documentation will empower authors to do better.
My plan is, if the library is accepted, to ask for help to port this to 
Antora
(aka Boost.Unordered style).
...
The inclusion of the Visual Studio visualizer ("natvis file") is a very
nice touch and an indicator of quality. Hopefully we will see the
corresponding gdb pretty printer. Or perhaps someone might contribute it.
I already have in mind who I'll ask to do this :-)
...
Heads up, I am quite familiar with Bloom filters in terms of implementation
experience although Joaquin is way ahead of me on the math and
optimization. Bloom filters are very common in decentralized network
programs such as Gnutella, BearShare, LimeWire, and now bitcoind. For
example, thin bitcoind nodes can provide their directly connected peers
with a compressed bloom filter, to reduce broadcast message volume to the
subset of interested transactions:
https://bitcoinops.org/en/topics/transaction-bloom-filtering
Before the review period started, I had mentioned at least twice that it
would be nice to see a demo of Boost.Bloom data structures patched into
some other popular software which already uses these filters.  A quick
review of the repository turns up these leads which could provide for
interesting study:
https://github.com/bitcoin/bitcoin/blob/35bf3f88398da0211cece0a29da4de001e34...
https://github.com/bitcoin/bitcoin/blob/35bf3f88398da0211cece0a29da4de001e34...
https://github.com/bitcoin/bitcoin/blob/35bf3f88398da0211cece0a29da4de001e34...
The "rolling bloom filter" referenced above is a slightly different data
structure from what Boost.Bloom offers, and I can't help but wonder if this
"CRollingBloomFilter" can be implemented in terms of Boost.Bloom. If the
Boost.Bloom powered version of bitcoind proves more efficient than its
current implementation, it will be adopted by that project as such
efficiencies contribute to increased decentralization.
In fact I'm somewhat surprised most reviewers haven't asked for variations
of/alternatives to Bloom filters. This can be added to the roadmap if 
the library
gains traction, I guess.
...
The example in the README could be improved by declaring the filter thusly:
using filter = boost::bloom::filter<boost::core::string_view, 5>;
I don't think it is a good idea to give users an example which performs
unnecessary copying or allocation.
This is not needed cause the filter (with the exception of emplace) does 
not create
elements on its own --you comment on this later in the review.
...
I'm puzzled about the terminology. "insert" and "emplace" imply that new
nodes or array elements of type value_type are being stored. This is not
the case though, as these objects are hashed and the resulting digest is
used to algorithmically adjust the filter which is in essence an array of
bool. `emplace` documentation says that it constructs elements.
The interface mimics that of a container as a way to lower the entry barrier
for new users learning it. But filters are _not_ containers, as you've found
out. Claudio also mentioned this. I'll stress the point in some 
prominent place in
the documentation.

BTW, emplace() does construct the element, calculates its hash, and then 
throws
it away.
...
This talk
of subfilters is confusing and sounds like an implementation detail. And I
trust that whatever is there is necessary, as Joaquin is not prone to
frivolities.
I'll expand on this in the docs. Subfilters are the way the library 
allows users
to configure the filter according to different FPR/speed/size tradeoffs. 
Ruben
Perez and Ivan Matek also requested more clarity in this area.
...
I think that the array() function(s) need to guarantee that bits beyond the
capacity but present in the storage will be zeroed out. This way they can
be serialized deterministically.
Not sure I'm getting you, but the extent of the span returned by array() 
is exactly
as wide as capacity() indicates.
...
I looked through the implementation, the optimizations are a bit beyond me
although it appears to be well thought out and the benchmarks are nice. I
ran the examples and I set some breakpoints and stepped into the code,
where I understood that insert and emplace do not store the elements yet we
are constructing them anyway which is disappointing for a string_view
aficionado such as myself.
I suppose this review can mostly be considered feedback and not any
meaningful opposition to its inclusion in Boost, therefore my vote is to
ACCEPT this library. Joaquin has a track record of perpetual improvements
to the libraries he works on so even if there is an identified shortcoming,
I doubt it will last long. I appreciate the work that went into the library
and I think it will do well in its domain.
Thanks for your review Vinnie!

Joaquin M Lopez Munoz

Re: [boost] [Bloom] review starts on 13th of May

Joaquin M López Muñoz