Re: [boost] [Bloom] review starts on 13th of May

22 May 2025

      What follows is my review of Boost.Bloom. Please note, I am a
decision-maker at the C++ Alliance and Joaquin has a working relationship
with the non-profit. The Boost.Bloom library was not sponsored in any part
by C++ Alliance, and I was not aware he was working on it until he first
announced it. That said, Joaquin is a pretty solid developer and the key
word which describes his work output is "consistent quality."

When I first looked at the library I was not able to use its CMakeLists.txt
to create a proper Visual Studio solution and project files containing the
header files of the library so that I can browse the code, and projects
containing the tests so I can set breakpoints and inspect the behavior of
the implementation at run-time. However, Mohammad kindly provided a patch
and thus the version of the library I am reviewing is slightly different,
as it contains this additional commit:

https://github.com/ashtum/bloom/commit/e119a3c868757104d929b21b5a0d06266d60b...

Although it is not a requirement, I do wish more libraries submitted for
review had these additions to the CMakeLists.txt as my skills at compiling
and running things outside of Visual Studio are incredibly limited (or
rather, non-existent). I am far more likely to submit a review for a
library that I can interact with in the Visual Studio IDE with a minimum of
effort.

Thankfully the solution was generated and I got this nice looking set of
projects and files (https://i.imgur.com/s7JNZ05.png)

[image: image.png]

Unfortunately I also got these warnings:

https://gist.github.com/vinniefalco/056fe64e6302a761b703cb4ec2f5a255

Documentation is a mixed bag. I can't seem to find very important things
like the minimum version of C++ required, or which compilers it supports,
like I can find here:

https://www.boost.org/doc/libs/latest/libs/json/doc/html/json/overview.html#...

The content of the documentation is high quality but the layout is in my
opinion terrible. I find it hard to read these class synopses and the
styling is an impediment to legibility.  This is not a problem specific to
Boost.Bloom. Hopefully some of the work the C++ Alliance is doing in the
area of documentation will empower authors to do better.

The inclusion of the Visual Studio visualizer ("natvis file") is a very
nice touch and an indicator of quality. Hopefully we will see the
corresponding gdb pretty printer. Or perhaps someone might contribute it.

Heads up, I am quite familiar with Bloom filters in terms of implementation
experience although Joaquin is way ahead of me on the math and
optimization. Bloom filters are very common in decentralized network
programs such as Gnutella, BearShare, LimeWire, and now bitcoind. For
example, thin bitcoind nodes can provide their directly connected peers
with a compressed bloom filter, to reduce broadcast message volume to the
subset of interested transactions:

https://bitcoinops.org/en/topics/transaction-bloom-filtering

Before the review period started, I had mentioned at least twice that it
would be nice to see a demo of Boost.Bloom data structures patched into
some other popular software which already uses these filters.  A quick
review of the repository turns up these leads which could provide for
interesting study:

https://github.com/bitcoin/bitcoin/blob/35bf3f88398da0211cece0a29da4de001e34...
https://github.com/bitcoin/bitcoin/blob/35bf3f88398da0211cece0a29da4de001e34...
https://github.com/bitcoin/bitcoin/blob/35bf3f88398da0211cece0a29da4de001e34...

The "rolling bloom filter" referenced above is a slightly different data
structure from what Boost.Bloom offers, and I can't help but wonder if this
"CRollingBloomFilter" can be implemented in terms of Boost.Bloom. If the
Boost.Bloom powered version of bitcoind proves more efficient than its
current implementation, it will be adopted by that project as such
efficiencies contribute to increased decentralization.

The example in the README could be improved by declaring the filter thusly:

    using filter = boost::bloom::filter<boost::core::string_view, 5>;

I don't think it is a good idea to give users an example which performs
unnecessary copying or allocation.

I'm puzzled about the terminology. "insert" and "emplace" imply that new
nodes or array elements of type value_type are being stored. This is not
the case though, as these objects are hashed and the resulting digest is
used to algorithmically adjust the filter which is in essence an array of
bool. `emplace` documentation says that it constructs elements.  This talk
of subfilters is confusing and sounds like an implementation detail. And I
trust that whatever is there is necessary, as Joaquin is not prone to
frivolities.

I think that the array() function(s) need to guarantee that bits beyond the
capacity but present in the storage will be zeroed out. This way they can
be serialized deterministically.

I looked through the implementation, the optimizations are a bit beyond me
although it appears to be well thought out and the benchmarks are nice. I
ran the examples and I set some breakpoints and stepped into the code,
where I understood that insert and emplace do not store the elements yet we
are constructing them anyway which is disappointing for a string_view
aficionado such as myself.

I suppose this review can mostly be considered feedback and not any
meaningful opposition to its inclusion in Boost, therefore my vote is to
ACCEPT this library. Joaquin has a track record of perpetual improvements
to the libraries he works on so even if there is an identified shortcoming,
I doubt it will last long. I appreciate the work that went into the library
and I think it will do well in its domain.

Thanks

Re: [boost] [Bloom] review starts on 13th of May

Vinnie Falco