Re: [boost] [Bloom] review starts on 13th of May

21 May 2025

      On Tue, 13 May 2025 at 11:48, Arnaud Becheler via Boost
<boost@lists.boost.org> wrote:
...
Dear Boost community,
The review of the Boost.Bloom library begins today May 13th, 2025, and will
run through May 22nd, 2025.
Hi all,

Since I'm not very knowledgeable of Bloom filters or containers, and
I'm friends with the author, I won't emit a formal review of the
proposed Boost.Bloom. However, I still would like to provide some
(hopefully useful) feedback for the author.

In general, I think the library is great - it solves one problem and
solves it well, and the documentation does a great job explaining the
domain problem, the design decisions that were made and how to use the
library.

Some points regarding the interface and implementation:

* Peeking the implementation, I've got the impression that the block
class, when using integers distinct to unsigned char (e.g.
block<std::uint32_t>), end up using reinterpret_cast<std::uint32_t*>
on the underlying unsigned char array. AFAIK this is undefined
behavior, since no std::uint32_t object exists at this location. I
don't know how relevant this is though, as I guess it works fine in
most compilers.
* Is there a reason to use boost::uint64_t and boost::uint32_t in a
C++11 library? (vs. std::uint64_t and std::uint32_t).
* Does filter::reset() without arguments make sense?
* Does filter::emplace() add any value over just constructing and
inserting an object? In a traditional container, this is justified
because the elements are actually stored, but that does not seem to be
the case here.
* Do the constructor and operator= overloads taking a range of
elements add any value? They seem to be equivalent to
constructing/assigning and then inserting the values separately.
* The block classes seem to perform no type checking on the passed
Block type argument.
* The convenience header <boost/bloom.hpp> seems to be missing.
* I actually like the naming chosen for filter::may_contain(). I think
it conveys pretty succinctly what it does.
* I also like the fact that you provided a natvis file for your classes.

Some points regarding the documentation:

* I struggle to understand the BucketSize template parameter. I think
this is because the term "bucket" is never mentioned in the primer.
It'd be cool if some information regarding this concept was mentioned
in the primer.
* The serialization example saves and loads a filter using
multiblock<std::uint64_t>. Is the bit pattern generated by the filter
endianness-dependent? One of the papers you reference [1] mentions
sending such representations over the network, which would make this
point relevant for the reader.
* I've found the examples more difficult to read than what I'd have
liked. Some more comments would be useful.
* Examples in the docs would benefit from syntax highlighting. You can
use asciidoc [source,cpp] blocks to achieve it.
* I'd try to avoid C-style casts in the examples.
* I'd try to make code snippets as close to real code as possible.
That is, I'd try to avoid things like "using filter =
boost::bloom::filter<std::string, ...>;", in favor of something that
the user may copy to their code.

Some points regarding testing:

* It looks like the examples are not being built by any CI build. This
is dangerous because it yields to code rotting.
* It looks like the tests are not being built from CMake. Having
"if(BUILD_TESTING AND EXISTS
"${CMAKE_CURRENT_SOURCE_DIR}/test/CMakeLists.txt")" in the main
CMakeLists.txt may trick the reader into thinking they are, but there
is no test/CMakeLists.txt. I understand that this is what's generated
by default by boostdep --cmake, but I'd advise either to add a
test/CMakeLists.txt (preferable) or removing the if statement
altogether.
* The Jamfile in example/ shouldn't specify <cxxstd> directly.
* Adding a code coverage metric would be useful.

Hope this helps.

Regards,
Ruben.

[1] https://www.eecs.harvard.edu/~michaelm/postscripts/im2005b.pdf

Re: [boost] [Bloom] review starts on 13th of May

Ruben Perez