
On Tue, 13 May 2025 at 11:48, Arnaud Becheler via Boost <boost@lists.boost.org> wrote:
Dear Boost community,
The review of the Boost.Bloom library begins today May 13th, 2025, and will run through May 22nd, 2025.
Hi all, Since I'm not very knowledgeable of Bloom filters or containers, and I'm friends with the author, I won't emit a formal review of the proposed Boost.Bloom. However, I still would like to provide some (hopefully useful) feedback for the author. In general, I think the library is great - it solves one problem and solves it well, and the documentation does a great job explaining the domain problem, the design decisions that were made and how to use the library. Some points regarding the interface and implementation: * Peeking the implementation, I've got the impression that the block class, when using integers distinct to unsigned char (e.g. block<std::uint32_t>), end up using reinterpret_cast<std::uint32_t*> on the underlying unsigned char array. AFAIK this is undefined behavior, since no std::uint32_t object exists at this location. I don't know how relevant this is though, as I guess it works fine in most compilers. * Is there a reason to use boost::uint64_t and boost::uint32_t in a C++11 library? (vs. std::uint64_t and std::uint32_t). * Does filter::reset() without arguments make sense? * Does filter::emplace() add any value over just constructing and inserting an object? In a traditional container, this is justified because the elements are actually stored, but that does not seem to be the case here. * Do the constructor and operator= overloads taking a range of elements add any value? They seem to be equivalent to constructing/assigning and then inserting the values separately. * The block classes seem to perform no type checking on the passed Block type argument. * The convenience header <boost/bloom.hpp> seems to be missing. * I actually like the naming chosen for filter::may_contain(). I think it conveys pretty succinctly what it does. * I also like the fact that you provided a natvis file for your classes. Some points regarding the documentation: * I struggle to understand the BucketSize template parameter. I think this is because the term "bucket" is never mentioned in the primer. It'd be cool if some information regarding this concept was mentioned in the primer. * The serialization example saves and loads a filter using multiblock<std::uint64_t>. Is the bit pattern generated by the filter endianness-dependent? One of the papers you reference [1] mentions sending such representations over the network, which would make this point relevant for the reader. * I've found the examples more difficult to read than what I'd have liked. Some more comments would be useful. * Examples in the docs would benefit from syntax highlighting. You can use asciidoc [source,cpp] blocks to achieve it. * I'd try to avoid C-style casts in the examples. * I'd try to make code snippets as close to real code as possible. That is, I'd try to avoid things like "using filter = boost::bloom::filter<std::string, ...>;", in favor of something that the user may copy to their code. Some points regarding testing: * It looks like the examples are not being built by any CI build. This is dangerous because it yields to code rotting. * It looks like the tests are not being built from CMake. Having "if(BUILD_TESTING AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/test/CMakeLists.txt")" in the main CMakeLists.txt may trick the reader into thinking they are, but there is no test/CMakeLists.txt. I understand that this is what's generated by default by boostdep --cmake, but I'd advise either to add a test/CMakeLists.txt (preferable) or removing the if statement altogether. * The Jamfile in example/ shouldn't specify <cxxstd> directly. * Adding a code coverage metric would be useful. Hope this helps. Regards, Ruben. [1] https://www.eecs.harvard.edu/~michaelm/postscripts/im2005b.pdf