
What follows is my review of Boost.Bloom. Please note, I am a decision-maker at the C++ Alliance and Joaquin has a working relationship with the non-profit. The Boost.Bloom library was not sponsored in any part by C++ Alliance, and I was not aware he was working on it until he first announced it. That said, Joaquin is a pretty solid developer and the key word which describes his work output is "consistent quality." When I first looked at the library I was not able to use its CMakeLists.txt to create a proper Visual Studio solution and project files containing the header files of the library so that I can browse the code, and projects containing the tests so I can set breakpoints and inspect the behavior of the implementation at run-time. However, Mohammad kindly provided a patch and thus the version of the library I am reviewing is slightly different, as it contains this additional commit: https://github.com/ashtum/bloom/commit/e119a3c868757104d929b21b5a0d06266d60b... Although it is not a requirement, I do wish more libraries submitted for review had these additions to the CMakeLists.txt as my skills at compiling and running things outside of Visual Studio are incredibly limited (or rather, non-existent). I am far more likely to submit a review for a library that I can interact with in the Visual Studio IDE with a minimum of effort. Thankfully the solution was generated and I got this nice looking set of projects and files (https://i.imgur.com/s7JNZ05.png) [image: image.png] Unfortunately I also got these warnings: https://gist.github.com/vinniefalco/056fe64e6302a761b703cb4ec2f5a255 Documentation is a mixed bag. I can't seem to find very important things like the minimum version of C++ required, or which compilers it supports, like I can find here: https://www.boost.org/doc/libs/latest/libs/json/doc/html/json/overview.html#... The content of the documentation is high quality but the layout is in my opinion terrible. I find it hard to read these class synopses and the styling is an impediment to legibility. This is not a problem specific to Boost.Bloom. Hopefully some of the work the C++ Alliance is doing in the area of documentation will empower authors to do better. The inclusion of the Visual Studio visualizer ("natvis file") is a very nice touch and an indicator of quality. Hopefully we will see the corresponding gdb pretty printer. Or perhaps someone might contribute it. Heads up, I am quite familiar with Bloom filters in terms of implementation experience although Joaquin is way ahead of me on the math and optimization. Bloom filters are very common in decentralized network programs such as Gnutella, BearShare, LimeWire, and now bitcoind. For example, thin bitcoind nodes can provide their directly connected peers with a compressed bloom filter, to reduce broadcast message volume to the subset of interested transactions: https://bitcoinops.org/en/topics/transaction-bloom-filtering Before the review period started, I had mentioned at least twice that it would be nice to see a demo of Boost.Bloom data structures patched into some other popular software which already uses these filters. A quick review of the repository turns up these leads which could provide for interesting study: https://github.com/bitcoin/bitcoin/blob/35bf3f88398da0211cece0a29da4de001e34... https://github.com/bitcoin/bitcoin/blob/35bf3f88398da0211cece0a29da4de001e34... https://github.com/bitcoin/bitcoin/blob/35bf3f88398da0211cece0a29da4de001e34... The "rolling bloom filter" referenced above is a slightly different data structure from what Boost.Bloom offers, and I can't help but wonder if this "CRollingBloomFilter" can be implemented in terms of Boost.Bloom. If the Boost.Bloom powered version of bitcoind proves more efficient than its current implementation, it will be adopted by that project as such efficiencies contribute to increased decentralization. The example in the README could be improved by declaring the filter thusly: using filter = boost::bloom::filter<boost::core::string_view, 5>; I don't think it is a good idea to give users an example which performs unnecessary copying or allocation. I'm puzzled about the terminology. "insert" and "emplace" imply that new nodes or array elements of type value_type are being stored. This is not the case though, as these objects are hashed and the resulting digest is used to algorithmically adjust the filter which is in essence an array of bool. `emplace` documentation says that it constructs elements. This talk of subfilters is confusing and sounds like an implementation detail. And I trust that whatever is there is necessary, as Joaquin is not prone to frivolities. I think that the array() function(s) need to guarantee that bits beyond the capacity but present in the storage will be zeroed out. This way they can be serialized deterministically. I looked through the implementation, the optimizations are a bit beyond me although it appears to be well thought out and the benchmarks are nice. I ran the examples and I set some breakpoints and stepped into the code, where I understood that insert and emplace do not store the elements yet we are constructing them anyway which is disappointing for a string_view aficionado such as myself. I suppose this review can mostly be considered feedback and not any meaningful opposition to its inclusion in Boost, therefore my vote is to ACCEPT this library. Joaquin has a track record of perpetual improvements to the libraries he works on so even if there is an identified shortcoming, I doubt it will last long. I appreciate the work that went into the library and I think it will do well in its domain. Thanks