
Had a bit more time to think :) so here are my replies and a few more questions.

On Sun, May 18, 2025 at 11:56 AM Joaquin M López Muñoz via Boost <boost@lists.boost.org> wrote:
> El 17/05/2025 a las 20:36, Ivan Matek via Boost escribió:
>> Hi everybody,
>> I was reading the documentation/code, and I have some questions/comments.
>> 1. Why is the unit of measurement bits? I understand that conceptually bloom
>> filters (maybe others, I did not check) are bit arrays, but to me this seems
>> very error prone. Would a verbose alternative of using boost::bloom::bits(1000)
>> or boost::bloom::bytes(125) be preferred? I did not suggest a UDL, as I feel
>> for such a rarely used class it may be overkill.
> I'm happy to do whatever the community decides is best here.
I am a strong fan of strong types (e.g. std::chrono), so a bits/bytes strong type has my vote; I am not sure what the community in general prefers.
>> 5. Why is BOOST_ASSERT(fpr>=0.0&&fpr<=1.0); not BOOST_ASSERT(fpr>0.0&&fpr<=1.0);,
>> i.e. is there a benefit to allowing calls with an impossible fpr argument?
> fpr==0.0 is a legitimate (if uninteresting) argument value for capacity_for:
>
>     capacity_for(0, 0.0) --> 24
>     capacity_for(1, 0.0) --> 18446744073709549592
>
> The formal reason why fpr==0.0 is supported is because of symmetry: some calls
> to fpr_for actually return 0.0 (for instance, fpr_for(0, 100)).
This is a bit philosophical, but I actually do not think this is correct. First of all, is (0, 0.0) the only use case where an fpr of 0.0 makes sense? I.e. any time n > 0, an fpr of 0.0 is impossible (or I have misunderstood something). So the assert could be an implication (it is funny because we had a discussion about implies on the ML a few months ago), something like:

    BOOST_IMPLICATION(fpr == 0.0, n == 0);

Similarly, for (1, 0.0) I do not believe the result should be the size_t max value, as this is not a correct value. Now, we both know you will OOM before noticing this in reality, but even if we imagine a magical computer that can allocate that much memory, the fpr is still not 0.0. If this library were not C++11 I would suggest std::optional as the return type, but as it is, boost::optional seems like the best option. I know people might think I am making the API ugly for users, but I really do not think a bit more typing is such a big problem when the alternative is people messing up (probably not when using constants in code, more likely when they dynamically compute something). Bloom filters are important, but they are not like JSON or protobuf, which are everywhere in a codebase; users needing a slightly uglier API in 10 LOC of a 1M LOC codebase does not seem like a big deal to me. So, in the examples above:

    capacity_for(0, 0.0) -> min_possible_capacity
    capacity_for(1, 0.0) -> nullopt

A few more questions I thought of recently; apologies in advance if I misunderstood something.

1. There is some SIMD code whose cost in k seems proportional to the nearest greater-or-equal multiple of 8. I wonder, does that affect the recommendation for optimal k? E.g. if the table we discussed before tells me to use k == 14, should I bump it up to 16 if using fast_multiblock64?

    static BOOST_FORCEINLINE bool check(const value_type& x, boost::uint64_t hash)
    {
      for (int i = 0; i < k/8; ++i) {
        if (!check_m256ix2(x[i], hash, 8)) return false;
        hash = detail::mulx64(hash);
      }
      if (k%8) {
        if (!check_m256ix2(x[k/8], hash, k%8)) return false;
      }
      return true;
    }

2. The fact that a bloom filter lookup can break early (you can stop as soon as one bit is not set; let's ignore SIMD for now) makes me wonder whether performance depends on the hit rate. E.g. if I know my hit rate will be ~10% or ~90%, should that affect my choice of k?

3. I have just noticed that I wrongly remembered that g++ defaults to -march=native; it does not. Why is this relevant to this library? Well, I am not sure the fallback in the fast_ headers should exist. In other words, when I want to use fast_ and AVX2 is not enabled, I would prefer the code to fail to compile. So I would remove this fallback:

    template<std::size_t K>
    using fast_multiblock64 = multiblock<boost::uint64_t, K>;

Now, there is a reasonable argument that people do not want their code to break when they change compiler flags or target a new architecture (e.g. one for which you did not add SIMD), but I believe the SIMD speedup is so huge that this should not be a silent change. If users really want the fallback behavior, they can write their own preprocessor + using logic.

4. Another thing related to SIMD is the question of the ones arrays used in make_m256ix2/make_m256i. Initially I assumed they are not static const (which would guarantee construction just once, at the cost of thread-safe init overhead) because the compiler is always smart enough to optimize this into read-only memory initialized at compile time, and everything is great. I looked into the asm of a simple test program I wrote on my machine: initialization is done at compile time, great. But :) then I went to godbolt and made a small reproduction of the code, and there I cannot get the compilers to avoid populating that array at runtime (<https://godbolt.org/z/sYfc7rffa>). Now, I do not expect you to go over the godbolt link and try to nudge the compilers to optimize this by changing flags or example code; my question is more: if this is fragile, is that maybe a reason to use some ugly trick to initialize those values at compile time (e.g. as a constexpr array of properly aligned uint32 that, when interpreted as ones, has the correct values)? If anybody reading this has any idea why this happens, please do tell. My first guess is that I messed up when porting the code to the small godbolt example. :) My second guess would be that in the bloom code, what I extracted for godbolt is used in a way that hints to the compiler to do the initialization at compile time (e.g. the kp values are known at compile time), but this is just a guess. So, to be clear, there is no problem I can detect when using the bloom library; I am just worried that with the huge number of combinations of compilers, standards, optimization levels (e.g. O2 vs O3) and architectures, there may be a subset of users that hit a performance issue if bloom does not force compile-time init of those values. To be more specific, for this source:

    int main() {
        using simd_filter = boost::bloom::filter<std::string_view, 1,
            boost::bloom::fast_multiblock64<2>, 1>;
        simd_filter sf;
        char str[2] = {char('a' + (rand() % 26)), '\0'};
        sf.insert(str);
        str[0] = char('a' + (rand() % 26));
        return sf.may_contain(str);
    }

on my machine clang generates beautiful asm, in the sense that there is no initialization of ones at runtime. When I copy the godbolt example to my machine, the runtime initialization of ones exists, so it is probably not just a godbolt/my-machine difference.