Re: [boost] [bloom] Peer Review Conclusion

30 May 2025


      El 30/05/2025 a las 18:27, Peter Dimov via Boost escribió:
...
Kostas Savvidis wrote:
...
...
On 29 May 2025, at 20:37, Joaquin M López Muñoz via Boost
<boost@lists.boost.org> wrote:
given two sequences of hash values hi and gi, we're not interested in
internal correlation of either hi or gi, but cross correlation between
the two sequences.
I am actually not sure which is more important for a Bloom filter,
autocorrelation or cross-correlation.
What is known is that the cross-correlation between two sequences in any
MCG or LCG ( x' = a * x mod k)  is even worse than the autocorrelation.
  It goes like this: if in the first bucket h1 = 2*g1, then hi=2*gi for all i or
buckets. This is independent of a and k. Same if you replace the "2" with "3"
etc.
Effectively there is 100% correletion between buckets. One cannot fix that
even with a better multiplier.
It's not clear to me why this would be a problem, but if it is, it can be fixed
by using an LCG (a*x+b) instead of an MCG (a*x).
Umm, I like the LCG idea as an additional sum is basically free. Kostas, 
would a
smart choice of b improve the statistical properties of thhe procedure? 
Note that
we can afford determining b as a function of a (here a=m where m is the 
capacity
of the array).

Joaquin M Lopez Munoz