
Kostas Savvidis wrote:
On 29 May 2025, at 20:37, Joaquin M López Muñoz via Boost <boost@lists.boost.org> wrote:
given two sequences of hash values hi and gi, we're not interested in internal correlation of either hi or gi, but cross correlation between the two sequences.
I am actually not sure which is more important for a Bloom filter, autocorrelation or cross-correlation.
What is known is that the cross-correlation between two sequences in any MCG or LCG ( x' = a * x mod k) is even worse than the autocorrelation. It goes like this: if in the first bucket h1 = 2*g1, then hi=2*gi for all i or buckets. This is independent of a and k. Same if you replace the "2" with "3" etc. Effectively there is 100% correletion between buckets. One cannot fix that even with a better multiplier.
It's not clear to me why this would be a problem, but if it is, it can be fixed by using an LCG (a*x+b) instead of an MCG (a*x).