
El 30/05/2025 a las 18:27, Peter Dimov via Boost escribió:
Kostas Savvidis wrote:
On 29 May 2025, at 20:37, Joaquin M López Muñoz via Boost <boost@lists.boost.org> wrote: given two sequences of hash values hi and gi, we're not interested in internal correlation of either hi or gi, but cross correlation between the two sequences. I am actually not sure which is more important for a Bloom filter, autocorrelation or cross-correlation.
What is known is that the cross-correlation between two sequences in any MCG or LCG ( x' = a * x mod k) is even worse than the autocorrelation. It goes like this: if in the first bucket h1 = 2*g1, then hi=2*gi for all i or buckets. This is independent of a and k. Same if you replace the "2" with "3" etc. Effectively there is 100% correletion between buckets. One cannot fix that even with a better multiplier. It's not clear to me why this would be a problem, but if it is, it can be fixed by using an LCG (a*x+b) instead of an MCG (a*x).
Umm, I like the LCG idea as an additional sum is basically free. Kostas, would a smart choice of b improve the statistical properties of thhe procedure? Note that we can afford determining b as a function of a (here a=m where m is the capacity of the array). Joaquin M Lopez Munoz