
Arash Partow wrote:
Phil Endecott wrote:
For some reason I have a mental block about how intersection works. Union is OK; I can see that
set a; set b; bloom_filter(a) union bloom_filter(b) == bloom_filter( a union b );
but it's not clear to me if
bloom_filter(a) intersect bloom_filter(b) == bloom_filter( a intersect b );
Can you convince me?
That is correct, afaik only union (or) and difference (xor) can be applied to non-counting BFs.
Sorry Arash but I don't understand your reply; what exactly is correct? I believe I have convinced myself that a bitwise AND on bloom filters gives something that is a superset of the intersection of the contents, i.e. the false positive rate is increased. A proof would be good. Regarding xor, it superficially looks like it might give the symetric set difference i.e. those elements in A or B but not both, but when you consider overlapping hashes I think it results in false negatives. Regards, Phil.