
On 6/25/2011 1:53 PM, Phil Endecott wrote:
I believe I have convinced myself that a bitwise AND on bloom filters gives something that is a superset of the intersection of the contents, i.e. the false positive rate is increased. A proof would be good.
Easily done: Take two Bloom filters, A and B, with the same hash-functions (h of them) and the same size vectors (N). If the number of bits set in one of them is s, then when h<<N and s>>1 the probability of a false positive is approximately (h/N)^s, so the false positive rate increases with the proportion of the bits that are set. If an element appears in both sets, then the bits corresponding to any element that appears in both sets will be set in both vectors and will then be set in the bit-wise conjunction. That same element will therefore appear as a positive in the conjunction so the conjunction at least includes the intersection. For any bit that is not among the bits that would be set in the properly constructed Bloom filter for (A intersect B) there is some probability that it is set by one of the elements in A's set that is not in B's /and/ by one of the elements in B's set that is not in A's, so it will appear in the conjunction bits. This means that the "s" for the conjunction will be larger than the s for the intersection so the false positive rate will be higher for the conjunction than for the intersection. The probability is fairly high for any particular bit not in the intersection to be set in the conjunction if the intersection is small relative to the number of elements in A and the number of elements of B. If the hash functions are each applied to the entire N bits, then there will be h*|A-B| opportunities for a bit to be set from an element in set A that is not in B and h*|B-A| opportunities for it to be set from an element in set B that is not in A. If each hash is applied to a distinct subset of oN/h bits from the total, then the probability is smaller but still not insignificant: |A-B| opportunities for elements in A but not in B times |B-A| opportunities for elements in B but not A. Topher Cooper