Re: [boost] [GSoC] Request for Feedback on Boost.Bloom Filter Project

21 Jun 2011

...
...
Hello all,
I'm writing to request feedback, suggestions, and advice from the Boost
community. I've been working on a Bloom filter data structure for about
a month and a half now. At this point, the following is complete and
available in the Boost Sandbox:
- working, basic prototype (insert&  query operations, union&  intersect)
- a test suite covering the public interface
- five examples
- documentation covering many aspects of the project (tested on Chrome
12 and 13, Firefox 4 and 6)
I would suggest to you that you briefly explain what a Bloom filter
entails, and why it might be important, when sending your message to
On 6/21/2011 12:21 AM, Alejandro Cabrera wrote:
this list.
Thank you for the suggestion. The data structure is introduced below.

Briefly, a Bloom filter is a probabilistic data structure that 
represents a set of elements compactly. Two operations are supported in 
the most basic variant: insert, contains. An element is inserted by 
computing k hash values and setting the bits at the locations given by 
the hash values to 1. Checking whether an element is contained in the 
Bloom filter is achieved by hashing the element and checking that the 
bits at those locations are all 1. A Bloom filter may incorrectly tell 
you that an element is in the set (false positive). If false positives 
are acceptable, Bloom filters are a powerful data structure to avoid 
expensive look ups.

There exist variants that also support element removal at the cost of an 
increase in storage requirements and the danger of false negatives - 
these are not covered by Boost.BloomFilter just yet.

Bloom filters have seen use in file systems (avoiding duplicate writes, 
deduplication), network caches (determining whether to attempt to fetch 
from cache), databases (does a row/column exist on disk?) and hardware 
design. Where the primary interest is to compactly represent a 
potentially very large universe of elements to mitigate latency 
differences, Bloom filters are a great choice.

Further details can be found in the project documentation files.

Alejandro Cabrera

tags

participants (1)