
Hello all,
I'm writing to request feedback, suggestions, and advice from the Boost community. I've been working on a Bloom filter data structure for about a month and a half now. At this point, the following is complete and available in the Boost Sandbox: - working, basic prototype (insert& query operations, union& intersect) - a test suite covering the public interface - five examples - documentation covering many aspects of the project (tested on Chrome 12 and 13, Firefox 4 and 6) I would suggest to you that you briefly explain what a Bloom filter entails, and why it might be important, when sending your message to
On 6/21/2011 12:21 AM, Alejandro Cabrera wrote: this list.
Thank you for the suggestion. The data structure is introduced below. Briefly, a Bloom filter is a probabilistic data structure that represents a set of elements compactly. Two operations are supported in the most basic variant: insert, contains. An element is inserted by computing k hash values and setting the bits at the locations given by the hash values to 1. Checking whether an element is contained in the Bloom filter is achieved by hashing the element and checking that the bits at those locations are all 1. A Bloom filter may incorrectly tell you that an element is in the set (false positive). If false positives are acceptable, Bloom filters are a powerful data structure to avoid expensive look ups. There exist variants that also support element removal at the cost of an increase in storage requirements and the danger of false negatives - these are not covered by Boost.BloomFilter just yet. Bloom filters have seen use in file systems (avoiding duplicate writes, deduplication), network caches (determining whether to attempt to fetch from cache), databases (does a row/column exist on disk?) and hardware design. Where the primary interest is to compactly represent a potentially very large universe of elements to mitigate latency differences, Bloom filters are a great choice. Further details can be found in the project documentation files.