Re: [boost] [GSoC] Request for Feedback on Boost.Bloom Filter Project

Thank you for the suggestion. The data structure is introduced below. Briefly, a Bloom filter is a probabilistic data structure that represents a set of elements compactly. Two operations are supported in the most basic variant: insert, contains. An element is inserted by computing k hash values and setting the bits at the locations given by the hash values to 1. Checking whether an element is contained in the Bloom filter is achieved by hashing the element and checking that the bits at those locations are all 1. A Bloom filter may incorrectly tell you that an element is in the set (false positive). If false positives are acceptable, Bloom filters are a powerful data structure to avoid expensive look ups. There exist variants that also support element removal at the cost of an increase in storage requirements and the danger of false negatives - these are not covered by Boost.BloomFilter just yet. Bloom filters have seen use in file systems (avoiding duplicate writes, deduplication), network caches (determining whether to attempt to fetch from cache), databases (does a row/column exist on disk?) and hardware design. Where the primary interest is to compactly represent a potentially very large universe of elements to mitigate latency differences, Bloom filters are a great choice. Further details can be found in the project documentation files.
participants (1)
-
Alejandro Cabrera