Re: [boost] Boost.Bloom_filter now in Sandbox (WAS Re: Interest in a Bloom Filter?)

9 Jun 2009

      About the implementation:
If I understand your implementation, you do something like that:
given a value V,
you compute a number X (say the digest of V),
then you compute K different values applying the transformation (X +
offset) mod(M)

I don't know if this produces k hash functions suitable for bloom
filters, but, why don't you use Boost.hash to compute X? Also, it
would be easier to define the requirements on the Input type.

About the documentation:
I think that the following paragraph in the documentation must be reformulated

<snip>
To find out whether we've encountered a certain URL before, we simply
pass the URL to the same three hash functions. If we see that all the
positions returned by the three hash functions are all 1, then we
might have seen the URL before -- this is called a false positive.
</snip>

indeed, you cannot say that the URL is a false positive, since it
could be equal to an URL stored previously.

IMHO, you should describe the mathematical principles in more detail
and, moreover, you should discuss in more detail the false positives
and how to control their rate.

Even if it isn't mandatory, it would be nice if you write your
documentation in QuickBook format, since it will uniform both the
documentation toolchain and the documentation look.

Also, don't forgive to synchronize the documentation with the current interface.

Manuel Fiorelli