Re: [boost] Boost.Bloom_filter now in Sandbox (WAS Re: Interest in a Bloom Filter?)

10 Jun 2009

      On Tue, Jun 9, 2009 at 11:47 PM, Dean Michael
Berris<mikhailberis@gmail.com> wrote:
...
Hi Manuel,
On Tue, Jun 9, 2009 at 8:45 PM, Manuel
Fiorelli<manuel.fiorelli@gmail.com> wrote:
...
About the implementation:
If I understand your implementation, you do something like that:
given a value V,
you compute a number X (say the digest of V),
then you compute K different values applying the transformation (X +
offset) mod(M)
Right. This naive filter will almost always create K consecutive
indexes (or results) from the hash functions.
Which I've addressed in revision 53785 -- now the default_hash<> uses
Boost.Hash internally with statically defined seeds.
...
...
About the documentation:
I think that the following paragraph in the documentation must be reformulated
<snip>
To find out whether we've encountered a certain URL before, we simply
pass the URL to the same three hash functions. If we see that all the
positions returned by the three hash functions are all 1, then we
might have seen the URL before -- this is called a false positive.
</snip>
indeed, you cannot say that the URL is a false positive, since it
could be equal to an URL stored previously.
Right, I need to make that a little more clear. Thanks for pointing it out.
This is something I'll be working on later.

HTH!

-- 
Dean Michael Berris | Software Engineer, Friendster, Inc.
blog.cplusplus-soup.com | twitter.com/mikhailberis |
linkedin.com/in/mikhailberis | profiles.friendster.com/mikhailberis |
deanberris.com