
On Tue, Jun 9, 2009 at 11:47 PM, Dean Michael Berris<mikhailberis@gmail.com> wrote:
Hi Manuel,
On Tue, Jun 9, 2009 at 8:45 PM, Manuel Fiorelli<manuel.fiorelli@gmail.com> wrote:
About the implementation: If I understand your implementation, you do something like that: given a value V, you compute a number X (say the digest of V), then you compute K different values applying the transformation (X + offset) mod(M)
Right. This naive filter will almost always create K consecutive indexes (or results) from the hash functions.
Which I've addressed in revision 53785 -- now the default_hash<> uses Boost.Hash internally with statically defined seeds.
About the documentation: I think that the following paragraph in the documentation must be reformulated
<snip> To find out whether we've encountered a certain URL before, we simply pass the URL to the same three hash functions. If we see that all the positions returned by the three hash functions are all 1, then we might have seen the URL before -- this is called a false positive. </snip>
indeed, you cannot say that the URL is a false positive, since it could be equal to an URL stored previously.
Right, I need to make that a little more clear. Thanks for pointing it out.
This is something I'll be working on later. HTH! -- Dean Michael Berris | Software Engineer, Friendster, Inc. blog.cplusplus-soup.com | twitter.com/mikhailberis | linkedin.com/in/mikhailberis | profiles.friendster.com/mikhailberis | deanberris.com