
Hi Manuel, On Tue, Jun 9, 2009 at 8:45 PM, Manuel Fiorelli<manuel.fiorelli@gmail.com> wrote:
About the implementation: If I understand your implementation, you do something like that: given a value V, you compute a number X (say the digest of V), then you compute K different values applying the transformation (X + offset) mod(M)
Right. This naive filter will almost always create K consecutive indexes (or results) from the hash functions.
I don't know if this produces k hash functions suitable for bloom filters, but, why don't you use Boost.hash to compute X? Also, it would be easier to define the requirements on the Input type.
I've considered using Boost.Hash to compute the hash values -- the naive implementation up there is meant to be a sample. I'm in the process of (re)thinking how the functions are made part of the type instead. Maybe instead of just Boost.Array's at construction time I'll consider Fusion-compatible sequences that are part of the signature (thus the type of the filter).
About the documentation: I think that the following paragraph in the documentation must be reformulated
<snip> To find out whether we've encountered a certain URL before, we simply pass the URL to the same three hash functions. If we see that all the positions returned by the three hash functions are all 1, then we might have seen the URL before -- this is called a false positive. </snip>
indeed, you cannot say that the URL is a false positive, since it could be equal to an URL stored previously.
Right, I need to make that a little more clear. Thanks for pointing it out.
IMHO, you should describe the mathematical principles in more detail and, moreover, you should discuss in more detail the false positives and how to control their rate.
I agree. I'll work on it some more and gather more resources worth referencing.
Even if it isn't mandatory, it would be nice if you write your documentation in QuickBook format, since it will uniform both the documentation toolchain and the documentation look.
I chose to write in ReStructuredText first to make the documentation generation cycle faster -- it should be easier later to create Quickbook documentation from the RST sources if I remember my quickbook correctly. ;-)
Also, don't forgive to synchronize the documentation with the current interface.
Right. More work for me then. ;-) Thanks for the insights! :-D -- Dean Michael Berris | Software Engineer, Friendster, Inc. blog.cplusplus-soup.com | twitter.com/mikhailberis | linkedin.com/in/mikhailberis | profiles.friendster.com/mikhailberis | deanberris.com