Re: [boost] Boost.Bloom_filter now in Sandbox (WAS Re: Interest in a Bloom Filter?)

9 Jun 2009

      Hi Vicente!

On Tue, Jun 9, 2009 at 9:49 PM, Vicente Botet
Escriba<vicente.botet@wanadoo.fr> wrote:
...
...
On Tue, Jun 9, 2009 at 12:52 PM, Dean Michael
Berris<mikhailberis@gmail.com> wrote:
...
Yes, I have but I've put a premium at type safety and putting M and K
as part of the type instead of being a runtime value. The reasons are:
1. Specifying a static M allows the use of the standard bitset instead
of Boost's dynamic_bitset.
2. A static K allows for enforcing the number of hash functions at
compile time instead of at runtime.
3. Since M and K are not variable and are part of the type, it makes
little sense for me to have either of them set at construction time.
One thing I might be amenable to is a static K and a runtime M, but
that makes the hash functions unable to determine statically the
bounds of the bitset (M). I can change the concept for the hash
function, but the simplicity of size_t(Input) just makes it easier for
users I think.
I thought about it a little more and I don't think I like the idea of
a separate static and dynamic bloom_filter. If there are more reasons
to allow for dynamically growing bloom filters, then I might support
this.
Two bloom_filters with different hash functions are not comparable.
Don't you think that the hash functions make part of the bloom_filter type?
   bloom_filter<M, hash_1, ... hash_k>bf;
Thinking about it a little more, yes this makes sense. However,
pending the formalization of variadic templates, I'm leaning towards
supporting Fusion sequences in the collection of hash functions:

    bloom_filter<Input, M, Sequence> bf;

This way I can deduce K from size<Sequence> and maybe even derive the
bloom_filter linearly from the Sequence elements (to optimize the
storage, using the compressed_pair trick). I'm still mulling it around
in my head, and will be trying to implement this in the morning (PH
time).

My only worry with this is that I'd have to define the Concept for
hash function types. It shouldn't be too hard especially if I'll build
upon the Boost.Hash concepts (if they're applicable).
...
...
Done, but I've run into the issue of not being able to compare
boost::function objects of the same type -- for some reason this has
been disabled in Boost Trunk.
If hash functions make part of the bloom_filter type there is no issue. No
need to compare the hash functions at run-time. The compiler will do it for
you.
Yup, this is true. Now I just have to implement it using Boost.Fusion
-- this I'm worried about because this somewhat makes the bloom_filter
implementation (and usage) a little more complex than I'd like, but if
the requirements dictate it then there would be less reasons for me to
avoid it. ;-)
...
...
...
...
8. bloom_filters are not currently serializable.
True. I might need to have a file 'boost/bloom_filter/serialize.hpp'
to add serializability non-intrusively.
This should come later... I still need to look into whether
serialization should only be done for the bitset state rather than the
whole type. I say this because up to now I don't think there's a way
of serializing boost::function objects and deserializing them
effectively either.
If hash functions make part of the bloom_filter type there is no issue, only
the data must be serialized.
You're correct here. Now it's just a matter of me implementing it then. :-)
...
HTH,
It definitely does help. Thanks! :-)

-- 
Dean Michael Berris | Software Engineer, Friendster, Inc.
blog.cplusplus-soup.com | twitter.com/mikhailberis |
linkedin.com/in/mikhailberis | profiles.friendster.com/mikhailberis |
deanberris.com