
Hi, I need for my application bloom filters. Do you know of a good boostified generic implementation? Is there any interest in such a generic bloom filters library in Boost? Thanks, Vicente

On Sun, Mar 22, 2009 at 2:43 PM, vicente.botet <vicente.botet@wanadoo.fr> wrote:
Hi,
I need for my application bloom filters. Do you know of a good boostified generic implementation? Is there any interest in such a generic bloom filters library in Boost?
I don't know of any quality implementation, but I am interested in seeing one in boost. -- Cory Nelson

vicente.botet wrote:
I need for my application bloom filters. Do you know of a good boostified generic implementation? Is there any interest in such a generic bloom filters library in Boost?
Hi Vicente, I have a simple generic Bloom filter here: http://svn.chezphil.org/libpbe/trunk/include/bloom_filter.hh This is GPL licensed but I'd be happy to change that to the Boost license if someone wanted to make a Boost library using it. After writing this code I eventually decided to use a different method, so I'm not currently using this code anywhere. The code is very straightforward. I spent much longer learning the theory of Bloom filters than I did implementing it. Unfortunately I've now forgotten much of the theory and all that is left is the code... What's not included is hash functions; you have to supply these as a template parameter. I used Boost.CRC and the hash function from std:: unordered containers. I seem to have lost the example code that did this. One general problem with this approach is to find enough independent hash functions. Say I'm storing strings and I need four hash functions; the temptation is to have one function and to compute e.g. f(x), f("123"+x), f("456"+x), f("789"+x). These will give different answers and may look like they're independent functions, but it's not really certain how independent they are. Cheers, Phil.

Phil Endecott-48 wrote:
vicente.botet wrote:
I need for my application bloom filters. Do you know of a good boostified generic implementation? Is there any interest in such a generic bloom filters library in Boost?
Hi Vicente,
I have a simple generic Bloom filter here:
http://svn.chezphil.org/libpbe/trunk/include/bloom_filter.hh
This is GPL licensed but I'd be happy to change that to the Boost license if someone wanted to make a Boost library using it.
After writing this code I eventually decided to use a different method, so I'm not currently using this code anywhere.
The code is very straightforward. I spent much longer learning the theory of Bloom filters than I did implementing it. Unfortunately I've now forgotten much of the theory and all that is left is the code...
What's not included is hash functions; you have to supply these as a template parameter. I used Boost.CRC and the hash function from std:: unordered containers. I seem to have lost the example code that did this. One general problem with this approach is to find enough independent hash functions. Say I'm storing strings and I need four hash functions; the temptation is to have one function and to compute e.g. f(x), f("123"+x), f("456"+x), f("789"+x). These will give different answers and may look like they're independent functions, but it's not really certain how independent they are.
Cheers, Phil.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Hi, thanks for the pointer. The hard part (the hash functions) is left out quite elegantly. I was thinking about defining some hash traits wich will allow the user to set the hash functions in a independent way. Something like template <template T, size_t N> struct hash_traits; template <> hash_traits<SpecificUserType,0> { static unsigned hash(T& value) { // specific code } } BTW, besides the tr1::hash function, are there other hash generic functions in Boost? I would need also the union and intersection functions. I'm wondering if replacing the tr1::array by a boost::dynamic_bitset should make eassier this. Best, Vicente -- View this message in context: http://www.nabble.com/Request-interest-in-Bloom-filters-tp22651050p22662216.... Sent from the Boost - Dev mailing list archive at Nabble.com.

Vicente Botet Escriba wrote:
Phil Endecott-48 wrote:
vicente.botet wrote:
I need for my application bloom filters. Do you know of a good boostified generic implementation? Is there any interest in such a generic bloom filters library in Boost?
Hi Vicente,
I have a simple generic Bloom filter here:
http://svn.chezphil.org/libpbe/trunk/include/bloom_filter.hh
I seem to have lost the example code
Now found and checked in here: http://svn.chezphil.org/libpbe/trunk/examples/spellcheck.cc
The hard part (the hash functions) is left out quite elegantly. I was thinking about defining some hash traits wich will allow the user to set the hash functions in a independent way. Something like
template <template T, size_t N> struct hash_traits;
template <> hash_traits<SpecificUserType,0> { static unsigned hash(T& value) { // specific code } }
BTW, besides the tr1::hash function, are there other hash generic functions in Boost?
You can use Boost.CRC. What sort of key do you have? I once did some investigation of hash functions for strings for use with unordered containers; I was considering using a degenerate hash that only looks at the first N characters. Something I didn't mention before is that if you need, say, 10-bit hash values then a 32-bit CRC could give you three of them. In this case your hash_traits would presumably compute the same CRC three times and extract different portions each time, which is not ideal.
I would need also the union and intersection functions. I'm wondering if replacing the tr1::array by a boost::dynamic_bitset should make eassier this.
Well it's not exactly difficult with the array. I've never used dynamic_bitset and being dynamic is not a requirement here. I was concerned (as normal) about getting a fast implementation and the simple bit manipulation in my code is probably about as fast as you can get. Cheers, Phil.
participants (5)
-
Cory Nelson
-
Jonathan Franklin
-
Phil Endecott
-
Vicente Botet Escriba
-
vicente.botet