
Arash Partow wrote:
Alejandro Cabrera wrote:
I'm writing to request feedback, suggestions, and advice from the Boost community. I've been working on a Bloom filter data structure for about a month and a half now. At this point, the following is complete and available in the Boost Sandbox: - working, basic prototype (insert & query operations, union & intersect) - a test suite covering the public interface - five examples - documentation covering many aspects of the project (tested on Chrome 12 and 13, Firefox 4 and 6)
Looks ok, but one important question - Why is the BF typed? Its not necessary and in fact there are many use-cases where one might want to insert and/or test membership for a range of different types using the same BF - all that those types require are that they be hashable.
Though not necessary, I believe having a typed BF is safer. If there are users that want to eschew type safety to allow operations on multiple types, I believe this can be accomplished either by using a Bloom filter that accepts insertions of type void *. Arash Partow wrote:
Another issue, along the lines of what Phil mentioned wrt naming of the "contains" method. It should really be called "contains" for two reasons, firstly its a transitive verb - "a doing" label which is indicative of how it would be used in code, and secondly you've provided the method false_positive_rate which coupled with the fact that a BF is a probabilistic set, indicates to the user that any result will inherently have a false positive probability assigned with it, which is naively proportional to a function of the bits in the BF and the number of elements inserted into the BF (and not necessarily the element's bit-length).
I agree that the transitive verb property is crucial. It communicates the intent of the operation. I've opted to go with the name "probably_contains" for the time being. Even though it is slightly longer to type, it highlights its probabilistic nature. I want there to be minimal confusion for users new to Bloom filters about the nature of the data structure. Arash Partow wrote:
All in all a good start - keep up the good work!
Thank you for the feedback! -- View this message in context: http://boost.2283326.n4.nabble.com/GSoC-Request-for-Feedback-on-Boost-Bloom-... Sent from the Boost - Dev mailing list archive at Nabble.com.