
Having bloom_filter in boost:: is a great idea! It is super useful; we use our own implementation in production code (actually we use Bloomier filters with MurmurHash-based perfect hashing, from your wiki reference). It gives us a very memory- and lookup-efficient data structure for very large datasets (think a 100 GB file of strings on an SSD, indexed by e.g. a 200 MB Bloomier filter in memory).

Have you considered adding serialization of a bloom_filter to your implementation? In general, reconstructing a hash-based container with a series of inserts is pretty inefficient. The use case I'm talking about: in your web proxy scenario, the proxy service keeps running, downloading and caching URLs and adding them to the bloom filter. Then the process needs to be restarted for some reason. All the documents already downloaded and stored on disk have to be reiterated and their URLs reinserted into a newly created bloom_filter, which makes startup of the process slow. A rough sketch of what I mean by serialization is at the end of this mail.

BTW, I have the same problem with the standard containers (except std::vector): there is no efficient serialization/deserialization for them, rendering them useless for any larger side project (like an unordered_set of 1M strings).
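To illustrate, here is a minimal, self-contained sketch of the save/load shape I have in mind. Everything in it (the toy bloom class, the double-hashing scheme, the on-disk format) is made up for this example and is not the proposed Boost interface; the point is only that because the state is one contiguous bit array plus two parameters, (de)serialization can be a couple of bulk I/O calls instead of millions of inserts:

#include <cstdint>
#include <fstream>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Toy Bloom filter: k probe positions derived from one std::hash value via
// double hashing. CAVEAT: std::hash is not guaranteed stable across
// platforms or standard library versions; a real on-disk format would need
// a fixed hash such as MurmurHash. It is used here only to stay self-contained.
class bloom
{
public:
    bloom(std::size_t bits, unsigned k)
        : bits_(bits), k_(k), blocks_((bits + 63) / 64, 0) {}

    void insert(const std::string& s)
    {
        std::size_t h1 = std::hash<std::string>{}(s);
        std::size_t h2 = h1 * 0x9e3779b97f4a7c15ull + 1; // cheap second hash
        for (unsigned i = 0; i < k_; ++i) {
            std::size_t bit = (h1 + i * h2) % bits_;
            blocks_[bit / 64] |= std::uint64_t(1) << (bit % 64);
        }
    }

    bool may_contain(const std::string& s) const
    {
        std::size_t h1 = std::hash<std::string>{}(s);
        std::size_t h2 = h1 * 0x9e3779b97f4a7c15ull + 1;
        for (unsigned i = 0; i < k_; ++i) {
            std::size_t bit = (h1 + i * h2) % bits_;
            if (!(blocks_[bit / 64] & (std::uint64_t(1) << (bit % 64))))
                return false;
        }
        return true;
    }

    // The whole state is two parameters plus one contiguous array, so
    // serialization is a small header followed by a single bulk write.
    void save(std::ostream& os) const
    {
        std::uint64_t bits = bits_, k = k_;
        os.write(reinterpret_cast<const char*>(&bits), sizeof bits);
        os.write(reinterpret_cast<const char*>(&k), sizeof k);
        os.write(reinterpret_cast<const char*>(blocks_.data()),
                 static_cast<std::streamsize>(blocks_.size() * sizeof(std::uint64_t)));
    }

    static bloom load(std::istream& is)
    {
        std::uint64_t bits = 0, k = 0;
        is.read(reinterpret_cast<char*>(&bits), sizeof bits);
        is.read(reinterpret_cast<char*>(&k), sizeof k);
        bloom b(static_cast<std::size_t>(bits), static_cast<unsigned>(k));
        is.read(reinterpret_cast<char*>(b.blocks_.data()),
                static_cast<std::streamsize>(b.blocks_.size() * sizeof(std::uint64_t)));
        return b;
    }

private:
    std::size_t bits_;
    unsigned k_;
    std::vector<std::uint64_t> blocks_;
};

int main()
{
    bloom urls(8u * 1024 * 1024, 5); // ~1 MB bit array, 5 probes per key
    urls.insert("http://example.com/index.html");

    { // persist on shutdown
        std::ofstream out("urls.bloom", std::ios::binary);
        urls.save(out);
    }

    // restore on restart: one bulk read instead of re-inserting every URL
    std::ifstream in("urls.bloom", std::ios::binary);
    bloom restored = bloom::load(in);
    std::cout << restored.may_contain("http://example.com/index.html") << '\n';
}

In the proxy scenario the service would call save() on shutdown and load() on startup, so the restart cost becomes one sequential read of the filter file rather than a re-crawl of everything already on disk.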