
----- Mensaje original ----- De: Peter Dimov <pdimov@mmltd.net> Fecha: Jueves, Marzo 10, 2005 6:01 pm Asunto: Re: [boost] Re: Re: Re: [review] hash functions
Thorsten Ottosen wrote:
so basically you're saying that any collision is much worse than a slower hash function because we must use == to go through the
bucket ?
Something like that, yes; of course it depends on the level of slowness. Cryptographic hash functions are typically overkill. Increasing the number of collisions slightly is tolerable (but can easily negate the effect of the faster hash function). Hitting a pathological case where a significant percentage of the key set maps to the same bucket is a serious problem.
Maybe the following approach could be taken to allow for alternative implementations to be plugged into the framework: struct std_impl{}; template<typename T,typename Impl=std_impl> struct hash { std::size_t operator()(const T& x)const { return hash_value(x,Impl()); } }; template<typename T,typename Impl> std::size_t hash_value(const T& x,Impl impl) { return hash_value(x); } // standard hash_value overloads for integral types and pointers ... // template<typename T,typename Impl> void hash_combine(size_t& seed,const T& x,Impl impl) { seed^=hash_value(x,impl)+(seed<<6)+(seed>>2); } template<typename A,typename B,typename Impl> std::size_t hash_value(const std::pair<A,B>& x,Impl impl) { std::size_t seed=0; hash_combine(seed,x.first,impl); hash_combine(seed,x.second,impl); return seed; } template<typename It,typename Impl> std::size_t hash_range(It first,It last,Impl impl) { std::size_t=0; for(;first!=last;++first){ hash_combine(seed,*first,impl); } return seed; } template<typename E,typename T,typename A,typename Impl> std::size_t hash_value(const std::basic_string<E,T,A>& x,Impl impl) { return hash_range(x.begin(),x.end(),impl); } // same as previous for the rest of STL containers This framework is transparent to the regular user, who never specifies an alternative implementation and can define hashing for her own types as before: std::size_t hash_value(const my_type& x) { //... } unordered_set<my_type,/* boost::hash<my_type> */> us; But the "advanced" user can tweak one or more parts of the hashing framework ad libitum: struct crypto_impl{}; std::size_t hash_value(int x,crypto_impl) { // smart hash for ints } // we modify also range hashing template<typename It> std::size_t hash_range(It first,It last,crypto_impl) { std::size_t seed=0; std::size_t n=1; for(;first!=last;++first){ hash_combine(seed,*first,crypto_impl); seed*=n; // for instance n*=2; } return seed; } unordered_set<my_type,boost::hash<my_type,crypto_impl> > us; What do you think? Joaquín M López Muñoz Telefónica, Investigación y Desarrollo