Re: [boost] Re: Re: Re: [review] hash functions

----- Mensaje original ----- De: Peter Dimov <pdimov@mmltd.net> Fecha: Jueves, Marzo 10, 2005 6:01 pm Asunto: Re: [boost] Re: Re: Re: [review] hash functions
Thorsten Ottosen wrote:
so basically you're saying that any collision is much worse than a slower hash function because we must use == to go through the
bucket ?
Something like that, yes; of course it depends on the level of slowness. Cryptographic hash functions are typically overkill. Increasing the number of collisions slightly is tolerable (but can easily negate the effect of the faster hash function). Hitting a pathological case where a significant percentage of the key set maps to the same bucket is a serious problem.
Maybe the following approach could be taken to allow for alternative implementations to be plugged into the framework: struct std_impl{}; template<typename T,typename Impl=std_impl> struct hash { std::size_t operator()(const T& x)const { return hash_value(x,Impl()); } }; template<typename T,typename Impl> std::size_t hash_value(const T& x,Impl impl) { return hash_value(x); } // standard hash_value overloads for integral types and pointers ... // template<typename T,typename Impl> void hash_combine(size_t& seed,const T& x,Impl impl) { seed^=hash_value(x,impl)+(seed<<6)+(seed>>2); } template<typename A,typename B,typename Impl> std::size_t hash_value(const std::pair<A,B>& x,Impl impl) { std::size_t seed=0; hash_combine(seed,x.first,impl); hash_combine(seed,x.second,impl); return seed; } template<typename It,typename Impl> std::size_t hash_range(It first,It last,Impl impl) { std::size_t=0; for(;first!=last;++first){ hash_combine(seed,*first,impl); } return seed; } template<typename E,typename T,typename A,typename Impl> std::size_t hash_value(const std::basic_string<E,T,A>& x,Impl impl) { return hash_range(x.begin(),x.end(),impl); } // same as previous for the rest of STL containers This framework is transparent to the regular user, who never specifies an alternative implementation and can define hashing for her own types as before: std::size_t hash_value(const my_type& x) { //... } unordered_set<my_type,/* boost::hash<my_type> */> us; But the "advanced" user can tweak one or more parts of the hashing framework ad libitum: struct crypto_impl{}; std::size_t hash_value(int x,crypto_impl) { // smart hash for ints } // we modify also range hashing template<typename It> std::size_t hash_range(It first,It last,crypto_impl) { std::size_t seed=0; std::size_t n=1; for(;first!=last;++first){ hash_combine(seed,*first,crypto_impl); seed*=n; // for instance n*=2; } return seed; } unordered_set<my_type,boost::hash<my_type,crypto_impl> > us; What do you think? Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Joaquin wrote:
unordered_set<my_type,boost::hash<my_type,crypto_impl> > us;
the why not just define your own hash struct: struct my_hash : public std::unary_function<std::string, std::size_t> { std::size_t operator()(std::string const& val) const; }; unordered_set<std::string, my_hash> my_set; ? -Thorsten

JOAQUIN LOPEZ MU?Z wrote:
// we modify also range hashing
template<typename It> std::size_t hash_range(It first,It last,crypto_impl) { std::size_t seed=0; std::size_t n=1; for(;first!=last;++first){ hash_combine(seed,*first,crypto_impl); seed*=n; // for instance n*=2; } return seed; }
unordered_set<my_type,boost::hash<my_type,crypto_impl> > us;
What do you think?
Unfortunately, you wouldn't be able to store a cryptographic hash function's state in a std::size_t. Daniel

JOAQUIN LOPEZ MU?Z wrote:
Maybe the following approach could be taken to allow for alternative implementations to be plugged into the framework: [...]
unordered_set<my_type,boost::hash<my_type,crypto_impl> > us;
The proposal is not a framework, it contains two simple tools for constructing hash_value overloads for compound types and a suite of _default_ hash functions. Alternative implementations (who may or may not use the supplied tools) are supported by unordered_set by simply using unordered_set< my_type, my_hash > us; There's no need to plug them in.
participants (4)
-
Daniel James
-
JOAQUIN LOPEZ MU?Z
-
Peter Dimov
-
Thorsten Ottosen