Re: [boost] Re: Re: Re: [review] hash functions

10 Mar 2005

      ----- Mensaje original -----
De: Peter Dimov <pdimov@mmltd.net>
Fecha: Jueves, Marzo 10, 2005 6:01 pm
Asunto: Re: [boost] Re: Re: Re: [review] hash functions
...
Thorsten Ottosen wrote:
...
so basically you're saying that any collision is much worse than a
slower hash function because we must use == to go through the
bucket ?
Something like that, yes; of course it depends on the level of 
slowness. 
Cryptographic hash functions are typically overkill. Increasing 
the number 
of collisions slightly is tolerable (but can easily negate the 
effect of the 
faster hash function). Hitting a pathological case where a 
significant 
percentage of the key set maps to the same bucket is a serious 
problem.
Maybe the following approach could be taken to
allow for alternative implementations to be plugged into
the framework:

struct std_impl{};

template<typename T,typename Impl=std_impl>
struct hash
{
  std::size_t operator()(const T& x)const
  {
    return hash_value(x,Impl());
  }
};

template<typename T,typename Impl>
std::size_t hash_value(const T& x,Impl impl)
{
  return hash_value(x);
}

// standard hash_value overloads for integral types and pointers
...
//

template<typename T,typename Impl>
void hash_combine(size_t& seed,const T& x,Impl impl)
{
  seed^=hash_value(x,impl)+(seed<<6)+(seed>>2);
}

template<typename A,typename B,typename Impl>
std::size_t hash_value(const std::pair<A,B>& x,Impl impl)
{
  std::size_t seed=0;
  hash_combine(seed,x.first,impl);
  hash_combine(seed,x.second,impl);
  return seed;
}

template<typename It,typename Impl>
std::size_t hash_range(It first,It last,Impl impl)
{
  std::size_t=0;
  for(;first!=last;++first){
    hash_combine(seed,*first,impl);
  }
  return seed;
}

template<typename E,typename T,typename A,typename Impl>
std::size_t hash_value(const std::basic_string<E,T,A>& x,Impl impl)
{
  return hash_range(x.begin(),x.end(),impl);
}

// same as previous for the rest of STL containers

This framework is transparent to the regular user, who
never specifies an alternative implementation and can
define hashing for her own types as before:

std::size_t hash_value(const my_type& x)
{
  //...
}

unordered_set<my_type,/* boost::hash<my_type> */> us;

But the "advanced" user can tweak one or more parts of
the hashing framework ad libitum:

struct crypto_impl{};

std::size_t hash_value(int x,crypto_impl)
{
  // smart hash for ints
}

// we modify also range hashing

template<typename It>
std::size_t hash_range(It first,It last,crypto_impl)
{
  std::size_t seed=0;
  std::size_t n=1;
  for(;first!=last;++first){
    hash_combine(seed,*first,crypto_impl);
    seed*=n; // for instance
    n*=2;
  }
  return seed;
}

unordered_set<my_type,boost::hash<my_type,crypto_impl> > us;

What do you think?

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo

Re: [boost] Re: Re: Re: [review] hash functions

JOAQUIN LOPEZ MU?Z