
Tobias Schwinger wrote:
I am missing a way to parametrize how hash values are combined. This is especially desirable because the skeleton could then be exploited to implement double hashing for compound data (a common technique to avoid collisions by varying the hash function), and it can be useful in other situations (e.g. a small hash table in memory vs. a large one on disk).
I think if you want to combine hash values in a different manner, it's best to just write your own version of hash_combine. It could save a little code if we supplied a version of hash_range that works with different combining functions, but I don't think that would be very useful (it would require your combining function to have the same signature as ours, and you might want to do it differently).
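For illustration, a user-written combiner with a matching signature could look something like the following sketch (the multiplicative mix is arbitrary, chosen only to show a different combining step being plugged in):

    #include <cstddef>
    #include <boost/functional/hash.hpp>

    // Hypothetical user-defined combiner with the same signature as
    // boost::hash_combine. The mixing step is arbitrary, purely for
    // illustration of swapping in a different combining function.
    template <typename T>
    void my_hash_combine(std::size_t& seed, T const& v)
    {
        boost::hash<T> hasher;
        seed = seed * 131u + hasher(v);
    }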
- What is your evaluation of the documentation?
Good. Easily understandable.
Nitpicking: a hash table is not required to store its elements in buckets in general - this is a property of the most common way to implement hash tables.
Good point. I've been spending too much time looking at the draft standard for hash-based containers (which require buckets).
- What is your evaluation of the implementation?
Here are my notes from reading the source:
The implementation of hash_combine does not need conditional jumps, unlike a probably slightly slower variant which I've seen quite often. I know that variant works well empirically in practice, and I figure this one will too.
However, I believe that hashing speed is generally overrated and that there are numerous situations where it's worth trading some of it for fewer collisions (as memory access generally gives more room for performance optimization than CPU instructions do).
I do need to spend more time looking into alternative hashing algorithms for a future version.
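For reference, the combining step being discussed is roughly the following (quoted from memory, so the source is authoritative); it uses only an xor, an add and two shifts, with no branches:

    // Combining step of the kind used by hash_combine: xor, add and
    // shift only, no conditional jumps. Constants quoted from memory.
    template <class T>
    inline void hash_combine(std::size_t& seed, T const& v)
    {
        boost::hash<T> hasher;
        seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }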
One idea that immediately came to mind was some "syntactic sugar" to ease hashing of class members:
std::size_t hash_value(point const& p) { return (boost::hash_value = p.x, p.y, p.z); }
Note: it's just a spontaneous idea that I am not convinced of myself...
A hash function for boost::tuple would be a good way to do this, although something along those lines might be more efficient.
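With the current interface, hashing the members of a class like point would look something like this (combining each member into a running seed):

    #include <cstddef>
    #include <boost/functional/hash.hpp>

    struct point { int x, y, z; };

    // Combine each member's hash value into a seed.
    std::size_t hash_value(point const& p)
    {
        std::size_t seed = 0;
        boost::hash_combine(seed, p.x);
        boost::hash_combine(seed, p.y);
        boost::hash_combine(seed, p.z);
        return seed;
    }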
The for loop in hash_float_value is a nice candidate for unrolling via metaprogramming.
Yes. For anyone who hasn't looked at the code: it loops a fixed number of times, based on the accuracy of the float type. Compilers could unroll that loop themselves, but that's probably too much to expect.
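If it ever proved worthwhile, the unrolling could be forced with a little template recursion; a rough, generic sketch (the names here are hypothetical, not taken from the library):

    #include <cstddef>

    // Generic compile-time unrolling of a fixed iteration count via
    // template recursion. Hypothetical helper, not part of the library.
    template <std::size_t N>
    struct repeat
    {
        template <typename F>
        static void apply(F& f)
        {
            f();
            repeat<N - 1>::apply(f);
        }
    };

    template <>
    struct repeat<0>
    {
        template <typename F>
        static void apply(F&) {}
    };

    // e.g. repeat<Iterations>::apply(step) instead of a runtime loop,
    // where Iterations is derived from the float type's accuracy.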
- Do you think the library should be accepted as a Boost library?
Yes, though there should be some research into whether it is possible to increase its genericity, as mentioned in the first paragraph (I just saw the latest discussions, and it seems we're already on it).
Thanks for your review.

Daniel