
Daniel James wrote:
Peter Dimov wrote:
Thorsten Ottosen wrote:
Peter, I don't like that hashing a string is suboptimal. Likewise, I would like hashing of sub_range<string> or sub_range<vector<char>> to be really fast.
See my other post. When talking about hash functions, you really need to have a clear understanding of what "suboptimal" means. I don't claim to be an expert, but at least I have tried to familiarize myself with the research done in this area. My advice is: if you are going to mess with a (standard) hash function, you had better (a) understand what you are doing, and (b) have an extensive test suite that benchmarks an unordered_map with a wide range of real-world key sets.
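As a minimal sketch of the kind of benchmark meant in (b), the sort of thing to time is whole container operations driven by a real-world key set, not the hash function by itself. This uses std::unordered_map as a stand-in for the container under discussion, and "words.txt" is a placeholder for whatever key set you actually care about.

// Sketch: benchmark lookups over a real-world key set, not the raw hash function.
#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

int main()
{
    // Load a real-world key set, one key per line ("words.txt" is a placeholder).
    std::vector<std::string> keys;
    std::ifstream in("words.txt");
    for (std::string line; std::getline(in, line); )
        keys.push_back(line);

    std::unordered_map<std::string, int> map;
    for (std::size_t i = 0; i < keys.size(); ++i)
        map[keys[i]] = static_cast<int>(i);

    // Time complete lookups: hashing, bucket selection, chain traversal and
    // the final equality comparison all contribute to the measured cost.
    std::size_t hits = 0;
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (int pass = 0; pass < 100; ++pass)
        for (std::size_t i = 0; i < keys.size(); ++i)
            hits += map.count(keys[i]);
    std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();

    std::cout << hits << " hits in "
              << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
              << " ms\n";
}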
Well, these two pages describe algorithms for hashing strings which are apparently faster, with good (or perhaps better?) quality (I haven't tested them yet):
http://burtleburtle.net/bob/hash/doobs.html
http://www.azillionmonkeys.com/qed/hash.html
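For comparison, the usual baseline these pages measure against is a simple byte-at-a-time hash. The sketch below is a 32-bit FNV-1a style function, not the Jenkins or Hsieh algorithms from the pages above; their speed advantage comes largely from processing the input in larger chunks.

// Simple byte-at-a-time string hash (32-bit FNV-1a constants), baseline only.
#include <cstddef>
#include <string>

inline std::size_t fnv1a_hash(std::string const& s)
{
    std::size_t h = 2166136261u;              // 32-bit FNV offset basis
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        h ^= static_cast<unsigned char>(s[i]);
        h *= 16777619u;                       // 32-bit FNV prime
    }
    return h;
}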
These pages are good. But as I said, benchmarking hash functions in isolation is pointless. Real programs do not compute the hash value of a random 256 byte buffer five million times. They usually need to do something with the hash value. This something typically involves a division by a prime to obtain a modulus, several pointer chases, and at least one equality comparison (which on a 256 byte buffer will have considerable impact).
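To make that concrete, here is a rough sketch of what a single lookup actually does. The names (node, find_in_table) are illustrative, not any particular implementation, and std::hash stands in for whichever hash function is being evaluated.

// Sketch of the full cost of one lookup: hash, mod by a prime bucket count,
// pointer chasing down the bucket chain, and equality comparisons on the key.
#include <cstddef>
#include <functional>
#include <string>

struct node
{
    std::string key;
    node* next;
};

node* find_in_table(node* const* buckets, std::size_t bucket_count /* prime */,
                    std::string const& key)
{
    std::size_t h = std::hash<std::string>()(key); // whichever hash is under test
    node* p = buckets[h % bucket_count];           // division by a prime
    for (; p != 0; p = p->next)                    // pointer chasing down the chain
        if (p->key == key)                         // equality comparison on the full key
            return p;
    return 0;
}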