
Thorsten Ottosen wrote:
> "Peter Dimov" <pdimov@mmltd.net> wrote in message news:019c01c5256a$561efe40$6601a8c0@pdimov...
>> Peter Dimov wrote:
>>> But as I said, benchmarking hash functions is pointless. Real programs do not compute the hash value of a random 256 byte buffer five million times.
>>
>> ... or if they do, they use different 256 byte buffers and do not operate from the L1 cache. ;-)
>
> No, but AFAICT we cannot exclude that a certain hash function is much faster *and* otherwise behaves as well as the other "naive" ones.
Simplistic benchmarks can't be used to evaluate the performance of a component; it should be benchmarked in its usual role. In most real world scenarios, the performance of a hash function is memory access bound, because the hashed data isn't already in the L1 cache. The proper way to evaluate the performance of a hash function is to benchmark unordered_map lookups. Which certain hash function is much faster and otherwise has all the desirable characteristics of the proposed design and implementation?
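As a rough illustration of what such a benchmark looks like (std::unordered_map stands in here for the proposed container, and the key count and key shapes are arbitrary placeholders of mine, not part of the proposal):

// Sketch: time container lookups over a key set too large for the cache,
// rather than re-hashing one hot buffer.
#include <unordered_map>
#include <string>
#include <vector>
#include <cstdio>
#include <ctime>

int main()
{
    std::vector<std::string> keys;
    std::unordered_map<std::string, int> map; // placeholder for the proposed container

    // Enough distinct keys that the working set exceeds L1/L2.
    for( int i = 0; i < 500000; ++i )
    {
        keys.push_back( "key-" + std::to_string( i ) + "-some-longer-payload" );
        map[ keys.back() ] = i;
    }

    std::clock_t t1 = std::clock();

    long long sum = 0; // checksum keeps the optimizer from removing the loop

    for( int pass = 0; pass < 10; ++pass )
    {
        for( std::size_t i = 0; i < keys.size(); ++i )
        {
            // every key is present, so find() never returns end()
            sum += map.find( keys[ i ] )->second;
        }
    }

    std::clock_t t2 = std::clock();

    std::printf( "lookup time: %.2fs (checksum %lld)\n",
        double( t2 - t1 ) / CLOCKS_PER_SEC, sum );
}

Swapping in a candidate hash function as the third template argument and comparing lookup times measures it in the role it will actually play.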
> Can anybody answer this question: is it always optimal to use the whole range to compute the hash value? For strings, for example, if I know the average length of my strings, I can't see that it would make sense to process much more than the average length (or perhaps even less).
The default hash function must be good enough when no extra information about the properties of the key set is available. When extra information _is_ available, you shouldn't use the default. Excluding some parts of the range leads to pathological cases when the keys differ only in the excluded parts; this causes slowdowns of orders of magnitude. (This is not speculation, it is a fact.) To add insult to injury, because CPU reads are typically a cache line wide (16/32 bytes), it can turn out that the "optimized" hash function isn't even much faster.
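To make the pathological case concrete, here is a sketch with a hypothetical prefix_hash that reads only the first 16 bytes of a string; with keys that share a 16-byte common prefix (think file paths or URLs), every key lands in the same bucket and every lookup degrades to a linear scan:

// prefix_hash is a made-up "optimized" hash, not part of the proposal.
#include <unordered_map>
#include <string>
#include <cstdio>

struct prefix_hash
{
    std::size_t operator()( std::string const & s ) const
    {
        std::size_t h = 0;
        std::size_t n = s.size() < 16? s.size(): 16; // deliberately ignore the tail

        for( std::size_t i = 0; i < n; ++i )
        {
            h = h * 31 + static_cast<unsigned char>( s[ i ] );
        }

        return h;
    }
};

int main()
{
    std::unordered_map<std::string, int, prefix_hash> map;

    // Keys that differ only past the first 16 bytes ("/usr/local/share"),
    // as file paths or URLs often do.
    for( int i = 0; i < 10000; ++i )
    {
        map[ "/usr/local/share/project/file-" + std::to_string( i ) ] = i;
    }

    // Every key hashes identically, so they all end up in one bucket.
    std::string probe = "/usr/local/share/project/file-0";

    std::printf( "keys in the probe's bucket: %lu of %lu\n",
        (unsigned long)map.bucket_size( map.bucket( probe ) ),
        (unsigned long)map.size() );
}

The same keys under a hash that processes the whole range spread across the buckets as expected.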