
Thorsten Ottosen wrote:
> "Peter Dimov" <pdimov@mmltd.net> wrote in message news:019c01c5256a$561efe40$6601a8c0@pdimov...
>> Peter Dimov wrote:
>>> But as I said, benchmarking hash functions is pointless. Real programs do not compute the hash value of a random 256 byte buffer five million times.
>>
>> ... or if they do, they use different 256 byte buffers and do not operate from the L1 cache. ;-)
>
> No, but AFAICT we cannot exclude that a certain hash function is much faster *and* otherwise behaves as well as the other "naive" ones.
Simplistic benchmarks can't be used to evaluate the performance of a component; it should be benchmarked in its usual role. In most real world scenarios, the performance of a hash function is memory access bound, because the hashed data isn't already in the L1 cache. The proper way to evaluate the performance of a hash function is to benchmark unordered_map lookups. Which certain hash function is much faster and otherwise has all the desirable characteristics of the proposed design and implementation?
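As a rough illustration of what such a benchmark looks like (std::unordered_map stands in here for the proposed container, and the key count and key shapes are arbitrary placeholders of mine, not part of the proposal):

// Sketch: time container lookups over a key set too large for the cache,
// rather than re-hashing one hot buffer.
#include <unordered_map>
#include <string>
#include <vector>
#include <cstdio>
#include <ctime>

int main()
{
    std::vector<std::string> keys;
    std::unordered_map<std::string, int> map; // placeholder for the proposed container

    // Enough distinct keys that the working set exceeds L1/L2.
    for( int i = 0; i < 500000; ++i )
    {
        keys.push_back( "key-" + std::to_string( i ) + "-some-longer-payload" );
        map[ keys.back() ] = i;
    }

    std::clock_t t1 = std::clock();

    long long sum = 0; // checksum keeps the optimizer from removing the loop

    for( int pass = 0; pass < 10; ++pass )
    {
        for( std::size_t i = 0; i < keys.size(); ++i )
        {
            // every key is present, so find() never returns end()
            sum += map.find( keys[ i ] )->second;
        }
    }

    std::clock_t t2 = std::clock();

    std::printf( "lookup time: %.2fs (checksum %lld)\n",
        double( t2 - t1 ) / CLOCKS_PER_SEC, sum );
}

Swapping in a candidate hash function as the third template argument and comparing lookup times measures it in the role it will actually play.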
> Can anybody answer this question: is it always optimal to use the whole range to compute the hash value? For strings, for example, if I know the average length of my strings, I can't see that it would make sense to process much more than the average length (or perhaps even less).
The default hash function must be good enough when no extra information about the properties of the key set is available. When extra information _is_ available, you shouldn't use the default. Excluding some parts of the range leads to pathological cases when the keys differ only in the excluded parts; this causes slowdowns of orders of magnitude. (This is not speculation, it is a fact.) To add insult to injury, because CPU reads are typically a cache line wide (16/32 bytes), it can turn out that the "optimized" hash function isn't even much faster.
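To make the pathological case concrete, here is a sketch with a hypothetical prefix_hash that reads only the first 16 bytes of a string; with keys that share a 16-byte common prefix (think file paths or URLs), every key lands in the same bucket and every lookup degrades to a linear scan:

// prefix_hash is a made-up "optimized" hash, not part of the proposal.
#include <unordered_map>
#include <string>
#include <cstdio>

struct prefix_hash
{
    std::size_t operator()( std::string const & s ) const
    {
        std::size_t h = 0;
        std::size_t n = s.size() < 16? s.size(): 16; // deliberately ignore the tail

        for( std::size_t i = 0; i < n; ++i )
        {
            h = h * 31 + static_cast<unsigned char>( s[ i ] );
        }

        return h;
    }
};

int main()
{
    std::unordered_map<std::string, int, prefix_hash> map;

    // Keys that differ only past the first 16 bytes ("/usr/local/share"),
    // as file paths or URLs often do.
    for( int i = 0; i < 10000; ++i )
    {
        map[ "/usr/local/share/project/file-" + std::to_string( i ) ] = i;
    }

    // Every key hashes identically, so they all end up in one bucket.
    std::string probe = "/usr/local/share/project/file-0";

    std::printf( "keys in the probe's bucket: %lu of %lu\n",
        (unsigned long)map.bucket_size( map.bucket( probe ) ),
        (unsigned long)map.size() );
}

The same keys under a hash that processes the whole range spread across the buckets as expected.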