
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:000901c52580$359f96a0$6601a8c0@pdimov... | Thorsten Ottosen wrote: | > "Peter Dimov" <pdimov@mmltd.net> wrote in message | > news:019c01c5256a$561efe40$6601a8c0@pdimov... | >> Peter Dimov wrote: | >>> But as I said, benchmarking hash functions is pointless. Real | >>> programs do not compute the hash value of a random 256 byte buffer | >>> five million times. | >> | >> ... or if they do, they use different 256 byte buffers and do not | >> operate from the L1 cache. ;-) | > | > no, but AFAICT we cannot exclude that a certain hash-function is much | > faster *and* otherwise behaves as good as the other "naive" ones. | | Simplistic benchmarks can't be used to evaluate the performance of a | component; it should be benchmarked in its usual role. In most real world | scenarios, the performance of a hash function is memory access bound, | because the hashed data isn't already in the L1 cache. The proper way to | evaluate the performance of a hash function is to benchmark unordered_map | lookups. right; then maybe somebody should make those test? | Which certain hash function is much faster and otherwise has all the | desirable characteristics of the proposed design and implementation? one fo the two links that James mentioned mentioned something about the distribution being as good as normal. | > Can anybody answer this question: is it always optimal to use the | > whole range to compute the hash value; for eg. strings, if I know the | > average length of my strings, I can't see that it would make sense to | > process much more the the average length (or perhaps even shorter). | | The default hash function must be good enough if no extra information about | the properties of the key set is available. When extra information _is_ | available, you shouldn't use the default. | | Excluding some parts of the range leads to pathological cases when the key | set only differs in the excluded parts. This leads to orders of magnitude | slowdowns. (This is not speculation, it is a fact.) so basically you're saying that any collision is much worse than a slower hash function because we must use == to go through the bucket ? -Thorsten