
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:000901c52580$359f96a0$6601a8c0@pdimov... | Thorsten Ottosen wrote: | > "Peter Dimov" <pdimov@mmltd.net> wrote in message | > news:019c01c5256a$561efe40$6601a8c0@pdimov... | >> Peter Dimov wrote: | >>> But as I said, benchmarking hash functions is pointless. Real | >>> programs do not compute the hash value of a random 256 byte buffer | >>> five million times. | >> | >> ... or if they do, they use different 256 byte buffers and do not | >> operate from the L1 cache. ;-) | > | > no, but AFAICT we cannot exclude that a certain hash-function is much | > faster *and* otherwise behaves as good as the other "naive" ones. | | Simplistic benchmarks can't be used to evaluate the performance of a | component; it should be benchmarked in its usual role. In most real world | scenarios, the performance of a hash function is memory access bound, | because the hashed data isn't already in the L1 cache. The proper way to | evaluate the performance of a hash function is to benchmark unordered_map | lookups. right; then maybe somebody should make those test? | Which certain hash function is much faster and otherwise has all the | desirable characteristics of the proposed design and implementation? one fo the two links that James mentioned mentioned something about the distribution being as good as normal. | > Can anybody answer this question: is it always optimal to use the | > whole range to compute the hash value; for eg. strings, if I know the | > average length of my strings, I can't see that it would make sense to | > process much more the the average length (or perhaps even shorter). | | The default hash function must be good enough if no extra information about | the properties of the key set is available. When extra information _is_ | available, you shouldn't use the default. | | Excluding some parts of the range leads to pathological cases when the key | set only differs in the excluded parts. This leads to orders of magnitude | slowdowns. (This is not speculation, it is a fact.) so basically you're saying that any collision is much worse than a slower hash function because we must use == to go through the bucket ? -Thorsten