Re: [boost] [hash2] Comments

7 Dec 2024


      ...
Andrey Semashev wrote:
...
2. Somewhat related, on the use case of extending the size of the hash
value by repeatedly calling result(). Is this something that should be
encouraged? I'm not a specialist, but my understanding is that
extending the hash value this way does not improve its quality, as you
get a series of bits in the value that are predetermined based on the
lower bits. Isn't that effectively equivalent to just e.g. zero-extending the
hash value?
No, it isn't. Hashes often have more internal state than they output as the final
result, so extension can actually produce non-redundant bits. This is true, for
example, for xxhash and siphash, and there's an example in the docs of
extending xxhash.
Even if the effective number of bits doesn't change, it's still not the same as
zero-extending, because a zero-extended hash is of very poor quality.
(The constant zero bits, being constant and zero, are completely unaffected by
the input.)
Somewhat related, if you ask get_integral_result for more bits than there are
in the input value, it takes care to not just give you something zero- extended,
for the same reason.
Ideally, each bit in the hash value should have about 50% probability of being
1. This is called "avalanche property".
So, for example, if you have a 32 bit hash function and you need 64 bits, this
is how you would do it, ordered from best to worst:

// as good as possible
uint64_t f1( Hash& hash )
{
    uint32_t r1 = hash.result();
    uint32_t r2 = hash.result();
    return (uint64_t(r1) << 32) | r2;
}

// not that bad, but worse than the above
uint64_t f2( Hash& hash )
{
    return get_integral_result<uint64_t>( hash.result() );
}

// pretty bad, but at least better than the next one
uint64_t f3( Hash& hash )
{
    uint32_t r = hash.result();
    return (uint64_t(r) << 32) | r;
}

// worst possible
uint64_t f4( Hash& hash )
{
    return hash.result();
}

Re: [boost] [hash2] Comments

Peter Dimov