
Samuel Neves wrote:
> A minor question I have regards the constants used in get_result_multiplier(). The documentation states that this is used to produce a uniform output on the target range, but neither the documentation nor the source code explains the rationale or criteria behind the method or the constants used here. This seems to get into integer hash territory, but, for example, the 4 -> 1 case consists of x -> (x*0x7f7f7f7f) >> 24, for which easy differentials exist, e.g., x and x^0xc0800000 collide with probability ~1/4.
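The following small C++ sketch makes the quoted differential concrete. It samples random 32-bit inputs x and counts how often x and x ^ 0xc0800000 land on the same output byte. The function reduce_4_to_1 is a hypothetical stand-in written directly from the quoted formula, not Hash2's actual get_result_multiplier() code, and splitmix64 is used only as a convenient source of test inputs. An ideal 32 -> 8 bit map would give a rate of about 1/256, so a rate far above that is what indicates a usable differential.

```cpp
#include <cstdint>

// Hypothetical stand-in for the 4 -> 1 reduction quoted above:
// multiply by 0x7f7f7f7f modulo 2^32 and keep the top 8 bits.
inline std::uint8_t reduce_4_to_1(std::uint32_t x)
{
    std::uint32_t const m = x * UINT32_C(0x7f7f7f7f); // wraps mod 2^32
    return static_cast<std::uint8_t>(m >> 24);
}

// splitmix64, used here only to generate test inputs.
inline std::uint64_t splitmix64(std::uint64_t& state)
{
    std::uint64_t z = (state += 0x9e3779b97f4a7c15ull);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ull;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebull;
    return z ^ (z >> 31);
}

// Fraction of sampled inputs x for which x and x ^ diff map to the same byte.
double collision_rate(std::uint32_t diff, unsigned samples, std::uint64_t seed)
{
    unsigned hits = 0;
    for (unsigned i = 0; i < samples; ++i)
    {
        std::uint32_t const x = static_cast<std::uint32_t>(splitmix64(seed));
        if (reduce_4_to_1(x) == reduce_4_to_1(x ^ diff))
            ++hits;
    }
    return static_cast<double>(hits) / samples;
}
```

For example, collision_rate(0xc0800000u, 1u << 20, 1) estimates the rate for the quoted differential over 2^20 samples. Passing diff = 0 returns 1.0, which is a quick sanity check.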
It's pretty hard to come up with "good" multipliers here because it's not possible to quantify the "good" part. There are basically two cases: one where the 32-bit input is uniformly distributed, and one where it isn't. When it isn't, that is, when it comes from a "bad" hash, the multiplier can't really be optimized without a known input distribution, and we don't have one. So I came up with some ad hoc criteria (see https://github.com/pdimov/hash2/blob/develop/test/get_integral_result_4.cpp) and tried to pick multipliers that satisfy them.
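As an illustration of what an ad hoc check for the "bad hash" case might look like, the sketch below feeds consecutive integers (what an identity hash over small integer keys would produce) through the same hypothetical reduce_4_to_1 stand-in and counts how many of the 256 possible output bytes are hit. This is an assumed example criterion, not one taken from get_integral_result_4.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical stand-in for the 4 -> 1 reduction:
// top 8 bits of x * 0x7f7f7f7f modulo 2^32.
inline std::uint8_t reduce_4_to_1(std::uint32_t x)
{
    std::uint32_t const m = x * UINT32_C(0x7f7f7f7f); // wraps mod 2^32
    return static_cast<std::uint8_t>(m >> 24);
}

// Count the distinct output bytes produced by the consecutive inputs
// first, first + 1, ..., first + count - 1 (a deliberately non-uniform input).
std::size_t distinct_outputs(std::uint32_t first, std::uint32_t count)
{
    std::set<std::uint8_t> seen;
    for (std::uint32_t i = 0; i < count; ++i)
        seen.insert(reduce_4_to_1(first + i));
    return seen.size();
}
```

With the multiplier 0x7f7f7f7f, the inputs 0..255 map to 256 distinct bytes, so distinct_outputs(0, 256) returns 256: for this particular non-uniform input the reduction still covers the whole target range.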