
Ivan Matek wrote:
Hi everyone,
The implementation uses a uint64_t into which it sums the hashes of the individual elements (each computed with a "fresh" hasher, so ordering does not matter), and then it finally hashes that sum and the range size into the originally passed hasher.
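
For concreteness, here is a compilable sketch of that scheme as I understand it. The hasher type, the update/result members and the hash_append free functions are stand-ins of mine, not the library's actual API, and the per-element hasher starting as a copy of the current one is my assumption:

    #include <cstdint>
    #include <cstddef>
    #include <string>
    #include <vector>
    #include <iostream>

    // Stand-in 64-bit hasher (FNV-1a). The update/result names are
    // placeholders, not the actual library interface.
    struct fnv1a_64
    {
        std::uint64_t state = 0xCBF29CE484222325ull;

        void update( void const* p, std::size_t n )
        {
            unsigned char const* q = static_cast<unsigned char const*>( p );
            for( std::size_t i = 0; i < n; ++i )
            {
                state ^= q[ i ];
                state *= 0x100000001B3ull;
            }
        }

        std::uint64_t result() const { return state; }
    };

    template<class Hasher> void hash_append( Hasher& h, std::uint64_t v )
    {
        h.update( &v, sizeof( v ) ); // endianness ignored in this sketch
    }

    template<class Hasher> void hash_append( Hasher& h, std::string const& s )
    {
        h.update( s.data(), s.size() );
    }

    // The scheme as described: one hasher per element, a commutative
    // 64-bit sum of the per-element results, then the sum and the size
    // go into the original hasher.
    template<class Hasher, class It>
    void hash_append_unordered_range( Hasher& h, It first, It last )
    {
        std::uint64_t w = 0;
        std::uint64_t n = 0;

        for( ; first != last; ++first, ++n )
        {
            Hasher h2( h ); // assumption: a copy of the current state, so
                            // preceding message content seeds it
            hash_append( h2, *first );
            w += h2.result(); // unsigned overflow is defined; addition commutes
        }

        hash_append( h, w );
        hash_append( h, n );
    }

    int main()
    {
        std::vector<std::string> a{ "x", "y", "z" };
        std::vector<std::string> b{ "z", "x", "y" }; // same elements, different order

        fnv1a_64 h1, h2;
        hash_append_unordered_range( h1, a.begin(), a.end() );
        hash_append_unordered_range( h2, b.begin(), b.end() );

        std::cout << std::hex << h1.result() << ' ' << h2.result() << '\n'; // equal
    }
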
I see no problem with this logic, since it cleverly exploits the fact that uint64_t overflow is defined and that the sum is commutative.
But I wonder whether this preserves the cryptographic quality of the original hash. For example, if we use SHA-2, it seems to me that finding collisions becomes easier, since the sum is "only" 64 bits (by the birthday bound, a collision in a 64-bit value is expected after roughly 2^32 attempts) and the length is "only" 64 bits (or 32 on some systems), and maybe even less if the attacker knows the unordered range was not very long, e.g. that the number of items was under 10000.
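
To put a rough number on it: whatever the quality of the per-element hash, an attacker who can evaluate the per-element hashes only needs two ranges whose 64-bit sums coincide, and a birthday search finds such a pair after roughly 2^32 candidate ranges. Here is a toy version of that search (my own illustration, with the accumulator truncated to 16 bits so the collision shows up immediately):

    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <iostream>

    // Stand-in per-element hash (FNV-1a, 64 bits).
    std::uint64_t elem_hash( std::string const& s )
    {
        std::uint64_t h = 0xCBF29CE484222325ull;
        for( unsigned char c: s ) { h ^= c; h *= 0x100000001B3ull; }
        return h;
    }

    int main()
    {
        // Map from truncated "sum" to the (one-element) range that produced
        // it. 16 bits stand in for the real 64-bit accumulator; the expected
        // number of trials before a collision scales as 2^(width/2).
        std::unordered_map<std::uint16_t, std::string> seen;

        for( int i = 0; ; ++i )
        {
            std::string elem = "item-" + std::to_string( i );
            std::uint16_t sum = static_cast<std::uint16_t>( elem_hash( elem ) );

            auto r = seen.emplace( sum, elem );
            if( !r.second )
            {
                std::cout << "{\"" << r.first->second << "\"} and {\"" << elem
                          << "\"} end up with the same truncated sum after "
                          << i + 1 << " trials\n";
                break;
            }
        }
    }
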
Ideally I'd use a 128- or 256-bit integer there, but using __uint128_t portably is still a pain, and hashing unordered containers is a very niche use case. You'd probably never encounter it in practice... well, except for boost::json::object, I suppose.

Note that if you have things in the message preceding the unordered range, this effectively serves as a seed for the per-element hash function, which makes engineering collisions harder. Still, 64 bits are 64 bits.

128 would have been a good compromise between speed and security, but std::uint128_t is not here yet: https://github.com/cplusplus/papers/issues/1793 (Please don't leave comments on the above issue; it's not for discussion, but for tracking paper progress.)

Matt Borland is considering proposing boost::uint128_t to Core, but that would mean depending on Core.
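
For what it's worth, a portable 128-bit accumulator does not strictly need __uint128_t; two uint64_t halves with a manual carry are enough. A minimal sketch of that option, not something the library currently does:

    #include <cstdint>
    #include <iostream>

    // 128-bit accumulator made of two 64-bit halves; addition stays
    // commutative, so element order still does not matter.
    struct acc128
    {
        std::uint64_t lo = 0;
        std::uint64_t hi = 0;

        void add( std::uint64_t v )
        {
            lo += v;          // wraps modulo 2^64
            hi += ( lo < v ); // carry iff the wrapped result is smaller than v
        }
    };

    int main()
    {
        acc128 a;
        a.add( 0xFFFFFFFFFFFFFFFFull );
        a.add( 2 ); // wraps, carry propagates into hi
        std::cout << a.hi << ' ' << a.lo << '\n'; // prints: 1 1
    }

Both halves would then be fed into the original hasher at the end, the same way the single 64-bit sum and the size are now.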