
On Sun, Dec 8, 2024 at 7:27 PM Ivan Matek via Boost
Hi everyone,
Implementation uses uint64_t in which it sums into hashes of particular elements(each computed with "fresh" hasher so ordering does not matter), then it finally hashes that sum and range size into original passed hasher.
I see no problem with logic this since it cleverly exploits the fact that uint64_t overflow is defined and that sum is commutative.
But I wonder if this preserves the cryptographic quality of original hash.
It doesn't, even if you preserve the length of the hash. There are two attack approaches here, depending on the size of the sets you can work with. On the low-data side, you can turn collision or second-preimage finding into a lattice problem. Suppose you have some set whose intermediate hashes are [a=13816517845079277968, b=2675547450723878905, c=18133324598626471763, d=10200581252073748329]. Build the lattice spanned by the rows [K*2^64, 0, 0, 0, 0] [K*a, 1, 0, 0, 0] [K*b, 0, 1, 0, 0] [K*c, 0, 0, 1, 0] [K*d, 0, 0, 0, 1] where K is a constant large enough to force the first coordinate of vectors to have more weight and sum to 0. Short vectors of this lattice will have the form [0, x, y, z, w], which means that x*a + y*b + z*c + w*d = 0 (mod 2^64), or in other words the set [a,b,c,d] and the set [[a]*(x+1), [b]*(y+1), [c]*(z+1], [d]*(w+1)] have the same hash. In the above case we can use K=2^32 and use LLL to obtain the vector [0, 2870, 27951, 8550, 46911] and thus a second-preimage for the above hash. Shorter second-preimages are possible by increasing the dimension. In particular, for 64-bit hashes it is possible to find lattice vectors containing coefficients only in {-1,0,1}, corresponding to removals and additions of single elements from the original set, at dimension ~48. Note that while finding shortest vectors in a lattice is a very hard problem as the dimension grows, here we don't really have to find exceedingly short vectors; they must instead have either -1 (removal) or positive coefficients. For collision-finding, where we control both inputs, we can start with many repetitions of the same element to make the problem easier and allow reasonably small negative coefficients as well. See also [2, Appendix B]. For larger sets, the generalized birthday attacks from Wagner [1, Section 4] also should apply here with minimal modifications, and reduces the security of an n-bit sum to O(2^{2sqrt(n)}). As far as I can tell the only successful multiset hashes either require enormous moduli or use elliptic curves. [1] https://www.iacr.org/archive/crypto2002/24420288/24420288.pdf [2] https://arxiv.org/pdf/1601.06502