Re: [boost] [hash2] Andrey's review

10 Dec 2024

      Andrey Semashev wrote:
...
Above, I have intentionally omitted the constructor from uint64_t as I find it
confusing. In particular, it isn't clear what endianness it uses to seed the
algorithm. It looks like it duplicates hash_append.
It doesn't use any endianness. It's just used as a seed from which to derive the
initial state.

Most, if not all, existing non-cryptographic algorithms take integral seeds.
...
The documentation should provide a more detailed description of the
HashAlgorithm::result_type type. Specifically, what interface and operations it
must support. Ideally, the library should provide a concept, similar to
HashAlgorithm. Currently, the documentation only says it can be an unsigned
integer type or a "std::array-like" type, where the latter is rather nebulous.
Hash2 started out as a C++03 library, and mandated use of boost::array. Then
it became a C++11 library, and mandated use of std::array. But then it became
a C++14 constexpr library, and std::array didn't work, so now it mandates the
nebulous "std::array-like" type (which is not yet completely nailed down, but
will crystallize with practice.)
...
Also on the topic of get_integral_result for "std::array-like" hash values, a few
comments:
- It requires the static size of 8 regardless of the size of the requested integer,
so even if uint32_t is requested it will still require at least std::array<unsigned
char, 8>. This should not be necessary.
I'm not yet clear on the practical utility of using a result_type that is an array
of fewer than 8 bytes. We don't have any such examples yet, so it's hard to
tell whether this makes any sense and whether it should be explicitly blessed,
and to what extent.
...
- It ignores the data beyond the initial 8 bytes. It might be ok for the purpose
of slicing the hash value, but it is inconsistent with get_integral_result for
integers, which attempts to mix bits through multiplications with a constant.
This is explicitly stated in the documentation of get_integral_result, in the
Remarks.

In practice it's a useful heuristic to assume that an array-like result implies
a hash function of high quality (an avalanching one) so that picking any M
bits would suffice.

Not making this assumption would make things a bit hairy; it's not quite
clear how to turn an arbitrary number of not-high-quality bits into 32. The
library tries its best for the integral types, but there the possibilities are at
least a finite amount and multiplications are built-in.
...
It might be possible to reuse one of the hash algorithms implemented in
the library for this bit mixing.
Interesting "meta" approach, but not necessarily better in practice than
the current one, under any practical criteria.
...
So I think, get_integral_result and HashAlgorithm::result_type should be
better defined by the library.
It most likely will be, eventually, once we have near-100% clarity on how
precisely it needs to be defined.
...
The hmac_* type aliases are not provided for siphash and xxhash. But I would
actually prefer that those type aliases be removed for the rest of the hash
algorithms, so that the algorithms don't bring the dependency on hmac.hpp,
which is not always needed by the user. Typing hmac<sha2_256> instead of
hmac_sha2_256 is hardly any more tedious.
hmac.hpp is a very light header, so...
...
Thank you Peter for submitting the library and Matt for managing the review.
Thanks, Andrey, for your review.

Re: [boost] [hash2] Andrey's review

Peter Dimov