
Andrey Semashev wrote:
Above, I have intentionally omitted the constructor from uint64_t as I find it confusing. In particular, it isn't clear what endianness it uses to seed the algorithm. It looks like it duplicates hash_append.
It doesn't use any endianness. It's just used as a seed from which to derive the initial state. Most, if not all, existing non-cryptographic algorithms take integral seeds.
The documentation should provide a more detailed description of the HashAlgorithm::result_type type. Specifically, what interface and operations it must support. Ideally, the library should provide a concept, similar to HashAlgorithm. Currently, the documentation only says it can be an unsigned integer type or a "std::array-like" type, where the latter is rather nebulous.
Hash2 started out as a C++03 library, and mandated use of boost::array. Then it became a C++11 library, and mandated use of std::array. But then it became a C++14 constexpr library, and std::array didn't work, so now it mandates the nebulous "std::array-like" type (which is not yet completely nailed down, but will crystallize with practice.)
Also on the topic of get_integral_result for "std::array-like" hash values, a few comments:
- It requires the static size of 8 regardless of the size of the requested integer, so even if uint32_t is requested it will still require at least std::array
. This should not be necessary.
I'm not yet clear on the practical utility of using a result_type that is an array of fewer than 8 bytes. We don't have any such examples yet, so it's hard to tell whether this makes any sense and whether it should be explicitly blessed, and to what extent.
- It ignores the data beyond the initial 8 bytes. It might be ok for the purpose of slicing the hash value, but it is inconsistent with get_integral_result for integers, which attempts to mix bits through multiplications with a constant.
This is explicitly stated in the documentation of get_integral_result, in the Remarks. In practice it's a useful heuristic to assume that an array-like result implies a hash function of high quality (an avalanching one) so that picking any M bits would suffice. Not making this assumption would make things a bit hairy; it's not quite clear how to turn an arbitrary number of not-high-quality bits into 32. The library tries its best for the integral types, but there the possibilities are at least a finite amount and multiplications are built-in.
It might be possible to reuse one of the hash algorithms implemented in the library for this bit mixing.
Interesting "meta" approach, but not necessarily better in practice than the current one, under any practical criteria.
So I think, get_integral_result and HashAlgorithm::result_type should be better defined by the library.
It most likely will be, eventually, once we have near-100% clarity on how precisely it needs to be defined.
The hmac_* type aliases are not provided for siphash and xxhash. But I would actually prefer that those type aliases be removed for the rest of the hash algorithms, so that the algorithms don't bring the dependency on hmac.hpp, which is not always needed by the user. Typing hmac
instead of hmac_sha2_256 is hardly any more tedious.
hmac.hpp is a very light header, so...
Thank you Peter for submitting the library and Matt for managing the review.
Thanks, Andrey, for your review.