
On 12/10/24 16:09, Andrey Semashev wrote:
The documentation should provide a more detailed description of the HashAlgorithm::result_type type. Specifically, what interface and operations it must support. Ideally, the library should provide a concept, similar to HashAlgorithm. Currently, the documentation only says it can be an unsigned integer type or a "std::array-like" type, where the latter is rather nebulous. From the implementation of get_integral_result it looks like it must support data() member function returning a pointer and that its size() after default-construction is a constant expression and returns at least 8. This, for example, excludes std::vector, which would be useful for HashAlgorithmXOF::result_type.
Also on the topic of get_integral_result for "std::array-like" hash values, a few comments:
- It requires the static size of 8 regardless of the size of the requested integer, so even if uint32_t is requested it will still require at least std::array
. This should not be necessary. - It ignores the data beyond the initial 8 bytes. It might be ok for the purpose of slicing the hash value, but it is inconsistent with get_integral_result for integers, which attempts to mix bits through multiplications with a constant. I think, the implementation should be consistent between arrays and integers. The approach of universally mixing bits for all hash value types seems safer as it produces better results for hash values with non-uniform distribution of high-quality bits across the hash value. It might be possible to reuse one of the hash algorithms implemented in the library for this bit mixing. So I think, get_integral_result and HashAlgorithm::result_type should be better defined by the library. It should be possible for the user to provide his custom type as HashAlgorithm::result_type, and for that (a) there must be a list of requirements for that type to satisfy, and (b) it should be clear how an integer will be derived from it by get_integral_result. Maybe the user should be able to define his own get_integral_result behavior for his digest type. The reason why I think user-defined hash value types are useful is type safety, as it may be desirable to associate the digest with the algorithm that produced it.
Also, since I already suggested a few renames in my review, I think renaming result_type to digest_type would make sense. result_type is traditionally used as the result type of a function object, which HashAlgorithm isn't. And the result of a hash function is commonly called a digest.