
On 12/9/24 19:43, Peter Dimov via Boost wrote:
What's important here is that it's not possible to provide an extended result of better quality from the outside; the hash algorithm is in the best place to provide it because it has access to more bits of internal state than it lets out.
This requirement effectively mandates that all _hash algorithms_ be _extendable-output hash functions_:
Only some hash functions are specified as extendable-output functions (XOF). By "specified" I mean "in the hash algorithm specification". The link you posted says XOF is an extension and even lists a few examples of functions that support it.

The fact that you can implement some hash functions such that the implementation allows multiple finalization calls, or even interleaved updates and finalization steps, does not make that hash function an XOF. That is just a property of your particular implementation. A useful property, but still beyond the specification. A different implementation may rightfully not support this property and still be compliant with the spec.

In my opinion, HashAlgorithm must support the latter kind of implementation, one that is compliant with the hash function specification and does not support digest extension. If you want to expose the XOF capability, then please create a separate concept, say HashAlgorithmXOF, and add a way to detect whether a given algorithm supports result extension.

I'll add that XOF is supported by some implementations, but they are also incompatible with the current HashAlgorithm concept. For example, OpenSSL provides EVP_DigestFinalXOF, but it must also be called only once. The difference from EVP_DigestFinal_ex is that EVP_DigestFinalXOF accepts the size of the buffer to fill with the digest. If you are going to define HashAlgorithmXOF, please take existing implementations of this feature into account.
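To illustrate the kind of detection meant here, a minimal C++11 sketch follows. Everything in it is hypothetical: the trait name supports_result_extension, the two-argument result( out, n ) signature, and the placeholder types fixed_hash and xof_hash are assumptions made for the example, not part of the proposed library or of any existing implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>

// Helper for the detection idiom (equivalent in spirit to C++17 std::void_t).
template<class...> struct make_void { using type = void; };
template<class... T> using detect_t = typename make_void<T...>::type;

// Hypothetical trait: true only if H provides result( out, n ), i.e. the
// caller chooses how many digest bytes to produce (the XOF capability).
template<class H, class = void>
struct supports_result_extension : std::false_type {};

template<class H>
struct supports_result_extension<H, detect_t<decltype(
    std::declval<H&>().result( std::declval<unsigned char*>(),
                               std::declval<std::size_t>() ) )>>
    : std::true_type {};

// Placeholder for an algorithm with a fixed-size, single-shot result.
// The bodies are dummies; only the interface matters for detection.
struct fixed_hash
{
    using result_type = std::uint64_t;
    void update( void const*, std::size_t ) {}
    result_type result() { return 0; }
};

// Placeholder for an algorithm that exposes variable-length output.
struct xof_hash
{
    void update( void const*, std::size_t ) {}
    void result( unsigned char* out, std::size_t n )
    {
        for( std::size_t i = 0; i < n; ++i ) out[ i ] = 0; // dummy output
    }
};
```

With a trait like this, generic code can select the extended path only for algorithms that opt in, while fixed-output algorithms remain valid models of the basic concept.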
Note that this is not the only innovation that the proposed hash algorithm concept involves. All hash algorithms are required to support seeding from uint64_t and from an arbitrary sequence of bytes, which makes them effectively _keyed hash functions_ (or _message authentication codes_).
Also note that the requirement that one can interleave calls to `update` and `result` arbitrarily makes it possible to implement byte sequence seeding (for algorithms that don't already support it) in the following manner:
Hash::Hash( unsigned char const* p, size_t n ): Hash()
{
    if( n != 0 )
    {
        update( p, n );
        result();
    }
}
Subsequent `update` calls now start from an initial internal state that has incorporated the contents of [p, p+n), and that has been "finalized" (scrambled thoroughly) such that the result is not equivalent to just prepending the seed to the message (as would have happened if the result() call had been omitted.)
The exact behavior of the hash algorithm's constructor is an implementation detail. It doesn't need to be specified in terms of the public update and result methods. And certainly, the fact that one hash algorithm supports this sort of operation ordering doesn't mean that all of them should.
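As a rough sketch of that point, here is a toy algorithm whose seeding constructor works on private state directly, while its public result() is a single-shot, non-mutating finalization. The class toy_hash, its FNV-1a-style absorb step, and the mix function are made up for illustration only; this is not a real or recommended hash function.

```cpp
#include <cstddef>
#include <cstdint>

// Toy, non-cryptographic hash used only to show the interface shape.
class toy_hash
{
    std::uint64_t state_;

public:
    toy_hash(): state_( 0xcbf29ce484222325ULL ) {}

    // Seeding is done through private helpers; it is not expressed in
    // terms of the public update() and result() members.
    toy_hash( unsigned char const* p, std::size_t n ): toy_hash()
    {
        if( n != 0 )
        {
            absorb( p, n );
            state_ = mix( state_ ); // scramble the seeded state
        }
    }

    void update( void const* data, std::size_t n )
    {
        absorb( static_cast<unsigned char const*>( data ), n );
    }

    // Finalization does not modify the internal state, so this type makes
    // no promise about interleaving update() and result() calls.
    std::uint64_t result() const { return mix( state_ ); }

private:
    void absorb( unsigned char const* p, std::size_t n )
    {
        for( std::size_t i = 0; i < n; ++i )
        {
            state_ ^= p[ i ];
            state_ *= 0x100000001b3ULL;
        }
    }

    static std::uint64_t mix( std::uint64_t x )
    {
        x ^= x >> 32;
        x *= 0x9E3779B97F4A7C15ULL;
        x ^= x >> 29;
        return x;
    }
};
```

The seeded constructor here gives the same "start from a scrambled state" effect without requiring that result() be callable mid-stream.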