
Vinnie Falco asked me the following on Slack:

Calling result() twice (or more times) provides result extension: the ability to extract a variable number of bits from a hash algorithm, instead of a fixed-size value (e.g. 64 bits). This is in fact stated in the docs here

https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_bytes_result

and there is an example of doing that here

https://pdimov.github.io/hash2/doc/html/hash2.html#example_result_extension

All hash algorithms are required to support result extension, because (in my opinion) this is extremely useful functionality that is easy - even trivial - to provide, but is often withheld either by accident or, in some cases, even deliberately.

Hash algorithms typically have a "finalization" phase that pads the message, mixes in the length, scrambles the internal state in a more thorough manner than in `update`, and then derives a hash value from that state. (The hash value is often shorter than the total amount of state.) If this "finalization" phase is performed more than once, one naturally gets the mandated `result()` behavior.

Falco continues:

That's of course correct, but the same applies to the quality of calling `result()` only once; it's naturally dependent on the implementation of the hash algorithm. What's important here is that it's not possible to provide an extended result of better quality from the outside; the hash algorithm is in the best place to provide it, because it has access to more bits of internal state than it lets out.

This requirement effectively mandates that all _hash algorithms_ be _extendable-output hash functions_: https://en.wikipedia.org/wiki/Extendable-output_function

Note that this is not the only innovation that the proposed hash algorithm concept involves. All hash algorithms are required to support seeding from uint64_t and from an arbitrary sequence of bytes, which makes them effectively _keyed hash functions_ (or _message authentication codes_).

Also note that the requirement that one can interleave calls to `update` and `result` arbitrarily makes it possible to implement byte sequence seeding (for algorithms that don't already support it) in the following manner:

    Hash::Hash( unsigned char const* p, size_t n ): Hash()
    {
        if( n != 0 )
        {
            update( p, n );
            result();
        }
    }

Subsequent `update` calls now start from an initial internal state that has incorporated the contents of [p, p+n), and that has been "finalized" (scrambled thoroughly) such that the result is not equivalent to just prepending the seed to the message (as would have happened if the result() call had been omitted).
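To make result extension concrete, here is a minimal usage sketch. `Algo` is a placeholder for any conforming hash algorithm whose `result_type` is `std::uint64_t`, and the helper name `hash128` is illustrative rather than part of the library:

    #include <array>
    #include <cstddef>
    #include <cstdint>

    // Extract 128 bits from a 64-bit hash algorithm by calling result() twice.
    template<class Algo>
    std::array<std::uint64_t, 2> hash128( unsigned char const* p, std::size_t n )
    {
        Algo h;
        h.update( p, n );

        std::uint64_t r1 = h.result(); // first 64 bits of output
        std::uint64_t r2 = h.result(); // next 64 bits; the state is re-finalized

        return { r1, r2 };
    }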

Peter Dimov wrote:
This requirement effectively mandates that all hash algorithms be extendable-output hash functions.
For those unfamiliar with Extendable-Output Functions (XOFs), FIPS 202 [1] and the reference implementation [2] provide good detail, since the wiki article seems a bit short.

Matt

[1] https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf
[2] https://github.com/XKCP/XKCP/blob/master/usage-example.md

On Mon, Dec 9, 2024 at 8:43 AM Peter Dimov via Boost wrote:
It seems to me that Hash2 is primarily aimed at users who wish to opt in to a superior framework for use in unordered-style containers. For this use case, hash algorithms only need the digest finalized once. All of the work in the paper is focused on this use case. What you have done, in your own words, is innovative. From the documentation:

This obviously is out of scope for unordered containers. I think it would be better to change HashAlgorithm to only allow one call to finalize() (and rename it from "result"), and no calls to update() after finalize(), using the language I have already posted, or similar language.

And there is nothing inherently wrong with innovating, therefore I would propose that you simply add a new concept:

    ExtendableOutputHashAlgorithm

and in this new concept impose your additional requirements. Most users won't care about extendable output and can just stick with the elements of the library needed for working with unordered containers. Those users will find it easy to adapt external implementations of hash algorithms, as they do not also need the expertise to ensure that these external implementations meet the more stringent requirements of ExtendableOutputHashAlgorithm.

There is an added benefit of relaxing the requirements for HashAlgorithm: users can call into existing algorithms without modifying them, often a requirement for security certifications.

Thanks
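For concreteness, here is a rough sketch of the two-concept split being proposed, written with C++20 concepts purely for illustration; the names and the exact requirement set are assumptions on my part, not taken from the library or from the post above:

    #include <concepts>
    #include <cstddef>

    // Base requirements: update() plus a single finalize() call.
    template<class H>
    concept HashAlgorithm =
        std::default_initializable<H> &&
        std::copyable<H> &&
        requires( H h, void const* p, std::size_t n )
        {
            typename H::result_type;
            h.update( p, n );
            { h.finalize() } -> std::same_as<typename H::result_type>;
        };

    // Refinement: result() may be called repeatedly, each call extending the
    // output, and update()/result() may be interleaved. (The "repeatedly"
    // part is a semantic requirement that a concept cannot express; only the
    // extra syntax is checked here.)
    template<class H>
    concept ExtendableOutputHashAlgorithm =
        HashAlgorithm<H> &&
        requires( H h )
        {
            { h.result() } -> std::same_as<typename H::result_type>;
        };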

On Mon, Dec 9, 2024 at 9:28 AM Peter Dimov wrote:
"seed" "message" and "seedm" "essage" trivially collide, which may not be desirable.
Sorry, I don't follow. You are comparing the results of two different hash functions:

    H1 h1( "seed", 4 );
    h1.update( "message", 7 );
    auto r1 = h1.result();

    H2 h2( "seedm", 5 );
    h2.update( "essage", 6 );
    auto r2 = h2.result();

    assert( r1 == r2 );

h1 and h2 are different hash functions from the same family [1], and you have gone out of your way to form a preimage using knowledge of the seed. Hardly fair, and it also does not in any way reflect the security concerns of real use cases.

Your own implementation of the seeded FNV-1a [2] constructor simply calls update():

https://github.com/pdimov/hash2/blob/7a25f8518692b657e9272884519519fbaca2ec5...

I think it would be preferable if Hash2 made this the default implementation, and allowed more sophisticated algorithms (and authors) to opt in to a better implementation. Otherwise, we are forcing ordinary users who just want to adapt an external library to meet the HashAlgorithm requirements to become experts.

[1] https://en.wikipedia.org/wiki/Universal_hashing
[2] https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function

On Mon, Dec 9, 2024 at 9:41 AM Vinnie Falco wrote:
Peter, would you please comment on the following sketch:
/** HashAlgorithm exemplar
*/
struct HashAlgorithm
{
    using result_type = /* integral or array-like */;

    HashAlgorithm();
    HashAlgorithm( HashAlgorithm const& r );
    HashAlgorithm& operator=( HashAlgorithm const& r );

    /** Append data to the message

        The behavior of subsequent calls to update() after
        finalize() has been called at least once is undefined
        unless specified by the HashAlgorithm.
    */
    void update( void const* data, std::size_t n );

    /** Return the digest

        This function shall return the final hash value of the input
        message, where the input message is defined by the ordered
        sequence of bytes provided in all prior calls to update().

        The behavior of subsequent calls to finalize() is undefined
        unless specified by the HashAlgorithm.
    */
    result_type finalize();

    static constexpr int block_size = /*...*/;                  // optional

    explicit HashAlgorithm( std::uint64_t seed );               // optional
    HashAlgorithm( unsigned char const* seed, std::size_t n );  // optional
};

/** Return a seeded HashAlgorithm
*/
template< class HashAlgorithm >
HashAlgorithm make_seeded( std::uint64_t seed )
{
    if constexpr( std::is_constructible<
        HashAlgorithm, std::uint64_t >::value )
        return HashAlgorithm( seed );
    else
    {
        HashAlgorithm h;
        hash_append( h, seed );
        return h;
    }
}

/** Return a seeded HashAlgorithm
*/
template< class HashAlgorithm >
HashAlgorithm make_seeded(
    unsigned char const* seed, std::size_t n )
{
    if constexpr( std::is_constructible
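The post is cut off at this point in the archive. By analogy with the std::uint64_t overload, a plausible completion of the second make_seeded would look like the following; the update()-based fallback is an assumption on my part, not text from the original message:

    template< class HashAlgorithm >
    HashAlgorithm make_seeded(
        unsigned char const* seed, std::size_t n )
    {
        if constexpr( std::is_constructible<
            HashAlgorithm, unsigned char const*, std::size_t >::value )
            return HashAlgorithm( seed, n );
        else
        {
            // Assumed fallback: feed the seed bytes through update()
            HashAlgorithm h;
            h.update( seed, n );
            return h;
        }
    }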

On 12/9/24 19:43, Peter Dimov via Boost wrote:
Only some hash functions are specified as extendable-output functions (XOFs). I mean "specified" as in "the hash algorithm's specification". The link you posted says XOF is an extension and even lists a few examples of functions that support it.

The fact that you can implement some hash functions such that the implementation allows multiple finalization calls, or even interleaves updates and finalization steps, does not make that hash function an XOF. That's just a property of your particular implementation. A useful property, but still beyond the specification. A different implementation may rightfully not support this property and still be compliant with the spec.

In my opinion, HashAlgorithm must support the latter kind of implementation, one that is compliant with the hash function specification and does not support digest extension. If you want to expose the XOF capability, then please create a separate concept, say HashAlgorithmXOF, and add a way to detect whether a given algorithm supports result extension.

I'll add that XOF is supported by some implementations, but they are also incompatible with the current HashAlgorithm concept. For example, OpenSSL provides EVP_DigestFinalXOF, but it must also be called only once. The difference from EVP_DigestFinal_ex is that EVP_DigestFinalXOF accepts the size of the buffer to fill with the digest. If you are going to define HashAlgorithmXOF, please take existing implementations of this feature into account.
The exact behavior of the hash algorithm's constructor is an implementation detail. It doesn't need to be specified in terms of the public update and result methods. And certainly, the fact that one hash algorithm supports this sort of operation ordering doesn't mean that all of them should.
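For reference, a minimal sketch of how the OpenSSL interface mentioned above is used; SHAKE128 is chosen arbitrarily and error handling is omitted:

    #include <openssl/evp.h>

    #include <cstddef>

    // Extract outlen bytes of SHAKE128 output. The caller picks the length,
    // but EVP_DigestFinalXOF may still be called only once per context.
    void shake128_xof( unsigned char const* msg, std::size_t msglen,
                       unsigned char* out, std::size_t outlen )
    {
        EVP_MD_CTX* ctx = EVP_MD_CTX_new();

        EVP_DigestInit_ex( ctx, EVP_shake128(), nullptr );
        EVP_DigestUpdate( ctx, msg, msglen );
        EVP_DigestFinalXOF( ctx, out, outlen );

        EVP_MD_CTX_free( ctx );
    }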

On 12/9/24 19:43, Peter Dimov via Boost wrote:
Also, my understanding of HMAC [1] is that the key is prepended to the subsequent data and then the whole of the data is hashed. This contradicts your code calling result() in the middle. Am I missing something?

[1] https://datatracker.ietf.org/doc/html/rfc2104#section-2

Andrey Semashev wrote:
I don't understand what contradicts what. HMAC is a separate algorithm. https://pdimov.github.io/hash2/doc/html/hash2.html#ref_hmac
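For readers following the exchange, RFC 2104 defines HMAC as a nested construction rather than a simple key-prepend: HMAC(K, m) = H( (K' xor opad) || H( (K' xor ipad) || m ) ), where K' is the key zero-padded to the block size. The sketch below is written against the HashAlgorithm exemplar from earlier in the thread, purely for illustration; it assumes result_type is array-like (with data() and size(), no longer than the block size) and that block_size is provided. It is not the library's actual hmac implementation.

    #include <cstddef>
    #include <cstring>

    template<class H>
    typename H::result_type hmac( unsigned char const* key, std::size_t keylen,
                                  unsigned char const* msg, std::size_t msglen )
    {
        constexpr std::size_t B = H::block_size;

        // K': the key zero-padded to the block size (hashed first if longer)
        unsigned char k[ B ] = {};

        if( keylen > B )
        {
            H hk;
            hk.update( key, keylen );
            auto d = hk.finalize();
            std::memcpy( k, d.data(), d.size() );
        }
        else
        {
            std::memcpy( k, key, keylen );
        }

        unsigned char ipad[ B ], opad[ B ];

        for( std::size_t i = 0; i < B; ++i )
        {
            ipad[ i ] = static_cast<unsigned char>( k[ i ] ^ 0x36 );
            opad[ i ] = static_cast<unsigned char>( k[ i ] ^ 0x5C );
        }

        // Inner hash: H( (K' xor ipad) || m )
        H hi;
        hi.update( ipad, B );
        hi.update( msg, msglen );
        auto inner = hi.finalize();

        // Outer hash: H( (K' xor opad) || inner )
        H ho;
        ho.update( opad, B );
        ho.update( inner.data(), inner.size() );
        return ho.finalize();
    }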

On Mon, Dec 9, 2024 at 8:43 AM Peter Dimov via Boost wrote:
...there is an example of doing that here
https://pdimov.github.io/hash2/doc/html/hash2.html#example_result_extension
I will argue that, for all the cases where result extension is a valid use case, the caller is already committed to a specific hash algorithm. And in that case, the hash algorithm can specify additional behavior for update() and result() with no reduction in the usability of Hash2.

Thanks

Vinnie Falco wrote:
In other words, no generic code should be able to rely on result extension. I don't think this is an improvement.

It's much more ergonomic for end users to be able to rely on all hash algorithm features, so that code can be generic without fear of invoking undefined behavior. That's also why all algorithms are required to support specific forms of seed construction, instead of every algorithm providing its own assorted constructors (even though this better reflects what happens today).

The current hash algorithm requirements reflect my view on what minimum functionality a hash algorithm should provide in 2025. It's true that this doesn't correspond to what has traditionally been provided.
Participants (4)
- Andrey Semashev
- Matt Borland
- Peter Dimov
- Vinnie Falco