
2) I am hashing one stream of bytes, but I do not have all of them at once, so I pass them to the hasher as they arrive (e.g. receiving a long message over TCP, but hashing the parts as they come in, to minimize the latency of computing the hash once the entire message has been received). The digest of a single binary blob should be calculated as if all the bytes were first submitted to the hash function sequentially, followed by the size of the blob in bytes as a `std::size_t`. I don't think this can be done through the type hashing interface; instead one has to repeatedly call the function that takes a void pointer and a size, and at the end hash a value of type `std::size_t`. I think this is a valid concern, as it might be a common use case. IMO it should be considered in the interface, and/or at least be covered in the examples in the documentation.
I.e. what to do with code like this:
    span<byte> buffer;
    while (connection.readsome(buffer)) { update_hash(buffer); }
    hash1 = hash_result();

    buffer = connection.readall();
    update_hash(buffer);
    hash2 = hash_result();

    assert(hash1 == hash2);
I.e. the result should be independent of the sizes of the "partial" buffers, which currently isn't the case, because each call appends the size. Keeping track of the total size at the call site might also be error-prone; maybe this could be done internally, by providing an interface that keeps the size as state.

On the other hand, dropping the per-call size makes it trivial to generate collisions:

    pair( "foo",   "bar" )
    pair( "foob",  "ar"  )
    pair( "fooba", "r"   )

I can imagine an argument that the "collision" is intentional here, i.e. that the `data` really is just "foobarfoobarfoobar". So no matter which way is used in the end, it might be surprising to some people.