
2) I am hashing one stream of bytes, but I do not have all of them at once, so I pass them to the hasher as they arrive (e.g. receiving a long message over TCP, but hashing the parts as they come in, to minimize the latency of computing the hash once the entire message has been received). The digest of a single binary blob should be calculated as if all the bytes were first submitted to the hash function sequentially, followed by the size of the blob in bytes as a `std::size_t`. I don't think this can be done through the type hashing interface; instead one has to repeatedly call the function that takes a void pointer and a size, and at the end hash a value of type `std::size_t`. I think this is a valid concern, as it might be a common use case. IMO it should be considered in the interface, and/or at least be covered in the examples in the documentation.
I.e. what to do with code like this:
    span<byte> buffer;
    while (connection.readsome(buffer)) { update_hash(buffer); }
    hash1 = hash_result();

    buffer = connection.readall();
    update_hash(buffer);
    hash2 = hash_result();

    assert(hash1 == hash2);
I.e. the result should be independent of the sizes of the "partial" buffers, which currently isn't the case, because each call appends the size. Keeping track of the total size at the call site might also be error-prone; maybe this could be done internally, by providing an interface that keeps the size as state.

On the other hand, dropping the per-call size makes it trivial to generate collisions:

    pair( "foo",   "bar" )
    pair( "foob",  "ar"  )
    pair( "fooba", "r"   )

I can imagine an argument that the "collision" is intentional here, i.e. that the `data` really is just "foobarfoobarfoobar". So no matter which way is used in the end, it might be surprising to some people.