Ruben's Hash2 review

13 Dec 2024

      Hi all,

This is my review of the proposed Boost.Hash2. Thanks Peter and Chris
for submitting the library, and Matt for managing the review.

- What is your evaluation of the design?

Very good. This is how hashing C++ objects should work, as it properly
separates concerns between types and hash functions. I think usability
could be improved if the template class hash shown as an example was
included as part of the library and introduced much earlier in the
documentation.

In general, the HashAlgorithm concept makes sense. I'm not convinced
about the HashAlgorithm::result() interface, for two reasons:

* It makes integration of third-party hashes more difficult, as
already noted by other reviewers.
* The naming suggests an accessor, rather than a modifier. I'd
consider splitting it into two functions, maybe finalize_block() and
result().

But I can also see the benefits of the approach taken.

I miss some higher-level features when working with untyped byte
sequences. While this doesn't seem the main value proposition of the
library, it's listed as a valid use case. See below for more info.
These are nice-to-haves rather than requirements, though.

I was initially surprised to find out that the library was
header-only, as I didn't know that compile-time hashing was something
worth it. I was initially worried about compile-times. I built one of
the examples with compile-time profiling (-ftime-trace) and the
library is pretty lightweight, so I have no problems with the
header-only design.

Have you considered somehow supporting hashing for objects describable
via Boost.Pfr? I had it requested by some users in MySQL even when I
had built-in Boost.Describe support, so it may get requested here,
too.

- What is your evaluation of the implementation?

The parts where I've peaked in are good. Some comments would have
helped me, but in general I've been able to follow.

The test suite looks pretty comprehensive, too. Some fuzz tests for
the cryptographic algorithms would be appreciated (I can help if
required).

As other reviewers have commented, the algorithm for hashing unordered
ranges can use some improvements to make it more secure. I'm no
security expert and can't judge how important this is, but I'm pretty
sure the author will solve this promptly.

The implementation of cryptographic functions is currently slower than
solutions like OpenSSL [4], but the author has manifested the
possibility of closing this gap.

It would be nice having HashAlgorithm modelled as a C++20 concept,
although the required macro machinery required to perform the checking
only if concepts are supported (like BOOST_ASIO_COMPLETION_TOKEN_FOR
does) may be too much effort for the benefit it brings.

Regarding Boost integration, a build.jam should likely be added, so
the library integrates in the new modular Boost infrastructure
(although I've been able to use the library fine without it). This can
be done after acceptance.

As a matter of curiosity: which compilers do not support
__builtin_is_constant_evaluated? (as they will always get the slow
memcpy implementation, even at runtime).

- What is your evaluation of the documentation?

The documentation is good, and explains rationales pretty well, but
the order could be improved:

* Most users will likely want to use the existing algorithms with
their unordered containers, so this should be explained first.
* The next block of users will likely want to use the existing
cryptographic hash algorithms to untyped byte sequences, so this
should be next.
* The rest can be left as is.

I miss the motivation for the flavor construct in the docs. If I
understood correctly, it's only relevant if you plan to somehow
serialize your hashes. Maybe an example could help.

The library is also suitable for use cases that combine the C++ object
hashing and untyped byte sequence hashing parts, as explained by the
author here: https://lists.boost.org/Archives/boost/2024/12/258765.php.
Such cases are the rationale behind having cryptographic functions
within the same umbrella as non-cryptographic ones. I'd suggest adding
these to the docs.

In general, I dislike single-page docs (which unfortunately seem
pretty popular in Boost), as they're difficult to search. I also miss
linking (e.g. hash_append should contain a link to its description in
the reference).

It would also be beneficial to explain the library's relationship with
boost::hash / Boost.ContainerHash. In general, making it more
approachable for non-expert users would be good.

- What is your evaluation of the potential usefulness
of the library? Do you already use it in industry?

Robust, extensible C++ object hashing is very useful. My immediate use
case is related to hashing untyped byte sequences, though.

MySQL/MariaDB authentication involve a challenge-response protocol
that involves hashing the user-supplied password with a
server-supplied challenge, using either SHA1 (sigh) or SHA256 [1].

I'm currently using OpenSSL, as I use it for TLS, too. This library
would allow me to make the OpenSSL dependency optional (which I've
been requested). I'm also planning on writing a Postgres protocol
library. It will likely need to support SCRAM-SHA256 authentication,
similar to what I've described. This library could be handy for this.

- Did you try to use the library? With which compiler(s)? Did
you have any problems?

I've re-written the password hashing functionality in Boost.MySQL to
use the proposed library [2]. Everything has been very smooth. The
test suite passes for all the compilers I support [3], which is really
good.

I'd have liked some utilities on top of HashAlgorithm to make hashing
byte sequences safer and easier. These utilities would *not* imply any
change in the HashAlgorithm interface:

* Adding a contiguous range of bytes to an algorithm currently
supports only a pointer + size pair. This is the right choice for
HashAlgorithm, but having a helper function on top of it would be
great, as proposed by other reviewers. This would make my code go
from:

// password is a string_view
hash2::sha1_160 hasher1;
hasher1.update(password.data(), password.size());
auto password_sha1 = hasher1.result();

To:

hash2::sha1_160 hasher1;
hash2::hash_add_bytes(hasher1, password);
auto password_sha1 = hasher1.result();

* Adding a function that computes the hash of an entire byte sequence,
as a range, could be useful. OpenSSL has this as SHA1() and SHA256().
This would make my code go from:

hash2::sha1_160 hasher1;
hasher1.update(password.data(), password.size());
auto password_sha1 = hasher1.result();

To:

auto password_sha1 = hash2::compute_hash(hash2::sha1_160(), password);

These are just nice to have and trivially implemented in user-land, if
required. The code is already good as is today.

- How much effort did you put into your evaluation?
A glance? A quick reading? In-depth study?

I have dedicated around 8 hours to study the documentation, build the
library and examples, integrate it into Boost.MySQL, update my CIs to
run my tests with the library, and write this review.

- Are you knowledgeable about the problem domain?

I am a mere user of hash algorithms, as I usually write networking
code. I don't know about their internals. I haven't authored
associative containers, either.

- Affiliation disclosure

I am currently affiliated with the C++ Alliance.

- Final decision

My recommendation is to CONDITIONALLY ACCEPT the library into Boost.
My only acceptance condition is adding the hash template class
developed in the examples to the library. The benefit of having the
library in Boost is big and overcomes any concerns I might have.

Many thanks to both Peter and Matt.

Kind regards,
Ruben.

[1] https://dev.mysql.com/doc/dev/mysql-server/8.4.3/page_caching_sha2_authentic...
[2] https://github.com/boostorg/mysql/pull/389
[3] https://github.com/boostorg/mysql/blob/c438f26731e36c2db6457705ec5dbb9f7657d...
[4] https://lists.boost.org/Archives/boost/2024/12/258687.php

Ruben Perez

Peter Dimov

Vinnie Falco

Ruben Perez

Matt Borland

tags

participants (4)