New subject: [hash2] My Review

12 Dec 2024

      On Sat, Dec 7, 2024 at 5:47 AM Matt Borland via Boost <boost@lists.boost.org>
wrote:
...
The review of Hash2 by Peter Dimov and Christian Mazakas begins today
Saturday, December 7th through Sunday December 15th, 2024.
Code: https://github.com/pdimov/hash2
Docs: https://pdimov.github.io/hash2/
Disclaimer: I am affiliated with The C++ Alliance and I am also a co-author
of N3980 (the paper referenced by the library).

This is my review of Hash2. Spoiler alert, my suggestion is to:

ACCEPT

While most of my review is dedicated to discussing some concerns I have
about some aspects of the library, it is clear that Boost as a collection
is better with Hash2 than without it. Also, Peter is going to do what Peter
is going to do so why fight it.

The uncontroversial parts are stated first, and after lies my assessment of
some questionable design choices.
...
- What is your evaluation of the implementation?
I looked over some of the implementation to better understand how certain
things work. The tests pass on CI and the test code looks comprehensive.
Peter is not in the habit of producing anything less than high-quality
implementations and this library is no exception. I don't know which parts
are from Peter and which are from Christian and the quality appears high
throughout.
...
- What is your evaluation of the documentation?
I think the documentation could be better in places. The named requirements
exposition is too informal for my taste and talks about things that are not
strictly relevant, such as zero-padding. The library alludes to
cryptographic usage yet glosses over some details which surfaced during
mailing list discussion (more on that below). It would be nice if there was
some exposition up front explaining which groups of users the library is
for, and how each group uses the library.
...
- What is your evaluation of the potential usefulness
of the library? Do you already use it in industry?
Not only is the library useful, but the C++ Standard is poorer for not
having a modern hash solution. Competing approaches were developed by
Google and N3980, then promptly abandoned.
...
- Did you try to use the library? With which compiler(s)? Did
you have any problems?
I tried to build a Visual Studio Solution from the CMakeLists.txt after
cloning hash2 into my superproject, with this command line:

cmake -G "Visual Studio 17 2022" -A x64 -B bin64
-DCMAKE_TOOLCHAIN_FILE="C:/vcpkg/scripts/buildsystems/vcpkg.cmake"
-DVCPKG_CHAINLOAD_TOOLCHAIN_FILE="C:/users/vinnie/src/toolchains/msvc.cmake"

It started downloading things from the internet (FetchContent I guess) and
then failed. I was not able to generate a solution. I asked Peter about
this and he said it was not supported. It would be nice if it was.
...
- How much effort did you put into your evaluation?
A glance? A quick reading? In-depth study?
...
I focused primarily on the ways in which the library differs from the
original paper N3980. I looked at the relevant documentation, interfaces,
and implementation, and engaged the author on the mailing list.

- Are you knowledgeable about the problem domain?

Fairly knowledgeable.

- What is your evaluation of the design?

This is where I have concerns. The audience of users for Hash2 can be
divided into roughly these groups:

1. Authors of types who just want to make objects of their type hashable
without having to learn about hash algorithms cryptography, and users who
want higher quality results when working with unordered containers. Hash2
is basically perfect for these two groups of users (which often overlap)
although the documentation could be improved to speak more directly to them.

2. Users who wish to calculate a cryptographic digest for objects of
Hash2-enabled types, or for untyped binary data. Hash2 is also almost
perfect for these use cases, and as before the documentation could be more
explicit about these cases.

3. Users who wish to adapt existing code to meet the HashAlgorithm
requirements from Hash2. This is where I believe the library has some
design problems.

An unfortunate consequence of being right almost all the time, is that it
becomes more unlikely to recognize when you are wrong. I will assume that
the reader is at least somewhat familiar with N3980 and some of the
arguments I made about the HashAlgorithm named requirements on the mailing
list.

Hash2 attempts to improve the quality of life for users by mandating
additional requirements for the HashAlgorithm concept which may be
difficult or impossible to meet for some existing libraries. For example,
it requires that every HashAlgorithm support:

* Seeded construction from std::uint64_t
* Keyed construction from binary data
* Copy construction
* Digest extension from repeated calls to result()
* Interleaving of calls to update() and result()
* ...Future stuff that the author alludes to

These issues above are already discussed extensively on the mailing list
and I can't be bothered to spend hours putting it all together into a
beautiful post the way that some of the other reviewers have done (Andrey
Semashev and Samuel Neves come to mind).

The burdensome additional named requirements of Hash2's HashAlgorithm
ensure that no casual users will be authoring them. Or worse, they will get
them wrong and the results will not have the same guarantees as the
underlying algorithms.

Right now there are approximately 1,649,127 implementations of
cryptographic and non-cryptographic hash algorithms. I think it might be
more prudent for Hash2 to offer a system of traits for identifying which
algorithms are capable of what, such as CopyConstruction, digest extension,
and so on, and give the user tools which only work in safe ways given the
traits of a hash algorithm. This would make it easier to write
HashAlgorithms. And as a consequence we might get many more of them rather
than fewer.

Realistically I don't expect the library to change, and a proliferation of
user-created HashAlgorithms is probably conditional on the Boost library
collection being much more popular than it currently is. This is partly why
I have decided to just accept the library. And I could be wrong, and
Peter's unwavering insistence that users get HashAlgorithms which all offer
the same quality of service might turn out to be better. Time will tell.

Thanks

Re: [boost] [hash2] My Review

Vinnie Falco

Matt Borland

tags

participants (2)