
On Sat, Dec 7, 2024 at 5:47 AM Matt Borland via Boost
The review of Hash2 by Peter Dimov and Christian Mazakas begins today Saturday, December 7th through Sunday December 15th, 2024.
Code: https://github.com/pdimov/hash2 Docs: https://pdimov.github.io/hash2/
Disclaimer: I am affiliated with The C++ Alliance and I am also a co-author of N3980 (the paper referenced by the library). This is my review of Hash2. Spoiler alert, my suggestion is to: ACCEPT While most of my review is dedicated to discussing some concerns I have about some aspects of the library, it is clear that Boost as a collection is better with Hash2 than without it. Also, Peter is going to do what Peter is going to do so why fight it. The uncontroversial parts are stated first, and after lies my assessment of some questionable design choices.
- What is your evaluation of the implementation?
I looked over some of the implementation to better understand how certain things work. The tests pass on CI and the test code looks comprehensive. Peter is not in the habit of producing anything less than high-quality implementations and this library is no exception. I don't know which parts are from Peter and which are from Christian and the quality appears high throughout.
- What is your evaluation of the documentation?
I think the documentation could be better in places. The named requirements exposition is too informal for my taste and talks about things that are not strictly relevant, such as zero-padding. The library alludes to cryptographic usage yet glosses over some details which surfaced during mailing list discussion (more on that below). It would be nice if there was some exposition up front explaining which groups of users the library is for, and how each group uses the library.
- What is your evaluation of the potential usefulness of the library? Do you already use it in industry?
Not only is the library useful, but the C++ Standard is poorer for not having a modern hash solution. Competing approaches were developed by Google and N3980, then promptly abandoned.
- Did you try to use the library? With which compiler(s)? Did you have any problems?
I tried to build a Visual Studio Solution from the CMakeLists.txt after cloning hash2 into my superproject, with this command line: cmake -G "Visual Studio 17 2022" -A x64 -B bin64 -DCMAKE_TOOLCHAIN_FILE="C:/vcpkg/scripts/buildsystems/vcpkg.cmake" -DVCPKG_CHAINLOAD_TOOLCHAIN_FILE="C:/users/vinnie/src/toolchains/msvc.cmake" It started downloading things from the internet (FetchContent I guess) and then failed. I was not able to generate a solution. I asked Peter about this and he said it was not supported. It would be nice if it was.
- How much effort did you put into your evaluation?
A glance? A quick reading? In-depth study?
I focused primarily on the ways in which the library differs from the original paper N3980. I looked at the relevant documentation, interfaces, and implementation, and engaged the author on the mailing list. - Are you knowledgeable about the problem domain? Fairly knowledgeable. - What is your evaluation of the design? This is where I have concerns. The audience of users for Hash2 can be divided into roughly these groups: 1. Authors of types who just want to make objects of their type hashable without having to learn about hash algorithms cryptography, and users who want higher quality results when working with unordered containers. Hash2 is basically perfect for these two groups of users (which often overlap) although the documentation could be improved to speak more directly to them. 2. Users who wish to calculate a cryptographic digest for objects of Hash2-enabled types, or for untyped binary data. Hash2 is also almost perfect for these use cases, and as before the documentation could be more explicit about these cases. 3. Users who wish to adapt existing code to meet the HashAlgorithm requirements from Hash2. This is where I believe the library has some design problems. An unfortunate consequence of being right almost all the time, is that it becomes more unlikely to recognize when you are wrong. I will assume that the reader is at least somewhat familiar with N3980 and some of the arguments I made about the HashAlgorithm named requirements on the mailing list. Hash2 attempts to improve the quality of life for users by mandating additional requirements for the HashAlgorithm concept which may be difficult or impossible to meet for some existing libraries. For example, it requires that every HashAlgorithm support: * Seeded construction from std::uint64_t * Keyed construction from binary data * Copy construction * Digest extension from repeated calls to result() * Interleaving of calls to update() and result() * ...Future stuff that the author alludes to These issues above are already discussed extensively on the mailing list and I can't be bothered to spend hours putting it all together into a beautiful post the way that some of the other reviewers have done (Andrey Semashev and Samuel Neves come to mind). The burdensome additional named requirements of Hash2's HashAlgorithm ensure that no casual users will be authoring them. Or worse, they will get them wrong and the results will not have the same guarantees as the underlying algorithms. Right now there are approximately 1,649,127 implementations of cryptographic and non-cryptographic hash algorithms. I think it might be more prudent for Hash2 to offer a system of traits for identifying which algorithms are capable of what, such as CopyConstruction, digest extension, and so on, and give the user tools which only work in safe ways given the traits of a hash algorithm. This would make it easier to write HashAlgorithms. And as a consequence we might get many more of them rather than fewer. Realistically I don't expect the library to change, and a proliferation of user-created HashAlgorithms is probably conditional on the Boost library collection being much more popular than it currently is. This is partly why I have decided to just accept the library. And I could be wrong, and Peter's unwavering insistence that users get HashAlgorithms which all offer the same quality of service might turn out to be better. Time will tell. Thanks
participants (2)
-
Matt Borland
-
Vinnie Falco