
Hi all, This is my review of the proposed Boost.Hash2. Thanks Peter and Chris for submitting the library, and Matt for managing the review. - What is your evaluation of the design? Very good. This is how hashing C++ objects should work, as it properly separates concerns between types and hash functions. I think usability could be improved if the template class hash shown as an example was included as part of the library and introduced much earlier in the documentation. In general, the HashAlgorithm concept makes sense. I'm not convinced about the HashAlgorithm::result() interface, for two reasons: * It makes integration of third-party hashes more difficult, as already noted by other reviewers. * The naming suggests an accessor, rather than a modifier. I'd consider splitting it into two functions, maybe finalize_block() and result(). But I can also see the benefits of the approach taken. I miss some higher-level features when working with untyped byte sequences. While this doesn't seem the main value proposition of the library, it's listed as a valid use case. See below for more info. These are nice-to-haves rather than requirements, though. I was initially surprised to find out that the library was header-only, as I didn't know that compile-time hashing was something worth it. I was initially worried about compile-times. I built one of the examples with compile-time profiling (-ftime-trace) and the library is pretty lightweight, so I have no problems with the header-only design. Have you considered somehow supporting hashing for objects describable via Boost.Pfr? I had it requested by some users in MySQL even when I had built-in Boost.Describe support, so it may get requested here, too. - What is your evaluation of the implementation? The parts where I've peaked in are good. Some comments would have helped me, but in general I've been able to follow. The test suite looks pretty comprehensive, too. Some fuzz tests for the cryptographic algorithms would be appreciated (I can help if required). As other reviewers have commented, the algorithm for hashing unordered ranges can use some improvements to make it more secure. I'm no security expert and can't judge how important this is, but I'm pretty sure the author will solve this promptly. The implementation of cryptographic functions is currently slower than solutions like OpenSSL [4], but the author has manifested the possibility of closing this gap. It would be nice having HashAlgorithm modelled as a C++20 concept, although the required macro machinery required to perform the checking only if concepts are supported (like BOOST_ASIO_COMPLETION_TOKEN_FOR does) may be too much effort for the benefit it brings. Regarding Boost integration, a build.jam should likely be added, so the library integrates in the new modular Boost infrastructure (although I've been able to use the library fine without it). This can be done after acceptance. As a matter of curiosity: which compilers do not support __builtin_is_constant_evaluated? (as they will always get the slow memcpy implementation, even at runtime). - What is your evaluation of the documentation? The documentation is good, and explains rationales pretty well, but the order could be improved: * Most users will likely want to use the existing algorithms with their unordered containers, so this should be explained first. * The next block of users will likely want to use the existing cryptographic hash algorithms to untyped byte sequences, so this should be next. * The rest can be left as is. I miss the motivation for the flavor construct in the docs. If I understood correctly, it's only relevant if you plan to somehow serialize your hashes. Maybe an example could help. The library is also suitable for use cases that combine the C++ object hashing and untyped byte sequence hashing parts, as explained by the author here: https://lists.boost.org/Archives/boost/2024/12/258765.php. Such cases are the rationale behind having cryptographic functions within the same umbrella as non-cryptographic ones. I'd suggest adding these to the docs. In general, I dislike single-page docs (which unfortunately seem pretty popular in Boost), as they're difficult to search. I also miss linking (e.g. hash_append should contain a link to its description in the reference). It would also be beneficial to explain the library's relationship with boost::hash / Boost.ContainerHash. In general, making it more approachable for non-expert users would be good. - What is your evaluation of the potential usefulness of the library? Do you already use it in industry? Robust, extensible C++ object hashing is very useful. My immediate use case is related to hashing untyped byte sequences, though. MySQL/MariaDB authentication involve a challenge-response protocol that involves hashing the user-supplied password with a server-supplied challenge, using either SHA1 (sigh) or SHA256 [1]. I'm currently using OpenSSL, as I use it for TLS, too. This library would allow me to make the OpenSSL dependency optional (which I've been requested). I'm also planning on writing a Postgres protocol library. It will likely need to support SCRAM-SHA256 authentication, similar to what I've described. This library could be handy for this. - Did you try to use the library? With which compiler(s)? Did you have any problems? I've re-written the password hashing functionality in Boost.MySQL to use the proposed library [2]. Everything has been very smooth. The test suite passes for all the compilers I support [3], which is really good. I'd have liked some utilities on top of HashAlgorithm to make hashing byte sequences safer and easier. These utilities would *not* imply any change in the HashAlgorithm interface: * Adding a contiguous range of bytes to an algorithm currently supports only a pointer + size pair. This is the right choice for HashAlgorithm, but having a helper function on top of it would be great, as proposed by other reviewers. This would make my code go from: // password is a string_view hash2::sha1_160 hasher1; hasher1.update(password.data(), password.size()); auto password_sha1 = hasher1.result(); To: hash2::sha1_160 hasher1; hash2::hash_add_bytes(hasher1, password); auto password_sha1 = hasher1.result(); * Adding a function that computes the hash of an entire byte sequence, as a range, could be useful. OpenSSL has this as SHA1() and SHA256(). This would make my code go from: hash2::sha1_160 hasher1; hasher1.update(password.data(), password.size()); auto password_sha1 = hasher1.result(); To: auto password_sha1 = hash2::compute_hash(hash2::sha1_160(), password); These are just nice to have and trivially implemented in user-land, if required. The code is already good as is today. - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? I have dedicated around 8 hours to study the documentation, build the library and examples, integrate it into Boost.MySQL, update my CIs to run my tests with the library, and write this review. - Are you knowledgeable about the problem domain? I am a mere user of hash algorithms, as I usually write networking code. I don't know about their internals. I haven't authored associative containers, either. - Affiliation disclosure I am currently affiliated with the C++ Alliance. - Final decision My recommendation is to CONDITIONALLY ACCEPT the library into Boost. My only acceptance condition is adding the hash template class developed in the examples to the library. The benefit of having the library in Boost is big and overcomes any concerns I might have. Many thanks to both Peter and Matt. Kind regards, Ruben. [1] https://dev.mysql.com/doc/dev/mysql-server/8.4.3/page_caching_sha2_authentic... [2] https://github.com/boostorg/mysql/pull/389 [3] https://github.com/boostorg/mysql/blob/c438f26731e36c2db6457705ec5dbb9f7657d... [4] https://lists.boost.org/Archives/boost/2024/12/258687.php

Ruben Perez wrote:
As a matter of curiosity: which compilers do not support __builtin_is_constant_evaluated? (as they will always get the slow memcpy implementation, even at runtime).
__builtin_is_constant_evaluated is supported by * Clang 9 and above (https://godbolt.org/z/4obMEj9fc) * GCC 9 and above (https://godbolt.org/z/5PdMsqezx) * MSVC 19.25+ (VS2017.something)

On Fri, Dec 13, 2024 at 9:42 AM Ruben Perez via Boost
I'd consider splitting it into two functions, maybe finalize_block() and result().
I presume your proposed result() is idempotent. Unfortunately this increases the size of the hash algorithm object for many algorithms. Thanks

On Fri, 13 Dec 2024 at 18:56, Vinnie Falco
On Fri, Dec 13, 2024 at 9:42 AM Ruben Perez via Boost
wrote: I'd consider splitting it into two functions, maybe finalize_block() and result().
I presume your proposed result() is idempotent. Unfortunately this increases the size of the hash algorithm object for many algorithms.
Thanks
Yes, that was the case. If there's a strong reason not to do it, I'm fine with result() as is.
participants (4)
-
Matt Borland
-
Peter Dimov
-
Ruben Perez
-
Vinnie Falco