[hash2] Formal Review Begins

Matt Borland

7 Dec 2024 7 Dec '24

1:41 p.m.

Dear all, The review of Hash2 by Peter Dimov and Christian Mazakas begins today Saturday, December 7th through Sunday December 15th, 2024. Code: https://github.com/pdimov/hash2 Docs: https://pdimov.github.io/hash2/ Please provide feedback on the following general topics: - What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? Do you already use it in industry? - Did you try to use the library? With which compiler(s)? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain? Ensure to explicitly include with your review: ACCEPT, REJECT, or CONDITIONAL ACCEPT (with acceptance conditions). There's already been a flurry of conversation so I thank you all for that. If you have any questions during this process feel free to reach out to me. Additionally, if you would like to submit your review privately, which I will anonymize for the review report, you may send it directly to me at: matt@mattborland.com. Matt Borland Review Manager and C++Alliance Staff Engineer

Attachments:

signature.asc (application/pgp-signature — 249 bytes)

Show replies by date

Peter Dimov

7 Dec 7 Dec

2:14 p.m.

Matt Borland wrote:

...

Dear all,

The review of Hash2 by Peter Dimov and Christian Mazakas begins today Saturday, December 7th through Sunday December 15th, 2024.

Code: https://github.com/pdimov/hash2 Docs: https://pdimov.github.io/hash2/

I know that many do not want to clone the entire Boost to place the library in libs/hash2, so I've added support for building it with CMake as a standalone project - it uses FetchContent to acquire the necessary Boost dependencies. By default `cmake --build .` builds the benchmark and the examples; if you want to build and run the tests, use cmake --build . --target tests ctest For Windows, add `--config Debug` to the build line and `-C Debug` to the ctest line (or Release.)

Vinnie Falco

8:51 p.m.

On Sat, Dec 7, 2024 at 6:14 AM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

I know that many do not want to clone the entire Boost to place the library in libs/hash2, so I've added support for building it with CMake as a standalone project - it uses FetchContent to acquire the necessary Boost dependencies.

It is easier for me to put it in libs/hash2. I cloned it there and then ran this command line from hash2: cmake -G "Visual Studio 17 2022" -A x64 -B bin64 -DCMAKE_TOOLCHAIN_FILE="C:/vcpkg/scripts/buildsystems/vcpkg.cmake" -DVCPKG_CHAINLOAD_TOOLCHAIN_FILE="C:/users/vinnie/src/toolchains/msvc.cmake" FetchContent was invoked. How do I build a Visual Studio Solution for hash2 which uses my existing local clone of the superproject? Thanks

Klemens Morgenstern

3:27 p.m.

This is my review of boost hash2. Since I encourage people to write smaller reviews, here's mine. Please weigh it accordingly. I cloned the repo and ran the tests, all worked. I only studied the docs (not the implementation). If there's one person who's implementation I blindly trust, it's Peter. That plus writing the review took about an hour. The choice of hash algorithms looks correct to me.. HashAlgorithm should allow me to plugin any other algorithm, e.g. from openssl. I also think that update(void*,size_t) is the correct interface, because it doesn't require a reinterpret_cast to an inappropriate type, nor does it create any dependency. The built-in algorithms for hashing are a good choice although I wonder about the usefulness of pointers. A pointer doesn't really tell if it's itself information or points to it (what do you hash for const char*?) I would limit it to void* and function pointers. However, many other hash libraries have it built-in too. Following precedent here seems ok to me. Another example that I would be interested in is how one would hash a `boost::json::value`. If it's just calling `hash_append_unordered_range`, can recursion become an issue? If so, how do I address it? I think this library does solve multiple issues with std::hash/boost.hash:` 1. having an actual hash algorithm 2. ability to choose a hash algorithm 3. convenient extensibility 4. endian awareness I think (1) is the most important, because with std::hash everyone needing his class to be hashed must know how to implement/combine hashes without knowing which hash algorithm is used under neath. By using an established algorithm and just being required to provide the bytes to be hashed, the user does not need to know anything about hashing algorithms. It is also very useful as is, e.g. I can drop it into a project using unordered_map by just replacing the `Hash` parameter. I think the design of the library is correct and recommend to ACCEPT this library into boost. On Sat, Dec 7, 2024 at 9:47 PM Matt Borland via Boost <boost@lists.boost.org> wrote:

...

Dear all,

The review of Hash2 by Peter Dimov and Christian Mazakas begins today Saturday, December 7th through Sunday December 15th, 2024.

Code: https://github.com/pdimov/hash2 Docs: https://pdimov.github.io/hash2/

Please provide feedback on the following general topics:

- What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? Do you already use it in industry? - Did you try to use the library? With which compiler(s)? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain?

Ensure to explicitly include with your review: ACCEPT, REJECT, or CONDITIONAL ACCEPT (with acceptance conditions).

There's already been a flurry of conversation so I thank you all for that. If you have any questions during this process feel free to reach out to me. Additionally, if you would like to submit your review privately, which I will anonymize for the review report, you may send it directly to me at: matt@mattborland.com.

Matt Borland Review Manager and C++Alliance Staff Engineer

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Matt Borland

3:33 p.m.

...

This is my review of boost hash2.

...

I think the design of the library is correct and recommend to ACCEPT this library into boost.

Thank you Klemens for your review. Matt

Peter Dimov

4:16 p.m.

Klemens Morgenstern wrote:

...

Another example that I would be interested in is how one would hash a `boost::json::value`.

Good suggestion, thanks. In theory, this is how one would do it: namespace boost { namespace json { template<class Hash, class Flavor> void tag_invoke( boost::hash2::hash_append_tag const&, Hash& h, Flavor const& f, boost::json::value const& v ) { boost::json::visit( [&](auto const& w){ boost::hash2::hash_append( h, f, w ); }, v ); } } // namespace json } // namespace boost Unfortunately though, this doesn't work as-is for subtle reasons, and I wonder whether I need to change the tag_invoke signature somewhat so that I can prevent these errors from happening. The problem here is that boost::json::value is implicitly constructible from many things, which makes the above tag_invoke overload also match those many things in addition to json::value. (We want it to only match json::value and nothing else.) The correct way to write the tag_invoke overload today is namespace boost { namespace json { template<class Hash, class Flavor, class V> std::enable_if_t< std::is_same<V, boost::json::value>::value > tag_invoke( boost::hash2::hash_append_tag const&, Hash& h, Flavor const& f, V const& v ) { boost::json::visit( [&](auto const& w){ boost::hash2::hash_append( h, f, w ); }, v ); } } // namespace json } // namespace boost The other subtlety here is that boost::json::object should be considered an unordered range, rather than an ordinary range. It doesn't expose a `hasher` typedef, which is the heuristic is_unordered_range uses, but it already specializes is_unordered_range here: https://github.com/boostorg/json/blob/7f0bceb81280df758c7b4a7cacf8779351ebcc... so we're covered.

Peter Dimov

4:24 p.m.

Oops, forgot to include the kind().

...

The correct way to write the tag_invoke overload today is

namespace boost { namespace json {

template<class Hash, class Flavor, class V> std::enable_if_t< std::is_same<V, boost::json::value>::value > tag_invoke( boost::hash2::hash_append_tag const&, Hash& h, Flavor const& f, V const& v ) {

boost::hash2::hash_append( h, f, v.kind() );

...

boost::json::visit( [&](auto const& w){ boost::hash2::hash_append( h, f, w ); }, v ); }

} // namespace json } // namespace boost

Vinnie Falco

4:28 p.m.

On Sat, Dec 7, 2024 at 8:16 AM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

The other subtlety here is that boost::json::object should be considered an unordered range, rather than an ordinary range.

Why?

...

It doesn't expose a `hasher` typedef, which is the heuristic is_unordered_range uses, but it already specializes is_unordered_range here:

Should we add a hasher typedef? Thanks

Vinnie Falco

4:33 p.m.

On Sat, Dec 7, 2024 at 8:28 AM Vinnie Falco <vinnie.falco@gmail.com> wrote:

...

On Sat, Dec 7, 2024 at 8:16 AM Peter Dimov via Boost < boost@lists.boost.org> wrote:

...
The other subtlety here is that boost::json::object should be considered an unordered range, rather than an ordinary range.

Why?

Nevermind...it should be unordered because equality is independent of the ordering of elements (and json::object preserves order) Thanks

Vinnie Falco

4:17 p.m.

On Sat, Dec 7, 2024 at 7:27 AM Klemens Morgenstern via Boost < boost@lists.boost.org> wrote:

...

I wonder about the usefulness of pointers.

A use-case for pointers is: std::unordered_map< T*, std::weak_ptr<T> > Where the T is constructed via std::make_shared. Note: If any std::weak_ptr references the control block created by std::make_shared after the lifetime of all shared owners ended, the memory occupied by T persists until all weak owners get destroyed as well, which may be undesirable if sizeof(T) is large. Thanks

Ivan Matek

4:27 p.m.

On Sat, Dec 7, 2024 at 5:18 PM Vinnie Falco via Boost <boost@lists.boost.org> wrote:

...

On Sat, Dec 7, 2024 at 7:27 AM Klemens Morgenstern via Boost < boost@lists.boost.org> wrote:

...
I wonder about the usefulness of pointers.

A use-case for pointers is:

std::unordered_map< T*, std::weak_ptr<T> >

Where the T is constructed via std::make_shared. Note:

I think this case itself can easily be handled by people casting T* to std::uintptr_t. I think bigger issue is that users can not do this when pointer is nested inside a type. E.g. How can user hash std::tuple<std::string, int*, int> ? reinterpret cast to std::tuple<std::string, std::uintptr_t, int> Yikes + UB.

...

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Ion Gaztañaga

8 Dec 8 Dec

1:03 a.m.

Hi, This is my review of this library ----------------------------------- - What is your evaluation of the design? ----------------------------------- I think the design is good, I like the clear separation between the algorithm and user-defined types, and I was sympathetic with the original "Types don’t know #" proposal. I think the included hash algorithms are varied and practical. Some minor comments (opinions, don't take them as "bad design decisions"): - "HashAlgorithm::block_size" is an int, IMHO it should std::size_t. A negative value has no sense (and the documentation does not say anything if an algorithm defines it as a negative number, I suppose it's UB/bug). Maybe we should document some valid range that the library supports. - No move constructor/assigment for algorithms. I don't know if those make any sense, but I imagine user implementations with those members are perfectly valid. - The semantics for copy constructor/assignment might not be clear. The library internally uses the copy constructor (e.g. when implementing the unordered range hash), so it certainly expects something more concrete than "equivalent to the original". I imagine a copy/assigned algorithm is required to lead to the same output when updated with the same data. Maybe this should be explained in the documentation with an example. - "result" sounds like a getter/const function returning the already calculated value, but it's a required "final" step that might even return different values if repeatedly called. Naming it to something like "produce_result" might be more self-explaining. - Should the HashAlgorithm support calls to "update" and "result" several times, interleaving both calls? Does the library call "result" only as the final step once or more times? I think required guaranteed should be more explicitly stated in the design/documentation. - I don't know if SHA-1/RIPEMD-128 should be in the library at all (insecure and not sure if useful in any real context). - The original proposal mentions a "variadic version of hash_append", (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#variadic) maybe that would be a useful utility in the library. - The original proposal uses a different syntax: void operator()(void const* key, std::size_t len) noexcept { state_.Update(key, len); } explicit operator result_type() noexcept { std::uint64_t h1, h2; state_.Final(&h1, &h2); return h1; } Which are the advantages of the new approach (update/result) vs the original proposal? - The hash_append_unordered_range function is surprising for people like me that are not experts on hashing, but it's an important concept for users. The original proposal explains the problem (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#unordere...), I feel the problem should be also explained in the documentation of Hash2, and how the library design solves it (using a copy of the HashAlgorithm, as proposed in n3980). N3980 also says that "And there are various other schemes. They are all implementable. But they each have their advantages and disadvantages. Therefore this proposal proposes none of them. Should the future expose the ideal hash_append specification for unordered containers, it can always be added at that time." Why does Hash2 offer a single alternative? Would it be a good a idea to offer different unordered hashing functions (each with different trade-offs) so that a user can choose the best one? -N3980 proposes a "type-erase hasher" (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#type_era...), would it be interesting to include it in Hash2? - I didn't know about the "tag_invoke" customization point trend. I imagine that this avoids polluting all namespaces with the chosen function name. However, AFAIK library proposals as "std::execution" have abandoned the tag_invoke approach and will use member functions as customization points (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2855r1.html). Are member functions a better customization point for user-defined classes than these tag_invoke friend functions that kick ADL (and its associated potential problems)?. Say: //declare a member function in your type that will be //called by boost::hash2::hash_append() template<class Hash, class Flavor> void hash_append (boost::hash2::hash_append_tag const&, Hash& h, Flavor const& f) const { boost::hash2::hash_append(h, f, this->a); boost::hash2::hash_append(h, f, this->b); } Is tag_invoke superior because we can declare customization points for classes that we can't modify? But we need access to data members so we need priviledge access to that type... - The documentation shows (in the chapter "Use with Unordered Containers") several "hash" classes that are needed to be able to use Hash2 with std/boost unordered_xx classes (or boost::intrusive::unordered_xxx types). I think it would be a good idea to include such utilities in this Hash2 library (under names like "hasher", "seeded_hasher". - The design says "The size_type member type of the flavor affects how container and range sizes (typically of type size_t) are serialized. Since the size of size_t in bytes can vary, serializing the type directly results in different hash values when the code is compiled for 64 bit or for 32 bit. Using a fixed width type avoids this". I don't agree with this, between 32 and 64 bit compilation you will have differences when hashing user-defined types that have pointers, stored std::size_t/ptrdiff_t members, etc.. IMHO xxx_flavours defined in the library should define size_type as "std::size_t". Those wanting portable hashes between 32 and 64 bit need to do a bigger effort than configuring Hash2, and probably defining size_type as std::uint64_t will reduce the performance of 32 bit users... ----------------------------------- - What is your evaluation of the implementation? ----------------------------------- Very good IMHO. - I like it supports C++11. - Code is well-written and quite simple. - Extensive test suite. Some comments: - One thing I don't like is that the library depends on container_hash/describe and mp11. I would expect this library to be dependency-free (other than Config/Assert). Specially I dislike the "Describe" dependency. Is there a way to support Described classes without depending on that library? (just like container-like classes are supported without requiring an additional dependency). Dependency on container_hash seems is probably in order to avoid duplicating code. Note tha container_hash also depends on Describe (which I also find unnecessary) - There is an asymmetry between "writeNNle" and "writeNNbe" implementations (little vs big endian). In little endian implementations "detail::is_constant_evaluated() and endianess" is used to be able to use memcpy in some cases (https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/detail/writ...) where in the big endian implementation this is not performed (https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/detail/writ...). Probably few people use big endian platforms (except those like me that still need to compile for powerpc targets...), but I would expect an equivalent implementation for big-endian functions. ----------------------------------- - What is your evaluation of the documentation? ----------------------------------- Good quality documentation. I'm not a fan of single page documentation, though. Some examples are not compilable in C++11: - class X example is missing #include <string>. - class Str example: N is missing the type, missing <algorithm> include, operator== has the wrong type... - All "Use with Unordered Containers" examples should be compilable, including the needed headers. - I think some benchmarks results are missing to show the quality of the implementation (there is a benchmark folder in the library, but no results are shown). I don't expect Hash2 implementations to be the fasted on the planet, but certainly decent hash speeds are important for users. Maybe we should have numbers comparing implemented hashes with std and boost container_hash implementations? Against some other reference implementations? The original N3980 proposal (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#testing) has some numbers also on the quality of the hash functions. ----------------------------------- - Did you try to use the library? With which compiler(s)? Did you have any problems? ----------------------------------- Yes. I've compiled and run under the debugger the documentation examples. I've run the test-suite run successfully: Windows: toolset=msvc-14.3,msvc-14.2,msvc-14.1 variant=debug,release address-model=32,64 Linux (Ubuntu 24): toolset=gcc-10,gcc-11,gcc-12,gcc-13,gcc-14, clang-14,clang-15,clang-16,clang-17,clang-18 variant=debug,release No problems were detected. ----------------------------------- - How much effort did you put into your evaluation? ----------------------------------- A couple of hours, reading the documentation, reading the source code, passing the boostdep tool to the library, and debugging examples. ----------------------------------- - Are you knowledgeable about the problem domain? ----------------------------------- Not an expert on hash algorithms, I've reviewed the library from the point of view of a user that wants to be able to use different hash algorithms without modifying the implementation of the user-defined types. I've implemented hash-based containers (but using the separate chaining strategy) ----------------------------------- Ensure to explicitly include with your review: ACCEPT, REJECT, or CONDITIONAL ACCEPT (with acceptance conditions). ----------------------------------- I recommend to ACCEPT the library, I don't think my suggestions/things I dislike should not condition the acceptance. Best, Ion

Matt Borland

1:15 a.m.

...

... I recommend to ACCEPT the library ...

Ion

Ion, thank you for your review and comments. Matt

Peter Dimov

11:42 a.m.

Ion Gaztañaga wrote:

...

Hi,

This is my review of this library

Thank you for the review. That's a lot of questions. :-)

...

- "HashAlgorithm::block_size" is an int, IMHO it should std::size_t.

Maybe. The type doesn't matter that much for integral constants, but I suppose size_t is more correct.

...

- No move constructor/assigment for algorithms. I don't know if those make any sense, but I imagine user implementations with those members are perfectly valid.

Hash algorithms should be allowed to provide a move constructor or a move assignment operator, yes.

...

- The semantics for copy constructor/assignment might not be clear. The library internally uses the copy constructor (e.g. when implementing the unordered range hash), so it certainly expects something more concrete than "equivalent to the original".

"Equivalent to the original" is the strongest possible requirement. I don't see how the library can possibly expect anything more than that. In particular,

...

I imagine a copy/assigned algorithm is required to lead to the same output when updated with the same data.

... this is implied by "equivalent to the original", because if the copy produced a different output, it wouldn't be equivalent.

...

- Should the HashAlgorithm support calls to "update" and "result" several times, interleaving both calls?

It should, yes. Any calls are allowed, in any order.

...

- I don't know if SHA-1/RIPEMD-128 should be in the library at all (insecure and not sure if useful in any real context).

SHA-1 is still widely used, e.g. for Git commit hashes. We added RIPEMD-128 in order to have a recommended replacement for MD5 in the case where the digest has to be exactly 128 bits.

...

- The original proposal mentions a "variadic version of hash_append", (https://www.open- std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#variadic) maybe that would be a useful utility in the library.

It might be, yes.

...

- The original proposal uses a different syntax:

void operator()(void const* key, std::size_t len) noexcept { state_.Update(key, len); }

explicit operator result_type() noexcept { std::uint64_t h1, h2; state_.Final(&h1, &h2); return h1; }

Which are the advantages of the new approach (update/result) vs the original proposal?

It seems obvious to me that named functions are better than unnamed functions.

...

- The hash_append_unordered_range function is surprising for people like me that are not experts on hashing, but it's an important concept for users. The original proposal explains the problem (https://www.open- std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#unordered), I feel the problem should be also explained in the documentation of Hash2, and how the library design solves it (using a copy of the HashAlgorithm, as proposed in n3980).

N3980 also says that "And there are various other schemes. They are all implementable. But they each have their advantages and disadvantages. Therefore this proposal proposes none of them. Should the future expose the ideal hash_append specification for unordered containers, it can always be added at that time."

Why does Hash2 offer a single alternative?

I don't know of a better one. Constructing a temporary std::set is not a serious alternative. In addition to being allocating, slow, and potentially throwing, it also relies on the existence of operator<, which an unordered container doesn't require.

...

-N3980 proposes a "type-erase hasher" (https://www.open- std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#type_erased_hasher ), would it be interesting to include it in Hash2?

I had the type erased hasher in the "planned additions" list for a long time, partly because I thought I needed it for the hash2sum example. But it doesn't meet the hash algorithm requirements, because it doesn't have the correct constructors and because the copy constructor is allocating and potentially throwing (the latter is not, strictly speaking, a violation but it's still undesirable.)

...

Is tag_invoke superior because we can declare customization points for classes that we can't modify?

Yes.

...

But we need access to data members so we need priviledge access to that type...

Not always. E.g. std::complex or std::error_code provide accessors for the necessary members.

...

- The documentation shows (in the chapter "Use with Unordered Containers") several "hash" classes that are needed to be able to use Hash2 with std/boost unordered_xx classes (or boost::intrusive::unordered_xxx types). I think it would be a good idea to include such utilities in this Hash2 library (under names like "hasher", "seeded_hasher".

One of these, or maybe some different one, will eventually be included, once I decide on which.

...

- The design says "The size_type member type of the flavor affects how container and range sizes (typically of type size_t) are serialized. Since the size of size_t in bytes can vary, serializing the type directly results in different hash values when the code is compiled for 64 bit or for 32 bit. Using a fixed width type avoids this".

I don't agree with this, between 32 and 64 bit compilation you will have differences when hashing user-defined types that have pointers, stored std::size_t/ptrdiff_t members, etc..

There are still a lot of types for which the size type is the only source of variability, e.g. std::vector<int>.

...

IMHO xxx_flavours defined in the library should define size_type as "std::size_t". Those wanting portable hashes between 32 and 64 bit need to do a bigger effort than configuring Hash2, and probably defining size_type as std::uint64_t will reduce the performance of 32 bit users...

It doesn't. I initially had size_type defined as uint32_t for similar reasons, but it doesn't really matter, so I changed it to uint64_t.

...

- One thing I don't like is that the library depends on container_hash/describe and mp11. I would expect this library to be dependency-free (other than Config/Assert). Specially I dislike the "Describe" dependency. Is there a way to support Described classes without depending on that library?

No.

...

- There is an asymmetry between "writeNNle" and "writeNNbe" implementations (little vs big endian). In little endian implementations "detail::is_constant_evaluated() and endianess" is used to be able to use memcpy in some cases (https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/detail /write.hpp#L26) where in the big endian implementation this is not performed (https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/detail /write.hpp#L73).

Probably few people use big endian platforms (except those like me that still need to compile for powerpc targets...), but I would expect an equivalent implementation for big-endian functions.

The implementations are equivalent in practice. GCC and Clang recognize the pattern used and turn it into a single mov; the memcpy path is only necessary for MSVC, which doesn't support big endian architectures.

...

- class Str example: N is missing the type, missing <algorithm> include, operator== has the wrong type...

Sorry, why does operator== have the wrong type?

...

- All "Use with Unordered Containers" examples should be compilable, including the needed headers.

The examples are actually compilable, I just stripped the #includes when embedding them in the documentation to avoid the unnecessary repetition. Maybe I shouldn't have. :-)

Ion Gaztañaga

7:39 p.m.

El 08/12/2024 a las 12:42, Peter Dimov via Boost escribió:

...

... this is implied by "equivalent to the original", because if the copy produced a different output, it wouldn't be equivalent.

...
- Should the HashAlgorithm support calls to "update" and "result" several times, interleaving both calls?

It should, yes. Any calls are allowed, in any order.

Understood.

...

...
- I don't know if SHA-1/RIPEMD-128 should be in the library at all (insecure and not sure if useful in any real context).

SHA-1 is still widely used, e.g. for Git commit hashes. We added RIPEMD-128 in order to have a recommended replacement for MD5 in the case where the digest has to be exactly 128 bits.

Ok.

...

...
- The original proposal mentions a "variadic version of hash_append", (https://www.open- std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#variadic) maybe that would be a useful utility in the library.

It might be, yes.

Let's add it to the to-do list ;-)

...

...
- The hash_append_unordered_range function is surprising for people like me that are not experts on hashing, but it's an important concept for users. The original proposal explains the problem (https://www.open- std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#unordered), I feel the problem should be also explained in the documentation of Hash2, and how the library design solves it (using a copy of the HashAlgorithm, as proposed in n3980).

N3980 also says that "And there are various other schemes. They are all implementable. But they each have their advantages and disadvantages. Therefore this proposal proposes none of them. Should the future expose the ideal hash_append specification for unordered containers, it can always be added at that time."

Why does Hash2 offer a single alternative?

I don't know of a better one.

Constructing a temporary std::set is not a serious alternative. In addition to being allocating, slow, and potentially throwing, it also relies on the existence of operator<, which an unordered container doesn't require.

Ok. Yes, the other alternative described in the original paper is much worse. Not sure how the hash quality is weakened with the curren implementation: all element hashes are converted to 64 bit, added and a final hash is computed. It seems to me that combining and shuffling all original result_type bits from each individual hash would provide a better hash result for the whole sequence, but I'm not certainly able to mathematically prove that...

...

...
-N3980 proposes a "type-erase hasher" (https://www.open- std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#type_erased_hasher ), would it be interesting to include it in Hash2?

I had the type erased hasher in the "planned additions" list for a long time, partly because I thought I needed it for the hash2sum example. But it doesn't meet the hash algorithm requirements, because it doesn't have the correct constructors and because the copy constructor is allocating and potentially throwing (the latter is not, strictly speaking, a violation but it's still undesirable.)

Type-erasing requires trade-offs but for types like pimpl idioms, you can't template or it's not desirable to template code that previously was on a cpp. One option is that boost::hash2::type_erased_hasher can have enough small buffer optimization (at least for hashes implemented in the boost::hash2 library) so that allocation can be avoided in most cases. A good type_erased_hasher is not trivial and certainly useful.

...

...
- The documentation shows (in the chapter "Use with Unordered Containers") several "hash" classes that are needed to be able to use Hash2 with std/boost unordered_xx classes (or boost::intrusive::unordered_xxx types). I think it would be a good idea to include such utilities in this Hash2 library (under names like "hasher", "seeded_hasher".

One of these, or maybe some different one, will eventually be included, once I decide on which.

Great.

...

...
- The design says "The size_type member type of the flavor affects how container and range sizes (typically of type size_t) are serialized. Since the size of size_t in bytes can vary, serializing the type directly results in different hash values when the code is compiled for 64 bit or for 32 bit. Using a fixed width type avoids this".

I don't agree with this, between 32 and 64 bit compilation you will have differences when hashing user-defined types that have pointers, stored std::size_t/ptrdiff_t members, etc..

There are still a lot of types for which the size type is the only source of variability, e.g. std::vector<int>.

...
IMHO xxx_flavours defined in the library should define size_type as "std::size_t". Those wanting portable hashes between 32 and 64 bit need to do a bigger effort than configuring Hash2, and probably defining size_type as std::uint64_t will reduce the performance of 32 bit users...

It doesn't.

Show me the numbers ;-) Maybe if benchmark results are added to the documentation this can be shown.

...

...
- There is an asymmetry between "writeNNle" and "writeNNbe" implementations (little vs big endian). In little endian implementations "detail::is_constant_evaluated() and endianess" is used to be able to use memcpy in some cases (https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/detail /write.hpp#L26) where in the big endian implementation this is not performed (https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/detail /write.hpp#L73).

Probably few people use big endian platforms (except those like me that still need to compile for powerpc targets...), but I would expect an equivalent implementation for big-endian functions.

The implementations are equivalent in practice. GCC and Clang recognize the pattern used and turn it into a single mov; the memcpy path is only necessary for MSVC, which doesn't support big endian architectures.

Ok, understood.

...

...
- class Str example: N is missing the type, missing <algorithm> include, operator== has the wrong type...

Sorry, why does operator== have the wrong type?

Because operator== for Str compares X instances instead of Str instances: class Str { private: static constexpr N = 32; std::uint8_t size_ = 0; char data_[ N ] = {}; public: friend constexpr bool operator==( X const& x1, X const& x2 ) { return x1.size_ == x2.size_ && std::equal( x1.data_, x1.data_ + x1.size_, x2.data_ ); }

...

...
- All "Use with Unordered Containers" examples should be compilable, including the needed headers.

The examples are actually compilable, I just stripped the #includes when embedding them in the documentation to avoid the unnecessary repetition. Maybe I shouldn't have. :-)

I've found (at least in my libraries) that some users like to copy-paste examples in the doc and want them to work without needing to figure out what's missing. Best, Ion

Peter Dimov

9 Dec 9 Dec

1:54 p.m.

Ion Gaztañaga wrote:

...

- I don't know if SHA-1/RIPEMD-128 should be in the library at all (insecure and not sure if useful in any real context).

Amusingly enough, I'm _right now_ looking at an API documentation which requires HMAC-SHA1. :-) (Proprietary and NDAed, of course.)

Darryl Green

10:55 p.m.

On Mon, 9 Dec 2024, 11:54 pm Peter Dimov via Boost, <boost@lists.boost.org> wrote:

...

Amusingly enough, I'm _right now_ looking at an API documentation which requires HMAC-SHA1. :-)

Amusingly enough there isn't a problem with using HMAC-SHA1 if the key is being used as a secret, is predictable etc. This should make its use in message authentication perfectly ok if the conditions for effective authentication through use of HMAC are actually met in the system in question.

Richard Hodges

8 Dec 8 Dec

12:05 p.m.

*My review:* - What is your evaluation of the design? Excellent. The trade-offs in API design have been well explained. There is nothing in the design that will get in the way of any use case of mine. - What is your evaluation of the implementation? My assumption is that it will be carefully thought out and close to perfectly implemented. The author has form. - What is your evaluation of the documentation? Comprehensive and well laid out with plenty of examples. One reading of the documentation left me convinced that I am ready to incorporate the library into new and existing code, and know how to. - What is your evaluation of the potential usefulness of the library? Do you already use it in industry? Very useful. When I write software it is generally facing the internet and is generally interfacing with servers that require HMAC. The library fulfils my three main requirements: - prevention of DOS hash attacks - computation of HMAC - A trustworthy and *easy to use* SHA-256 implementation. out of the box - Did you try to use the library? With which compiler(s)? Did you have any problems? No - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? An afternoon of reading the documentation and evaluating the library's relevance to me. - Are you knowledgeable about the problem domain? Yes My vote: ACCEPT My comment: If the ISO committee rejected a library as well designed and as useful as this, this is merely a continuation of the stream of evidence of its incompetence. R On Sat, 7 Dec 2024 at 14:47, Matt Borland via Boost <boost@lists.boost.org> wrote:

...

Dear all,

The review of Hash2 by Peter Dimov and Christian Mazakas begins today Saturday, December 7th through Sunday December 15th, 2024.

Code: https://github.com/pdimov/hash2 Docs: https://pdimov.github.io/hash2/

Please provide feedback on the following general topics:

- What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? Do you already use it in industry? - Did you try to use the library? With which compiler(s)? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain?

Ensure to explicitly include with your review: ACCEPT, REJECT, or CONDITIONAL ACCEPT (with acceptance conditions).

There's already been a flurry of conversation so I thank you all for that. If you have any questions during this process feel free to reach out to me. Additionally, if you would like to submit your review privately, which I will anonymize for the review report, you may send it directly to me at: matt@mattborland.com.

Matt Borland Review Manager and C++Alliance Staff Engineer

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Matt Borland

3:16 p.m.

...

My vote: ACCEPT

...

R

Thank you for your review Richard. Matt

Zach Laine

9 Dec 9 Dec

8:38 p.m.

On Sat, Dec 7, 2024 at 7:47 AM Matt Borland via Boost <boost@lists.boost.org> wrote:

...

The review of Hash2 by Peter Dimov and Christian Mazakas begins today Saturday, December 7th through Sunday December 15th, 2024. Please provide feedback on the following general topics:

- What is your evaluation of the design?

Overall, I like it. First, some general notes. I know that the authors want robust C++11 support, but I think the library suffers from having a C++11-only implementation. I think if you want to support C++11, then by all means do that. But the rest of use would like interfaces that use: std::byte instead of sequences of unsigned char and void *; std::span instead of a pointer and size; language concepts instead of documentation-as-concepts, etc. If you macro guard your C++11 interfaces, you can have your cake and eat it too. I for one have not used C++11 since sometime in 2014. While 11 users exist, they are a small minority now. Requiring them to define a macro to get 11-compatible interfaces does not seem onerous to me, and the rest of us should be able to use more modern stuff. One example is digest<>. The docs say, "digest<N> is a constexpr-friendly class template similar to std::array<unsigned char, N>. It is used to store the resulting message digest of hash algorithms such as SHA2-256 or RIPEMD-160.". So, exactly like C++17 std::array then? :) I'd prefer that digest<> be an alias for std::array<> in later C++ build modes. In short, we don't need more backward-compat types that replace vocabulary types -- the backward-compat type should only exist as a polyfill, and the normal vocabulary type should be used wherever possible. Now, more specific notes: Why the discrepancy between unsigned char const * in the hash algorithm ctor, and void const * in update()? Why define your own endian? You already have an external dependency it seems, on boost::container_hash, so why not use the logic and constants from Boost.Endian? I'm sympathetic to the argument raised on the list earlier that, since Flavor is known statically and should remain fixed across calls to hash_append(), it would be nice if it could be captured somehow, so I don't have to repeat it over and over, possibly fat-fingering it and introducing a bug. Once possibility would be to make a struct hash_appender (or whatever name), and give it a member function: template<class Flavor> struct hash_appender { template<class Hash, class T> constexpr void operator()(Hash& h, T const& v) { hash2::hash_append(h, Flavor{}, v); } }; Then the user could have the option of writing: auto hash_append = boost::hash2::hash_appender<Flav>{}; hash_append(h, x); hash_append(h, y); hash_append(h, z); To me, that's much nicer as a user, and having both interfaces would be a benefit. If I needed to call hash_append() over and over, I'd use the binary API; for a single call, I'd probably just use the ternary one. You might need an appender for each has_append operation, like hash_append_unordered_range() or whatever. That brings me to my next point. Why are there so many overloads of has_append*()? Here they all are: template<class Hash, class Flavor, class T> constexpr void hash_append( Hash& h, Flavor const& f, T const& v ); template<class Hash, class Flavor, class It> constexpr void hash_append_range( Hash& h, Flavor const& f, It first, It last ); template<class Hash, class Flavor, class T> constexpr void hash_append_size( Hash& h, Flavor const& f, T const& v ); template<class Hash, class Flavor, class It> constexpr void hash_append_sized_range( Hash& h, Flavor const& f, It first, It last ); template<class Hash, class Flavor, class It> constexpr void hash_append_unordered_range( Hash& h, Flavor const& f, It first, It last ); Naively, I would have reduced them to: struct unordered_tag {}; template<class Hash, class Flavor, class T> constexpr void hash_append( Hash& h, Flavor const& f, T const& v ); template<class Hash, class Flavor, class It> constexpr void hash_append( Hash& h, Flavor const& f, It first, It last ); template<class Hash, class Flavor, class It> constexpr void hash_append( Hash& h, Flavor const& f, It first, It last, unordered_tag ); Is there a reason that could not work? In particular, It's not clear to me why a value used as a size must be cast to Flavor::size_type. What happens when I call hash_append() (that is, I forget to say hash_append_size())? As a data point, detail::do_hash_append() uses: hash2::hash_append( h, f, '\x00' ); ... for empty fixed-size types like std::arrays, but the overload for variable-sized containers like std::vector uses: hash2::hash_append_sized_range( h, f, v.begin(), v.end() ); ... which in turn calls: hash2::hash_append_size( h, f, last - first ); Also, even if you don't change any of the names above, please don't use the phrase "sized_range", since we now have a std::ranges concept with that name that means something substantially different (std::ranges::sized_range requires std::ranges::size(x), for instance, where yours does not). Speaking of which, it would be nice to have direct support for std::range::sized_ranges and std::contiguous_iterators in C++20 mode. Right now, you only detect a small, fixed subset of all possible sized ranges, and treat T* as the only possible contiguous iterator. These could be supported within the existing overloads, with "if constexpr (std::ranges::sized_range<T>)", etc.

...

- What is your evaluation of the implementation?

Looks good -- easy to read; no red flags. There are lots of low-level shenanigans, and they are done without UB, as far as I can tell.

...

- What is your evaluation of the documentation?

I think the intro could really benefit from a small snippet showing a simple use of the library. In particular, I cannot tell at the beginning whether I always need to write something like HashAlgorithm or not. I'm sure it will become clear later, but earlier is better; providing simple use like hashing a string using a reasonable choice of algorithm, for use in a std::unordered_map<std::string, std::string> would be great. It would give me the context for the rest of the docs. Speaking of which, the docs go straight from a minimal intro into how to write my own hash algorithm. I think most users will want to use one of the provided ones, and has_append(). I think that should come first, or maybe you could just have a link to a quick-start/cheat sheet-type section that the user could jump to right away. In the section Contiguously Hashable Types, I think you should define "contiguously hashable" when introducing it, or at least link to a definition. I think I can guess what it means from context, but I'm not certain. As it is now, being certain would mean going to the implementation of is_contiguously_hashable. In the section Ranges, please provide more details on how you determine the order in which to add elements from an unordered range. What you have now is (I think intentionally) vague, but I'm concerned about performance -- I expect hashing operations to be pretty damn fast, and the currently-thin description makes me want to go start reading code. Looking at the code, it's a linear time algorithm, and that's the main thing I was worried about -- maybe just mention that. In the section about adding tag_invoke overloads, s/the member c is not part of the object state/the member c is not a salient part of the object state, to match the comment in the code just above. For the reasons I mentioned before about typical users, I think the example about how to use the library with unordered containers should be the first example. The code for that example is very nice, and I think if you just add a comment like this to the line that calls hash_append(): std::size_t operator()( T const& v ) const { H h; boost::hash2::hash_append( h, {}, v ); // hash values of any type here, including user-defined types return boost::hash2::get_integral_result<std::size_t>( h.result() ); } ... that would be the 90% use case code example the intro is missing.

...

- What is your evaluation of the potential usefulness of the library? Do you already use it in industry?

I think it will be very useful. I look forward to not having to write hash_append over and over again in different projects. :) I do not use the library currently.

...

- Did you try to use the library? With which compiler(s)? Did you have any problems?

I did not try to use it.

...

- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

I read the documentation, and read the implementation (though not the hash algorithms themselves). I spent about 4 hours doing this review.

...

- Are you knowledgeable about the problem domain?

As a user of hashing containers, yes. As for anything cryptographic, not at all.

...

Ensure to explicitly include with your review: ACCEPT, REJECT, or CONDITIONAL ACCEPT (with acceptance conditions).

I vote to ACCEPT this library. Zach Zach

Matt Borland

8:45 p.m.

...

I vote to ACCEPT this library.

...

Zach

Zach, thank you for your review and comments. Matt

Peter Turcan

11 p.m.

Matt, et al, I have read over the documentation for the Hash2 library, and what follows are my notes. First and foremost, *I appreciate the effort and detail that has gone into getting the documentation this far - no mean amount of work*. Also appreciate that there are plenty of links to external topics and many examples - all helpful, and perhaps some of the things I mention here are already planned. *General Points:* 1. Just a little confused as to the name "Hash2" when there is no "Hash1" ? Perhaps this was explained earlier and I missed it. 2. With all the doc in one single page I found the absence of an obvious Search function awkward, but perhaps this is forthcoming through another effort? I found I wanted to search for words such as "required", "optional", "hash_append" and so on. 3. I agree with other commenters that the all-in-one-page approach is daunting ("Grandmother's fruitcake"). Would have liked to see at least all the top level topics as separate pages, though ideally Reference material is better as one page per construct (class, method, property, structure) so a dev can link to that page when they are working hard on an implementation of the construct. *Contents:* 1. The "Supported Compilers" section should be moved up after the Overview, as part of a section "Getting Started with Hash2". 2. The Getting Started section should work through one example that a newcomer to hashing can understand: a. installing Hash2 b. linking/compiler requirements or options c. provides a short but meaningful example d. running and explaining the output 3. The Getting Started section could use a "Common Errors and Issues" section detailing the most likely errors, misconceptions or stumbling blocks and how to respond. And/Or a Troubleshooting section for the library as a whole. 4. Consider following this with a "Tutorial" section - repeating the steps of the Getting Started section, but with a meatier and potentially useful example. 5. Usage Examples, the code examples have almost no comments - would benefit from being extensively commented. - same for other code snippets throughout the doc - though "Examples" takes priority 6. The "Example" sections in the Reference would also benefit from a sentence/para explaining what the example does, in particular noting any gotchas or potential pain points. 7. In the doc: Described classes (using Boost.Describe). Boost.Describe should be a link to that libs docs. Same for mentions of other libraries. 8. In the block_size section for Hash Algorithm Requirements, there are the two paragraphs (my bold): Cryptographic hash functions provide a block_size value, which is their block size (e.g. 64 for MD5, 128 for SHA2-512) and is *required *in order to implement the corresponding HMAC. block_size is an *optional *requirement. These statements appear to be contradictory - and there is no such thing as an "optional requirement". There is such a thing as an "optional parameter/setting". Consider searching the doc for the phrase "optional requirement" and amending. Consider searching for other "required/optional" statements and verify there is no ambiguity. 9. Nit. Replace "e.g." with "for example, " - it is so much easier to read. Same for other contractions (i.e. = "that is" etc. = "and similar/so on"). 10. Should there be an "Acknowledgements" section near the end - design input, testers? I am the technical writer for the Cpp Alliance. Hope this helps. Best of luck with the project. - Peter Turcan On Mon, Dec 9, 2024 at 12:45 PM Matt Borland via Boost < boost@lists.boost.org> wrote:

...

...
I vote to ACCEPT this library.

...
Zach

Zach, thank you for your review and comments.

Matt _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Peter Dimov

10 Dec 10 Dec

12:47 a.m.

Peter Turcan wrote:

...

1. Just a little confused as to the name "Hash2" when there is no "Hash1" ? Perhaps this was explained earlier and I missed it.

"Hash1" is the library formerly known as Functional/Hash, and now named ContainerHash. It defines the function object boost::hash. (Whereas Hash2 defines its stuff in namespace boost::hash2.)

Ion Gaztañaga

8:48 p.m.

El 10/12/2024 a las 1:47, Peter Dimov via Boost escribió:

...

Peter Turcan wrote:

...
1. Just a little confused as to the name "Hash2" when there is no "Hash1" ? Perhaps this was explained earlier and I missed it.

"Hash1" is the library formerly known as Functional/Hash, and now named ContainerHash. It defines the function object boost::hash.

(Whereas Hash2 defines its stuff in namespace boost::hash2.)

Since Hash2 depends on ContainerHash for several traits and functions both have the same dependencies (Config, Assert, Describe, Mp11), and both are about hashing... Wouldn't make sense to merge both libraries into a single Hash library? Best, Ion

Peter Dimov

8:54 p.m.

Ion Gaztañaga wrote:

...

El 10/12/2024 a las 1:47, Peter Dimov via Boost escribió:

...
Peter Turcan wrote:

...
1. Just a little confused as to the name "Hash2" when there is no "Hash1" ? Perhaps this was explained earlier and I missed it.

"Hash1" is the library formerly known as Functional/Hash, and now named ContainerHash. It defines the function object boost::hash.

(Whereas Hash2 defines its stuff in namespace boost::hash2.)

Since Hash2 depends on ContainerHash for several traits and functions both have the same dependencies (Config, Assert, Describe, Mp11), and both are about hashing... Wouldn't make sense to merge both libraries into a single Hash library?

Maybe, eventually. ContainerHash doesn't accept a seed at the moment, and there isn't that much to be gained by the merge. We also have existing hash_value overloads in user code, and these have to continue working.

Matt Borland

1:55 p.m.

...

I have read over the documentation for the Hash2 library, and what follows are my notes.

Peter, thank you for the documentation review. I always find these beneficial. Matt

Christopher Kormanyos

8:06 p.m.

...

The review of Hash2 by Peter Dimov and Christian Mazakas begins

Christopher's Review. DISCLAIMER: I know the review manager. I highly respect Matt as a person and through his open-source work. Furthermore, I work in a dedicated fashion with Matt on new, potentially upcoming Boost-like projects. I do not receive any open-source money. I operate as a self-funded, independent researcher in FOSS. Here is my review of the proposed Boost.Hash2 - What is your evaluation of the design? Exceptional. It is clean with terse, well-deocumented code. The design addresses a sorely neglected area in C/C++ programming and makes strong, decisive strides to move forward. - What is your evaluation of the implementation? It's simply fantastic. And I will qualify this further below. - What is your evaluation of the documentation? It is fine and I could use the library immediately within about 3 minutes. - What is your evaluation of the potential usefulness of the library? Do you already use it in industry? The library is highly useful. At the moment, it lacks some of the extreme quality aspects that my industrial area would mandate. But it is not far off from becoming a viable secure component. I'd be willing to work toward improving it with the authors. What I'd like to see short-term: * Handle enhanced compiler warnings. * Include a subset of NIST testing. * Fuzzing run(s) on some hashes in CI. * I think SHA-3 is worthy of inclusion. - Did you try to use the library? With which compiler(s)? Did you have any problems? Yes. I performed a whole slew of PC-based tests with the standard vanilla compiler crew including GCC, VC, clang in a few versions. Given the (already) extremely well-tested functionality of Hash2, there were no problems. But here comes the fun part. When I studied the implementation it seemed highly portable. So I put it onto several embedded bare-metal systems. The code compiled and ran on 32-bit microcontrollers and even ran on an 8-bit controller. Rarely do we find such portability and quality in Boost-proposed libraries, or anywhere for that matter. Kudos on a job well done! I can provide ROM-sizes and microsecond run-time comparisons if needed. - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? An in-depth study. - Are you knowledgeable about the problem domain? Over the decades, my "day-job" has involved delivery of hundreds of millions of high-volume, safety-critical devices, mostly having some kind of crypto in them. I think I'm somewhat knowledgeable in this area. I've had determining roles including library author, library client, security auditor and also specified the commercial acquisition of numerous libraries and silicon vendor crypto-accelerators. Ensure to explicitly include with your review: ACCEPT, REJECT, or CONDITIONAL ACCEPT (with acceptance conditions). I vote to unconditionaly ACCEPT Hash2. Thank you to the authors forthis seminal work and bold step forward. Thanks Matt for managing this rich and lively review. Kind regards, Christopher Kormanyos On Tuesday, December 10, 2024 at 02:56:10 PM GMT+1, Matt Borland via Boost <boost@lists.boost.org> wrote:

...

I have read over the documentation for the Hash2 library, and what follows are my notes.

Peter, thank you for the documentation review. I always find these beneficial. Matt _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Matt Borland

8:19 p.m.

...

Christopher's Review... I vote to unconditionaly ACCEPT Hash2.

Chris, thank you for the review and portability testing. Matt

Peter Dimov

8:27 p.m.

Christopher Kormanyos wrote:

...

But here comes the fun part. When I studied the implementation it seemed highly portable. So I put it onto several embedded bare-metal systems. The code compiled and ran on 32-bit microcontrollers and even ran on an 8-bit controller. Rarely do we find such portability and quality in Boost-proposed libraries, or anywhere for that matter. Kudos on a job well done!

Thanks Christopher. :-)

...

What I'd like to see short-term: * Handle enhanced compiler warnings.

Would you please go into more detail here? We currently test with -Wall -Wextra.

Christopher Kormanyos

8:37 p.m.

...

...
What I'd like to see short-term: * Handle enhanced compiler warnings.

...

Would you please go into more detail here? We currently test with -Wall -Wextra.

Yes. I would like to. I appreciate and benefit from (at bare minimum in GCC): -Wall -Wextra -Wpedantic -Wconversion -Wsign-conversion And sometimes I even go for -Werror You would face a much larger onslaught of warnings with a commercial-grade CERT-C++ checker, so that set above is actually rather slack in my world. I can provide qan actual build-log with line numbers if needed. Or I can just PR the changes if you like. The whole warning-game is just like, hey, why the noise? 90% of the warnings are useless, but every once in a while you get a nugget that (might) prevent you a really big crash. So I just activate a bunch of warnings, handle them all and simultaneously weed out the good stuff. Here is my typical warning set "on the metal" real-time-cpp/ref_app/target/app/make/app_make.gmk at 8cc0ce3dd26c30eeba21a1826716f7604f5ef2e4 · ckormanyos/real-time-cpp | | | | | | | | | | | real-time-cpp/ref_app/target/app/make/app_make.gmk at 8cc0ce3dd26c30eeba... Source code for the book Real-Time C++, by Christopher Kormanyos - ckormanyos/real-time-cpp | | | Kind regards, Christopher On Tuesday, December 10, 2024 at 09:28:10 PM GMT+1, Peter Dimov via Boost <boost@lists.boost.org> wrote: Christopher Kormanyos wrote:

...

But here comes the fun part. When I studied the implementation it seemed highly portable. So I put it onto several embedded bare-metal systems. The code compiled and ran on 32-bit microcontrollers and even ran on an 8-bit controller. Rarely do we find such portability and quality in Boost-proposed libraries, or anywhere for that matter. Kudos on a job well done!

Thanks Christopher. :-)

...

What I'd like to see short-term: * Handle enhanced compiler warnings.

Would you please go into more detail here? We currently test with -Wall -Wextra. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Christian Mazakas

11:25 p.m.

On Tue, Dec 10, 2024 at 12:07 PM Christopher Kormanyos via Boost < boost@lists.boost.org> wrote:

...

What I'd like to see short-term: * Handle enhanced compiler warnings. * Include a subset of NIST testing. * Fuzzing run(s) on some hashes in CI. * I think SHA-3 is worthy of inclusion.

I was wondering what kind of NIST testing you're alluding to. We do have some copy-pasted test vectors for myriad PDFs but for a good portion of the algorithms, we're using the test vectors outlined here: https://github.com/pdimov/hash2/blob/7a25f8518692b657e9272884519519fbaca2ec5... https://csrc.nist.gov/Projects/Cryptographic-Algorithm-Validation-Program/Se... Towards the bottom there under Test Vectors, one can download a .zip folder full of .rsp files which were used to verify the output of the applicable algorithms. I tried to avoid relying on those too much during testing because I wanted something quasi-human readable and understandable so if there was an applicable PDF, I preferred that. I did think about a scenario where we would've committed the .rsp files to the repo and run them during CI or some such as part of a much more extensive test suite. Which hashes would you like to see fuzzed, assuming "all of them" is off the table? And how long should we fuzz as well? I'm not sure if we can exhaustively fuzz the algorithms as part of a normal CI infrastructure. - Christian

Christopher Kormanyos

11 Dec 11 Dec

6:13 a.m.

...

...
* Include a subset of NIST testing.

...

I did think about a scenario where we would've committed the .rsp files to the repo and run them during CI or some such as part of a much more extensive test suite.

Thanks for responding, Christian. I think including and running the short test vectors might be a good compromise for your CI. They run rather quickly and I don't think that'll slow down CI too much.

...

...
* Fuzzing run(s) on some hashes in CI.

...

Which hashes would you like to see fuzzed, assuming "all of them" is off the table? And how long should we fuzz as well?

I think one of them, your pick, like SHA2-512 maybe for 5 minutes in CI would add robustness. Maybe limit the fuzzed messages to 2 < len < 1024 bytes? This is not required in any way, although I think this would increase robustness. Christopher. On Wednesday, December 11, 2024 at 12:26:07 AM GMT+1, Christian Mazakas via Boost <boost@lists.boost.org> wrote: On Tue, Dec 10, 2024 at 12:07 PM Christopher Kormanyos via Boost < boost@lists.boost.org> wrote:

...

What I'd like to see short-term: * Handle enhanced compiler warnings. * Include a subset of NIST testing. * Fuzzing run(s) on some hashes in CI. * I think SHA-3 is worthy of inclusion.

Christopher Kormanyos

6:39 a.m.

...

...
Which hashes would you like to see fuzzed, assuming "all of them" is off the table? And how long should we fuzz as well?

...

I think one of them, your pick, like SHA2-512 maybe for 5 minutes in CI would add robustness. Maybe limit the fuzzed messages to 2 < len < 1024 bytes? This is not required in any way, although I think this would increase robustness.

You could link up with OpenSSL to obtain control values for the dynamically selected, fuzzed message(s). Christopher On Wednesday, December 11, 2024 at 07:13:42 AM GMT+1, Christopher Kormanyos <e_float@yahoo.com> wrote:

...

...
* Include a subset of NIST testing.

...

I did think about a scenario where we would've committed the .rsp files to the repo and run them during CI or some such as part of a much more extensive test suite.

Thanks for responding, Christian. I think including and running the short test vectors might be a good compromise for your CI. They run rather quickly and I don't think that'll slow down CI too much.

...

...
* Fuzzing run(s) on some hashes in CI.

...

Which hashes would you like to see fuzzed, assuming "all of them" is off the table? And how long should we fuzz as well?

...

What I'd like to see short-term: * Handle enhanced compiler warnings. * Include a subset of NIST testing. * Fuzzing run(s) on some hashes in CI. * I think SHA-3 is worthy of inclusion.

Peter Dimov

10 Dec 10 Dec

12:10 a.m.

Zach Laine wrote:

...

One example is digest<>. The docs say, "digest<N> is a constexpr-friendly class template similar to std::array<unsigned char, N>. It is used to store the resulting message digest of hash algorithms such as SHA2-256 or RIPEMD- 160.". So, exactly like C++17 std::array then? :)

Almost. Most of std::array does indeed become constexpr in C++17, except for comparisons. For those you have to wait until C++20. digest<> also has operator<< and to_string.

Peter Dimov

12:26 a.m.

Zach Laine wrote:

...

Why the discrepancy between unsigned char const * in the hash algorithm ctor, and void const * in update()?

The discrepancy is because the both used to take `unsigned char const*`, and then I changed `update` to `void const*` to avoid all the reinterpret casts, but didn't change the constructor. However, the key often comes in char[] or string form, especially when using HMAC to interact with existing APIs, so I think that the constructor is also going to be changed soon to take `void const*` to match.

Peter Dimov

12:30 a.m.

Zach Laine wrote:

...

Why define your own endian? You already have an external dependency it seems, on boost::container_hash, so why not use the logic and constants from Boost.Endian?

Endian would be an avoidable dependency, and people don't like those. ContainerHash is different because the reason to depend on it is to pick up user specializations of its traits. If you specialize ContainerHash's traits, it's exceedingly likely that you want the same specializations for Hash2, and it's friendlier to not make users specialize the same trait twice, as would have happened if I duplicated them.

Peter Dimov

12:34 a.m.

Zach Laine wrote:

...

Naively, I would have reduced them to:

struct unordered_tag {};

template<class Hash, class Flavor, class T> constexpr void hash_append( Hash& h, Flavor const& f, T const& v );

template<class Hash, class Flavor, class It> constexpr void hash_append( Hash& h, Flavor const& f, It first, It last );

template<class Hash, class Flavor, class It> constexpr void hash_append( Hash& h, Flavor const& f, It first, It last, unordered_tag );

Is there a reason that could not work?

There is, yes. Naming everything hash_append doesn't play well with making hash_append variadic, as has already been suggested a few times. Once you have that, the above overload set becomes a nice trap because hash_append( h, f, "foo", "bar" ); calls the iterator overload.

Peter Dimov

12:39 a.m.

Zach Laine wrote:

...

Also, even if you don't change any of the names above, please don't use the phrase "sized_range", since we now have a std::ranges concept with that name that means something substantially different

That thought has occurred to me, but I never came up with a replacement name that I was happy with.

...

(std::ranges::sized_range requires std::ranges::size(x), for instance, where yours does not).

If we add overloads for the _range functions that actually take ranges and not iterator pairs, the hash_append_sized_range function will however require and use r.size(), instead of taking the iterator distance.

Andrey Semashev

1:09 p.m.

New subject: [hash2] Andrey's review

Here is my review of the Boost.Hash2 library. Sorry for the long read. * Are you knowledgeable about the problem domain? I use hash functions with containers regularly and I have added support for hashing for my types using std::hash and Boost.ContainerHash, but I have little to no knowledge about designing and validating hash functions. I'm definitely not a specialist in cryptography. * How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? A couple days of reading the docs, source code, the proposal and some other resources linked there or brought up during discussions. And some time writing toy examples and the review itself. * What is your evaluation of the design, documentation and implementation? I'll comment on the three aspects of the library together because in some instances the three are related. For the implementation, I was reviewing revision 26b16023dc639ae354e246458719e4b569bc1d97 on develop (which was the default on checkout from git when I did the review, and no tag or branch was specified in the review announcement). My general impression of the documentation is that it is sufficient to eventually learn the library, but not easy to grasp for a newcomer. The library doesn't give the user an illustration of how to add support for hashing to his types or how to compute a hash value for an object until late in the documentation. I think, the description of HashAlgorithm could be moved to later chapters of the documentation intended for hash algorithm implementers, and the initial chapters should provide introduction for hash algorithm users. Also, I feel that some chapters could use more detailed explanation. For example, the introduction in Contiguously Hashable Types seems incomplete, as it doesn't explain what is "contiguously hashable" and what happens when the type is not contiguously hashable. I feel that in some cases important details are missing in the text of the less formal initial chapters, but are present in the later reference section. Given that there are no links within the text to the reference sections, and the whole documentation is a giant wall of text, the documentation is very difficult to navigate to learn these details. As I wrote my review while I was reading the docs, there were multiple times when I had to edit my review because I initially thought something was not documented only to find it later mentioned in somewhere in the reference sections. Adding cross-links and more details in the initial informal chapters would be helpful. Also pagination would be very welcome, although I understand it is difficult with Asciidoc. I like the general idea of separated hash functions and support for hashing in types. I think this a step in the right direction for improving support for cryptography in C++. However, the way HashAlgorithm is defined is controversial. In the earlier discussion I've already suggested to rename the HashAlgorithm::result() method to finalize() and Vinnie has agreed to this change in the proposal. The specification also has been relaxed to allow hash algorithms that don't support further update() and finalize() calls after the initial finalize(). In the discussion, Peter has indicated that there is no intention to adopt this, which, I think, significantly limits compatibility with existing hash algorithm implementations. I have posted a short overview of the existing crypto libraries here: https://lists.boost.org/Archives/boost/2024/12/258649.php I understand that the ability to call result() multiple times and interleave result() and update() calls has its uses, but I don't think this feature should be mandatory. First, as noted, the overwhelming majority of the current, optimized and verified implementations don't support it, and making the library incompatible with them complicates the library development and impacts its usefulness to users. Second, the feature itself is not often needed. The primary use case is extending the hash value size, which is a rather niche use case. If a hash value size (or its quality) is not sufficient, I would rather consider a different hash function entirely than try to extend the existing digest, as that would arguably result in better quality hashing. Digest extension is normally only supported by extendable-output hash functions (XOF). Some cryptographic libraries have support for XOF, but again, HashAlgorithm is incompatible with them (or at least, those I checked). For example, in OpenSSL, EVP_DigestFinalXOF is used to finalize the hash calculation and extract the variable-sized digest in one call. EVP_DigestFinalXOF is still required to be called no more than once, no further updates or finalization are allowed. I think, the ability to repeatedly call result() is a poor vehicle to support XOF, and I prefer the OpenSSL approach more. Another feature that is required by HashAlgorithm and not supported by many implementations is seeding the algorithm on construction. Conceptually, seeding is typically understood as initializing the internal state of the hash algorithm with specific non-default values. Seeding is not supported by many hash algorithms, for example SHA-1 and SHA-2, and the conventional workaround is to add salt to the hashed data. I think, HashAlgorithm should allow implementations to use the seed arguments to the hash algorithm constructors as salt, i.e. the constructor would implicitly call update(). I think the basic HashAlgorithm concept should require a minimal set of operations so that existing mainstream implementations of popular non-XOF hash algorithms can be supported. A more advanced concept, say named HashAlgorithmXOF, that supports digest expansion can be defined separately. Something along these lines: struct HashAlgorithmBase { static constexpr size_t block_size = /*...*/; // optional /* * Initializes algorithm to the default state. */ HashAlgorithmBase(); /* * Initializes algorithm to an initial state dependent on bytes * in [seed, seed + n). If n is zero, equivalent to default * construction. * The exact meaning of the seed bytes is implementation-defined. */ HashAlgorithmBase( void const* seed, std::size_t n ); HashAlgorithmBase( HashAlgorithmBase const& r ); // optional HashAlgorithmBase& operator=( HashAlgorithmBase const& r ); // optional HashAlgorithmBase( HashAlgorithmBase&& r ); HashAlgorithmBase& operator=( HashAlgorithmBase&& r ); /* * Feeds algorithm with input data. Can be called multiple times. * Calling update() after the algorithm has been finalized is * undefined behavior, unless otherwise specified by the specific * algorithm. */ void update( void const* data, std::size_t n ); }; struct HashAlgorithm : HashAlgorithmBase { using result_type = /* more on this below */; using HashAlgorithmBase::HashAlgorithmBase; using HashAlgorithmBase::operator=; /* * Finalizes the algorithm and returns the final digest. * Calling finalize() after the algorithm has been finalized is * undefined behavior, unless otherwise specified by the specific * algorithm. */ result_type finalize(); }; struct HashAlgorithmXOF : HashAlgorithmBase { using result_type = /* more on this below */; using HashAlgorithmBase::HashAlgorithmBase; using HashAlgorithmBase::operator=; /* * Finalizes the algorithm and copies the final digest to the buffer * of size `size`, in bytes, pointed to by `digest`. * Calling finalize() after the algorithm has been finalized is * undefined behavior, unless otherwise specified by the specific * algorithm. */ void finalize(unsigned char* digest, std::size_t size); /* * Finalizes the algorithm and returns the final digest of size * `size`. * Calling finalize() after the algorithm has been finalized is * undefined behavior, unless otherwise specified by the specific * algorithm. * * Note: The returned digest is equivalent to the above overload. * The difference is that this overload allocates memory for the * digest instead of using the user-provided storage. */ result_type finalize(std::size_t size); }; Above, I have intentionally omitted the constructor from uint64_t as I find it confusing. In particular, it isn't clear what endianness it uses to seed the algorithm. It looks like it duplicates hash_append. In HashAlgorithmXOF, I provided two overloads of finalize(), where one could be implemented in terms of the other. I'm not particularly attached to this dualism, but if I had to keep just one of the overloads, I would keep the first one, the one that fills the user-provided buffer. It is more flexible, although it is slightly less safe than the other one. However, I'm not insisting on any specific form of finalize() in this case. Ion has already mentioned in his review that HashAlgorithm::block_size should have type size_t, and I agree. At the very least, it should be an unsigned integer. BTW, the same applies to hmac::block_size. Ion also mentioned that hash algorithms should support move construction and move assignment, and I agree. Also, HashAlgorithm currently requires hash algorithms to be copy costructible and copy assignable. It looks like the only place where copyability is required is hash_append_unordered_range. Copyability is a rather strong requirement, which may not be supported by all implementations. This is why I marked copy constructor and copy assignment "optional" in my proposed interfaces above - copyability should only be required by the library algorithms that actually need it; those that don't should still work with non-copyable hash algorithms. The documentation should provide a more detailed description of the HashAlgorithm::result_type type. Specifically, what interface and operations it must support. Ideally, the library should provide a concept, similar to HashAlgorithm. Currently, the documentation only says it can be an unsigned integer type or a "std::array-like" type, where the latter is rather nebulous. From the implementation of get_integral_result it looks like it must support data() member function returning a pointer and that its size() after default-construction is a constant expression and returns at least 8. This, for example, excludes std::vector, which would be useful for HashAlgorithmXOF::result_type. Also on the topic of get_integral_result for "std::array-like" hash values, a few comments: - It requires the static size of 8 regardless of the size of the requested integer, so even if uint32_t is requested it will still require at least std::array<unsigned char, 8>. This should not be necessary. - It ignores the data beyond the initial 8 bytes. It might be ok for the purpose of slicing the hash value, but it is inconsistent with get_integral_result for integers, which attempts to mix bits through multiplications with a constant. I think, the implementation should be consistent between arrays and integers. The approach of universally mixing bits for all hash value types seems safer as it produces better results for hash values with non-uniform distribution of high-quality bits across the hash value. It might be possible to reuse one of the hash algorithms implemented in the library for this bit mixing. So I think, get_integral_result and HashAlgorithm::result_type should be better defined by the library. It should be possible for the user to provide his custom type as HashAlgorithm::result_type, and for that (a) there must be a list of requirements for that type to satisfy, and (b) it should be clear how an integer will be derived from it by get_integral_result. Maybe the user should be able to define his own get_integral_result behavior for his digest type. The reason why I think user-defined hash value types are useful is type safety, as it may be desirable to associate the digest with the algorithm that produced it. In the Provided Hash Algorithms section (https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_bytes_provided_ha...), consider documenting the specific classes and headers that implement the algorithms. I appreciate the guidance that you give for whether each of the algorithms is cryptographically strong and for the intended application, this is very useful. It would also be nice if there were some performance estimates for each of the algorithms, as implemented in the library. The hmac_* type aliases are not provided for siphash and xxhash. But I would actually prefer that those type aliases be removed for the rest of the hash algorithms, so that the algorithms don't bring the dependency on hmac.hpp, which is not always needed by the user. Typing hmac<sha2_256> instead of hmac_sha2_256 is hardly any more tedious. In the discussion before the review I asked to provide a member-style API, similar to the free hash_append function, in the tag_invoke's h argument, which is currently the hash algorithm. The idea is to pass a wrapper or a proxy object as that argument; the wrapper would provide the API for appending data to the underlying hash algorithm. This would allow to eliminate the flavor argument that the user has to forward (more on this below) and would allow to reduce the dependency on Boost.Hash2 that is needed to add support for hashing to user-defined classes. Unfortunate dependencies caused by support for hashing and serialization has been a long standing problem within and outside Boost, and it would be prudent not to perpetuate it further. The tag_invoke protocol still requires the hash_append_tag to be declared, but forward-declaring the tag should be much easier and less limiting on Boost.Hash2 future evolution than forward-declaring the hash_append function. I would prefer if the need for that tag was also removed, but presumably that would mean dropping the tag_invoke protocol and using a sufficiently unique name for the customization point function (e.g. boost_hash2_append). I would be fine with that, but I can understand if the authors choose to stick with the tag_invoke protocol and the tag. If there is a way to avoid the forward declaration and still remain compatible with tag_invoke, I would love to hear it, and it should probably be added in the library docs. Also before the review I asked to provide a simple way to integrate types that support Boost.Hash2 with containers that support std::hash protocol. Peter answered that there should be a `hash` utility class template that could be used as a base class for std::hash specialization or, presumably, could be directly specified as the hash function in the container. I can see there is a section in the documentation discussing this (https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_objects). I think this sort of utility should be provided by the library in a separate header out of the box, and the docs should point the user to it. In the list of types natively supported by hash_append, long double is not listed. From the implementation, it should work if it's 64-bit. Larger representations could be supported, but care must be excersized with padding bytes. The padding could be cleared before hashing by using __builtin_clear_padding (gcc, clang) or __builtin_zero_non_value_bits (MSVC), you can find an example in Boost.Atomic: https://github.com/boostorg/atomic/blob/5e2e01666860178591d2616a71f88181dded... https://github.com/boostorg/atomic/blob/5e2e01666860178591d2616a71f88181dded... https://github.com/boostorg/atomic/blob/5e2e01666860178591d2616a71f88181dded... Also, std::nullptr_t is missing in the list but is actually supported. In that same list, pointers to functions are listed as supported, but not pointers to members. I find this surprising and I don't see why they shouldn't be supported. If this omission is intentional, a rationale in the docs would be helpful. Otherwise, pointers to members should be supported. You may also want to consider adding support for trivially copyable class types, when native byte order is selected in the flavor argument to hash_append. You can clear the padding with one of the intrinsics I mentioned above and then hash the entire object as a single blob of bytes. It looks like this should be allowed under the is_contiguously_hashable optimization. I would like a flavor class template like this: template< endian ByteOrder = endian::native, typename SizeType = std::size_t

...

struct flavor { static constexpr endian byte_order = ByteOrder; using size_type = SizeType; }; using default_flavor = flavor<>; using little_endian_flavor = flavor< endian::little, std::uint64_t >; using big_endian_flavor = flavor< endian::big, std::uint64_t >; // Maybe even this, not sure if it is worth it template< endian ByteOrder = endian::native, typename SizeType = std::size_t

...

inline constexpr flavor< ByteOrder, SizeType > flavor_v{}; This way the user is able to customize the size type without having to define a new flavor struct. In practice, most of the time container sizes do not exceed the capacity of uint32_t (and may in fact be uint32_t or less, with Boost.Container containers), so users might be tempted to specify that type instead of the default. This also changes the default_flavor to use size_t for container sizes, and this is consistent with the default flavor semantics. That is, default_flavor represents the current hardware properties, which are both the byte order and the native size type. For portable hashing, one should use either little_endian_flavor or big_endian_flavor, which should both have a portable size type. In the hash_append function call, the flavor argument is mandatory, and I'm not convinced this is necessary. In the earlier discussion, Peter stated that this is intentional, so that there is less opportunity to forget that argument when it is actually needed. Presumably, the main purpose of this is to ensure portable behavior across machines with different byte order. However, such portability is not always required, and in fact the default flavor (both in name, and in hash_append template parameters) does not offer this portability. The empty brace initializer "{}" that is used for the default flavor also doesn't indicate a particular flavor, other than "whatever the default is, I don't care". Clearly, the library treats one set of parameters as the default in all respects but the requirement to always specify that in hash_append arguments. This doesn't make sense to me. Either make the default flavor optional, or don't offer a default (not in name, not in template parameters) and require users to always specify a specific flavor. My preference would be the former, because default_flavor seems like a sensible default, and, since hash portability is not free in terms of performance, it should be explicitly opted-in by users (i.e. the "don't pay for what you don't use" principle). The second problem is the need to manually forward the flavor argument in the tag_invoke overloads. I think, having to pass around unused arguments is not a good design. This can be mitigated by wrapping the flavor argument in the wrapper/proxy that is passed to the tag_invoke's h argument, something like this: template< typename Hash, typename Flavor > class hash_proxy { public: using hash_type = Hash; using flavor_type = Flavor; private: Hash& h; public: explicit hash_proxy(Hash& h) noexcept : h(h) {} template< typename T > void append(T const& data) { // As if hash_append(h, flavor_type{}, data) here } }; template< typename Flavor, typename Hash, typename T > void hash_append(Hash& h, T const& data) { hash_proxy< Hash, Flavor > hp(h); tag_invoke(hash_append_tag{}, hp, data); } template< typename Hash, typename T > void hash_append(Hash& h, T const& data) { hash_append< default_flavor >(h, data); } class my_class { private: int x; short y; char z; template< typename H > friend void tag_invoke(hash_append_tag, H& h, my_class const& data) { h.append(data.x); h.append(data.y); h.append(data.z); } }; Lastly on this topic, "flavor" doesn't sound like the correct word to me. It sounds like a parameter that controls the hash algorithm specifics, whereas it actually has no relation to the hash algorithm at all. I think, "options" would be a better name for this parameter. * Did you try to use the library? With which compiler(s)? Did you have any problems? Yes, I tried to compile a few toy examples with gcc 13.2. I found that the library does not prefer a user-provided tag_invoke overload when the user's type is also described with Boost.Describe, and the code fails to compile because of overload resolution ambiguity. https://github.com/pdimov/hash2/issues/28 I think, tag_invoke should not cause ambiguities, even if the type matches other categories of types natively supported by the library. If the user defined tag_invoke, he clearly intended this function to implement hashing. I also wrote a simple benchmark to see how the library performance compares to OpenSSL. I also used this program to test how one would wrap OpenSSL primitives into Boost.Hash2 interfaces. The benchmark is here: https://github.com/Lastique/hash2/blob/6bd5dafbcb05417014d0b85336a1245f5f186... Compiled with: g++ -O3 -march=rocketlake -std=c++17 -I../include -I../../boost -o hash2_ossl_perf hash2_ossl_perf.cpp -lssl -lcrypto Here's the output on my machine: sha1_160 (1024 bytes): 1220.093078 MiB/s sha1_160 (1048576 bytes): 1317.424120 MiB/s sha1_160 (16777216 bytes): 1318.296514 MiB/s sha2_256 (1024 bytes): 406.139530 MiB/s sha2_256 (1048576 bytes): 430.141013 MiB/s sha2_256 (16777216 bytes): 429.341694 MiB/s sha2_512 (1024 bytes): 629.593514 MiB/s sha2_512 (1048576 bytes): 712.100583 MiB/s sha2_512 (16777216 bytes): 711.549426 MiB/s openssl_sha1_160 (1024 bytes): 1531.142208 MiB/s openssl_sha1_160 (1048576 bytes): 2759.062140 MiB/s openssl_sha1_160 (16777216 bytes): 2738.579482 MiB/s openssl_sha2_256 (1024 bytes): 1541.291824 MiB/s openssl_sha2_256 (1048576 bytes): 2462.035414 MiB/s openssl_sha2_256 (16777216 bytes): 2449.770143 MiB/s openssl_sha2_512 (1024 bytes): 781.187505 MiB/s openssl_sha2_512 (1048576 bytes): 1086.239840 MiB/s openssl_sha2_512 (16777216 bytes): 1082.922782 MiB/s So, depending on the input message size, OpenSSL is faster from 1.24x to 5.72x. Interestingly, despite what is said in Boost.Hash2 documentation (and confirmed by the test), OpenSSL's implementation of SHA2-256 is faster than SHA2-512. This could be because of the SHA ISA extensions in my CPU. * What is your evaluation of the potential usefulness of the library? Do you already use it in industry? I do not use the library in real projects. I think the separation between the hash functions and the hashed types is a very useful design idea, I like it very much. However, the way HashAlgorithm is defined significantly limits usefulness of the Boost.Hash2 library, as it prevents users from plugging in existing hash algorithm implementations, even if just to experiment. I noted other issues in the review, but I think this one is critical to the library adoption. When I wrote the review, I was torn between "conditional accept" and "reject" votes, as I really do like the general concept the library offers and yet I don't see myself adopting it because of the limitations. But since Peter indicated that he is not willing to address those limitations, I will have to vote REJECT. Besides HashAlgorithm requirements, it looks like the library could use some work anyway. In particular, the mentioned issue with tag_invoke ambiguity should be resolved, and the member-wise API for hash_append needs to be explored. These things may affect the protocol used to add support for hashing to user types, and changes to HashAlgorithm would affect hash algorithms, so these issues need to be sorted out before acceptance and adoption. I hope to see the library improved and submitted for review a second time. Thank you Peter for submitting the library and Matt for managing the review.

Matt Borland

2 p.m.

New subject: [hash2] Andrey's review

...

Here is my review of the Boost.Hash2 library....I will have to vote REJECT.

Andrey, thank you for the extensive review and analysis. Matt

Andrey Semashev

2:13 p.m.

New subject: [hash2] Andrey's review

On 12/10/24 16:09, Andrey Semashev wrote:

...

The documentation should provide a more detailed description of the HashAlgorithm::result_type type. Specifically, what interface and operations it must support. Ideally, the library should provide a concept, similar to HashAlgorithm. Currently, the documentation only says it can be an unsigned integer type or a "std::array-like" type, where the latter is rather nebulous. From the implementation of get_integral_result it looks like it must support data() member function returning a pointer and that its size() after default-construction is a constant expression and returns at least 8. This, for example, excludes std::vector, which would be useful for HashAlgorithmXOF::result_type.

Also on the topic of get_integral_result for "std::array-like" hash values, a few comments:

- It requires the static size of 8 regardless of the size of the requested integer, so even if uint32_t is requested it will still require at least std::array<unsigned char, 8>. This should not be necessary. - It ignores the data beyond the initial 8 bytes. It might be ok for the purpose of slicing the hash value, but it is inconsistent with get_integral_result for integers, which attempts to mix bits through multiplications with a constant. I think, the implementation should be consistent between arrays and integers. The approach of universally mixing bits for all hash value types seems safer as it produces better results for hash values with non-uniform distribution of high-quality bits across the hash value. It might be possible to reuse one of the hash algorithms implemented in the library for this bit mixing.

So I think, get_integral_result and HashAlgorithm::result_type should be better defined by the library. It should be possible for the user to provide his custom type as HashAlgorithm::result_type, and for that (a) there must be a list of requirements for that type to satisfy, and (b) it should be clear how an integer will be derived from it by get_integral_result. Maybe the user should be able to define his own get_integral_result behavior for his digest type. The reason why I think user-defined hash value types are useful is type safety, as it may be desirable to associate the digest with the algorithm that produced it.

Also, since I already suggested a few renames in my review, I think renaming result_type to digest_type would make sense. result_type is traditionally used as the result type of a function object, which HashAlgorithm isn't. And the result of a hash function is commonly called a digest.

Peter Dimov

2:18 p.m.

New subject: [hash2] Andrey's review

Andrey Semashev wrote:

...

Above, I have intentionally omitted the constructor from uint64_t as I find it confusing. In particular, it isn't clear what endianness it uses to seed the algorithm. It looks like it duplicates hash_append.

It doesn't use any endianness. It's just used as a seed from which to derive the initial state. Most, if not all, existing non-cryptographic algorithms take integral seeds.

...

The documentation should provide a more detailed description of the HashAlgorithm::result_type type. Specifically, what interface and operations it must support. Ideally, the library should provide a concept, similar to HashAlgorithm. Currently, the documentation only says it can be an unsigned integer type or a "std::array-like" type, where the latter is rather nebulous.

Hash2 started out as a C++03 library, and mandated use of boost::array. Then it became a C++11 library, and mandated use of std::array. But then it became a C++14 constexpr library, and std::array didn't work, so now it mandates the nebulous "std::array-like" type (which is not yet completely nailed down, but will crystallize with practice.)

...

Also on the topic of get_integral_result for "std::array-like" hash values, a few comments:

- It requires the static size of 8 regardless of the size of the requested integer, so even if uint32_t is requested it will still require at least std::array<unsigned char, 8>. This should not be necessary.

I'm not yet clear on the practical utility of using a result_type that is an array of fewer than 8 bytes. We don't have any such examples yet, so it's hard to tell whether this makes any sense and whether it should be explicitly blessed, and to what extent.

...

- It ignores the data beyond the initial 8 bytes. It might be ok for the purpose of slicing the hash value, but it is inconsistent with get_integral_result for integers, which attempts to mix bits through multiplications with a constant.

This is explicitly stated in the documentation of get_integral_result, in the Remarks. In practice it's a useful heuristic to assume that an array-like result implies a hash function of high quality (an avalanching one) so that picking any M bits would suffice. Not making this assumption would make things a bit hairy; it's not quite clear how to turn an arbitrary number of not-high-quality bits into 32. The library tries its best for the integral types, but there the possibilities are at least a finite amount and multiplications are built-in.

...

It might be possible to reuse one of the hash algorithms implemented in the library for this bit mixing.

Interesting "meta" approach, but not necessarily better in practice than the current one, under any practical criteria.

...

So I think, get_integral_result and HashAlgorithm::result_type should be better defined by the library.

It most likely will be, eventually, once we have near-100% clarity on how precisely it needs to be defined.

...

The hmac_* type aliases are not provided for siphash and xxhash. But I would actually prefer that those type aliases be removed for the rest of the hash algorithms, so that the algorithms don't bring the dependency on hmac.hpp, which is not always needed by the user. Typing hmac<sha2_256> instead of hmac_sha2_256 is hardly any more tedious.

hmac.hpp is a very light header, so...

...

Thank you Peter for submitting the library and Matt for managing the review.

Thanks, Andrey, for your review.

Andrey Semashev

2:44 p.m.

New subject: [hash2] Andrey's review

On 12/10/24 17:18, Peter Dimov via Boost wrote:

...

Andrey Semashev wrote:

...
Above, I have intentionally omitted the constructor from uint64_t as I find it confusing. In particular, it isn't clear what endianness it uses to seed the algorithm. It looks like it duplicates hash_append.

It doesn't use any endianness. It's just used as a seed from which to derive the initial state.

Most, if not all, existing non-cryptographic algorithms take integral seeds.

Yes, but for other algorithms you must imply some byte order, as you call `update(&seed, sizeof(seed))` in the constructor. Otherwise, the algorithm seeded with the same integer would produce different results on machines with different byte orders. I'd say, if a specific hash algorithm supports seeding from an integer, which is different from salting (i.e. simply calling update() on the seed), that algorithm should provide the constructor from an integer. If the constructor is described in HashAlgorithm, at least it should be marked optional.

...

...
Also on the topic of get_integral_result for "std::array-like" hash values, a few comments:

- It requires the static size of 8 regardless of the size of the requested integer, so even if uint32_t is requested it will still require at least std::array<unsigned char, 8>. This should not be necessary.

I'm not yet clear on the practical utility of using a result_type that is an array of fewer than 8 bytes. We don't have any such examples yet, so it's hard to tell whether this makes any sense and whether it should be explicitly blessed, and to what extent.

It's not so much about blessing anything, it's about not requiring what you actually don't need.

...

...
- It ignores the data beyond the initial 8 bytes. It might be ok for the purpose of slicing the hash value, but it is inconsistent with get_integral_result for integers, which attempts to mix bits through multiplications with a constant.

This is explicitly stated in the documentation of get_integral_result, in the Remarks.

In practice it's a useful heuristic to assume that an array-like result implies a hash function of high quality (an avalanching one) so that picking any M bits would suffice.

If you're willing to rely on that the array-like result is of high quality, why not do the same for integer results? I just think the behavior should be consistent, whether the digest is an integer, array or some user-defined type.

...

Not making this assumption would make things a bit hairy; it's not quite clear how to turn an arbitrary number of not-high-quality bits into 32. The library tries its best for the integral types, but there the possibilities are at least a finite amount and multiplications are built-in.

...
It might be possible to reuse one of the hash algorithms implemented in the library for this bit mixing.

Interesting "meta" approach, but not necessarily better in practice than the current one, under any practical criteria.

Why? Yes, applying, e.g. fnv1a_32, would be slower than a single multiply or slicing an array, but from the perspective of quality of the reduced digest this approach sounds good to me.

...

...
The hmac_* type aliases are not provided for siphash and xxhash. But I would actually prefer that those type aliases be removed for the rest of the hash algorithms, so that the algorithms don't bring the dependency on hmac.hpp, which is not always needed by the user. Typing hmac<sha2_256> instead of hmac_sha2_256 is hardly any more tedious.

hmac.hpp is a very light header, so...

This may be the case today, and typedefs are there to stay pretty much forever. I say don't introduce them to begin with, they don't bring much value anyway.

Peter Dimov

2:57 p.m.

New subject: [hash2] Andrey's review

Andrey Semashev wrote:

...

...
...
It might be possible to reuse one of the hash algorithms implemented in the library for this bit mixing.

Interesting "meta" approach, but not necessarily better in practice than the current one, under any practical criteria.

Why? Yes, applying, e.g. fnv1a_32, would be slower than a single multiply or slicing an array, but from the perspective of quality of the reduced digest this approach sounds good to me.

Because taking the low 32 bits from a high quality digest is better than taking the FNV-1a hash of all the bits.

Peter Dimov

3:17 p.m.

New subject: [hash2] Andrey's review

Andrey Semashev wrote:

...

...
I'm not yet clear on the practical utility of using a result_type that is an array of fewer than 8 bytes. We don't have any such examples yet, so it's hard to tell whether this makes any sense and whether it should be explicitly blessed, and to what extent.

It's not so much about blessing anything, it's about not requiring what you actually don't need.

What I need today is only part of the story.

Vinnie Falco

2:50 p.m.

New subject: [hash2] Andrey's review

On Tue, Dec 10, 2024 at 5:09 AM Andrey Semashev via Boost < boost@lists.boost.org> wrote:

...

...Vinnie has agreed to this change in the proposal

Well, no one needs my permission to deviate from the paper :)

...

Copyability is a rather strong requirement, which may not be supported by all implementations.

Could you please provide an example of a non-copyable HashAlgorithm? I'm struggling to see it. Your review was very well written btw, thanks. Regards

Andrey Semashev

3:05 p.m.

New subject: [hash2] Andrey's review

On 12/10/24 17:50, Vinnie Falco wrote:

...

On Tue, Dec 10, 2024 at 5:09 AM Andrey Semashev via Boost <boost@lists.boost.org <mailto:boost@lists.boost.org>> wrote:

Copyability is a rather strong requirement, which may not be supported by all implementations.

Could you please provide an example of a non-copyable HashAlgorithm? I'm struggling to see it.

The primary use case I had in mind is wrapping an external library, where the external library does not support creating a deep copy of the algorithm state. This is the case with libsodium (https://doc.libsodium.org/hashing/generic_hashing), for example. In general, copyability requires support from whatever is being copied, movability does not (as you can always store a pointer and move that). Since the need for HashAlgorithm's copyability is rather limited, I think it makes sense to not require it, if possible.

Andrey Semashev

3:42 p.m.

New subject: [hash2] Andrey's review

On 12/10/24 18:05, Andrey Semashev wrote:

...

On 12/10/24 17:50, Vinnie Falco wrote:

...
On Tue, Dec 10, 2024 at 5:09 AM Andrey Semashev via Boost <boost@lists.boost.org <mailto:boost@lists.boost.org>> wrote:

Copyability is a rather strong requirement, which may not be supported by all implementations.

Could you please provide an example of a non-copyable HashAlgorithm? I'm struggling to see it.

The primary use case I had in mind is wrapping an external library, where the external library does not support creating a deep copy of the algorithm state. This is the case with libsodium (https://doc.libsodium.org/hashing/generic_hashing), for example.

I should clarify that libsodium uses structs for storing the state of the hashing algorithms. For example, for SHA2-256 the struct is crypto_hash_sha256_state: https://github.com/jedisct1/libsodium/blob/bfa6ee638673dc88c51331dc13e8f112f... As far as I could see, the documentation makes no guarantees about the contents of this struct and whether it is safe to memcpy it. So, even if the struct can be copied as bytes, I still don't consider copying as properly supported by the library, as one has to rely on implementation details.

Joaquín M López Muñoz

8:12 p.m.

New subject: [hash2] Andrey's review

Peter Dimov

3:35 p.m.

New subject: [hash2] Andrey's review

Andrey Semashev wrote:

...

Here's the output on my machine:

sha1_160 (1024 bytes): 1220.093078 MiB/s sha1_160 (1048576 bytes): 1317.424120 MiB/s sha1_160 (16777216 bytes): 1318.296514 MiB/s sha2_256 (1024 bytes): 406.139530 MiB/s sha2_256 (1048576 bytes): 430.141013 MiB/s sha2_256 (16777216 bytes): 429.341694 MiB/s sha2_512 (1024 bytes): 629.593514 MiB/s sha2_512 (1048576 bytes): 712.100583 MiB/s sha2_512 (16777216 bytes): 711.549426 MiB/s openssl_sha1_160 (1024 bytes): 1531.142208 MiB/s openssl_sha1_160 (1048576 bytes): 2759.062140 MiB/s openssl_sha1_160 (16777216 bytes): 2738.579482 MiB/s openssl_sha2_256 (1024 bytes): 1541.291824 MiB/s openssl_sha2_256 (1048576 bytes): 2462.035414 MiB/s openssl_sha2_256 (16777216 bytes): 2449.770143 MiB/s openssl_sha2_512 (1024 bytes): 781.187505 MiB/s openssl_sha2_512 (1048576 bytes): 1086.239840 MiB/s openssl_sha2_512 (16777216 bytes): 1082.922782 MiB/s

So, depending on the input message size, OpenSSL is faster from 1.24x to 5.72x. Interestingly, despite what is said in Boost.Hash2 documentation (and confirmed by the test), OpenSSL's implementation of SHA2-256 is faster than SHA2-512. This could be because of the SHA ISA extensions in my CPU.

These are interesting timings. Originally I assumed that this is because OpenSSL uses SIMD. But you're right that it might actually use SHA-NI instead. https://en.wikipedia.org/wiki/Intel_SHA_extensions For some reason I thought that SHA-NI only supported SHA1, but it does support SHA2-256 as well. Is there an easy way to check what OpenSSL uses? Probably not because it selects the instruction set at runtime.

Ruben Perez

6:02 p.m.

New subject: [hash2] Andrey's review

On Tue, 10 Dec 2024, 16:35 Peter Dimov via Boost, <boost@lists.boost.org> wrote:

...

...
Here's the output on my machine:

sha1_160 (1024 bytes): 1220.093078 MiB/s sha1_160 (1048576 bytes): 1317.424120 MiB/s sha1_160 (16777216 bytes): 1318.296514 MiB/s sha2_256 (1024 bytes): 406.139530 MiB/s sha2_256 (1048576 bytes): 430.141013 MiB/s sha2_256 (16777216 bytes): 429.341694 MiB/s sha2_512 (1024 bytes): 629.593514 MiB/s sha2_512 (1048576 bytes): 712.100583 MiB/s sha2_512 (16777216 bytes): 711.549426 MiB/s openssl_sha1_160 (1024 bytes): 1531.142208 MiB/s openssl_sha1_160 (1048576 bytes): 2759.062140 MiB/s openssl_sha1_160 (16777216 bytes): 2738.579482 MiB/s openssl_sha2_256 (1024 bytes): 1541.291824 MiB/s openssl_sha2_256 (1048576 bytes): 2462.035414 MiB/s openssl_sha2_256 (16777216 bytes): 2449.770143 MiB/s openssl_sha2_512 (1024 bytes): 781.187505 MiB/s openssl_sha2_512 (1048576 bytes): 1086.239840 MiB/s openssl_sha2_512 (16777216 bytes): 1082.922782 MiB/s

So, depending on the input message size, OpenSSL is faster from 1.24x to 5.72x. Interestingly, despite what is said in Boost.Hash2 documentation (and confirmed by the test), OpenSSL's implementation of SHA2-256 is faster

Andrey Semashev wrote: than

...
SHA2-512. This could be because of the SHA ISA extensions in my CPU.

These are interesting timings. Originally I assumed that this is because OpenSSL uses SIMD. But you're right that it might actually use SHA-NI instead.

https://en.wikipedia.org/wiki/Intel_SHA_extensions

For some reason I thought that SHA-NI only supported SHA1, but it does support SHA2-256 as well.

Is there an easy way to check what OpenSSL uses? Probably not because it selects the instruction set at runtime.

Is it expected to reach similar levels of performance in the future using Hash2? Or is this not in scope for the library? Thanks, Ruben.

Peter Dimov

6:15 p.m.

New subject: [hash2] Andrey's review

Ruben Perez wrote:

...

On Tue, 10 Dec 2024, 16:35 Peter Dimov via Boost, <boost@lists.boost.org <mailto:boost@lists.boost.org> > wrote:

Andrey Semashev wrote:

...
Here's the output on my machine:

sha1_160 (1024 bytes): 1220.093078 MiB/s sha1_160 (1048576 bytes): 1317.424120 MiB/s sha1_160 (16777216 bytes): 1318.296514 MiB/s sha2_256 (1024 bytes): 406.139530 MiB/s sha2_256 (1048576 bytes): 430.141013 MiB/s sha2_256 (16777216 bytes): 429.341694 MiB/s sha2_512 (1024 bytes): 629.593514 MiB/s sha2_512 (1048576 bytes): 712.100583 MiB/s sha2_512 (16777216 bytes): 711.549426 MiB/s openssl_sha1_160 (1024 bytes): 1531.142208 MiB/s openssl_sha1_160 (1048576 bytes): 2759.062140 MiB/s openssl_sha1_160 (16777216 bytes): 2738.579482 MiB/s openssl_sha2_256 (1024 bytes): 1541.291824 MiB/s openssl_sha2_256 (1048576 bytes): 2462.035414 MiB/s openssl_sha2_256 (16777216 bytes): 2449.770143 MiB/s openssl_sha2_512 (1024 bytes): 781.187505 MiB/s openssl_sha2_512 (1048576 bytes): 1086.239840 MiB/s openssl_sha2_512 (16777216 bytes): 1082.922782 MiB/s

So, depending on the input message size, OpenSSL is faster from 1.24x to 5.72x. Interestingly, despite what is said in Boost.Hash2 documentation (and confirmed by the test), OpenSSL's implementation of SHA2-256 is faster than SHA2-512. This could be because of the SHA ISA extensions in my CPU.

These are interesting timings. Originally I assumed that this is because OpenSSL uses SIMD. But you're right that it might actually use SHA-NI instead.

https://en.wikipedia.org/wiki/Intel_SHA_extensions

For some reason I thought that SHA-NI only supported SHA1, but it does support SHA2-256 as well.

Is there an easy way to check what OpenSSL uses? Probably not because it selects the instruction set at runtime.

Is it expected to reach similar levels of performance in the future using Hash2? Or is this not in scope for the library?

I don't see why it wouldn't be in scope; performance optimizations are never ruled out. Taking advantage of SHA-NI when it's statically known to be available (under GCC/Clang -march=native, for instance) should be relatively easy to add. Dynamic dispatch is a bit more convoluted because of constexpr but should be possible.

Andrey Semashev

9 p.m.

New subject: [hash2] Andrey's review

On 12/10/24 21:15, Peter Dimov via Boost wrote:

...

Ruben Perez wrote:

...
Is it expected to reach similar levels of performance in the future using Hash2? Or is this not in scope for the library?

I don't see why it wouldn't be in scope; performance optimizations are never ruled out.

Taking advantage of SHA-NI when it's statically known to be available (under GCC/Clang -march=native, for instance) should be relatively easy to add.

Requiring SHA instructions unconditionally sets a rather high bar on the CPU requirements.

...

Dynamic dispatch is a bit more convoluted because of constexpr but should be possible.

SIMD intrinsics (at least x86 ones) are not constexpr-friendly as they require reinterpret_cast to access memory. Only AVX-512 added load/store intrinsics taking void pointers. Maybe gcc's vector extensions are better, but I'm not sure if SHA intrinsics will work with them. I suspect the constexpr version will have to always use the generic code, and the runtime version may use CPU detection and vectorized code.

Peter Dimov

4:51 p.m.

New subject: [hash2] Andrey's review

Andrey Semashev wrote:

...

I think, HashAlgorithm should allow implementations to use the seed arguments to the hash algorithm constructors as salt, i.e. the constructor would implicitly call update().

No, this is a bad practice and should never be used. As I already said, the algorithm should at minimum include the size of the seed, and then pad to a multiple of the block size. This is not just something I made up, it's existing practice. See for instance how KMAC is defined in https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-185.pdf If not that, you should at minimum include the hash of the seed in the initial prefix, instead of (or in addition to) the seed, like HMAC does. Don't forget to pad to a multiple of the block size.

Andrey Semashev

5:30 p.m.

New subject: [hash2] Andrey's review

On 12/10/24 19:51, Peter Dimov via Boost wrote:

...

Andrey Semashev wrote:

...
I think, HashAlgorithm should allow implementations to use the seed arguments to the hash algorithm constructors as salt, i.e. the constructor would implicitly call update().

No, this is a bad practice and should never be used. As I already said, the algorithm should at minimum include the size of the seed, and then pad to a multiple of the block size.

Is this something that should be intrinsically supported by HashAlgorithm, though? Can it be an external component, such as a free function or an adapter like hmac<>? Currently, as I look in the hash algorithms provided by Boost.Hash2, most of them interpret the byte-range seed the way you describe, with the exception of fnv1a and siphash. fnv1a doesn't append the seed size (and doesn't pad as FNV-1a is not a block hash); I'm not sure if that's intentional. siphash has special treatment of the seed if it is exactly 8 bytes. Given these differences, is the user able to have any sort of guarantee on consistent behavior of the constructors of various algorithms? If not, why provide the constructor at all? I think, it would be better if the library provided a standalone component that would seed any hash algorithm (whether implemented by Boost.Hash2 or not) in a consistent and secure way.

Peter Dimov

5:42 p.m.

New subject: [hash2] Andrey's review

Andrey Semashev wrote:

...

Currently, as I look in the hash algorithms provided by Boost.Hash2, most of them interpret the byte-range seed the way you describe, with the exception of fnv1a and siphash. fnv1a doesn't append the seed size (and doesn't pad as FNV-1a is not a block hash); I'm not sure if that's intentional.

I suppose it would be better for it to do so, yes.

...

siphash has special treatment of the seed if it is exactly 8 bytes.

That's because this specific handling of the key is part of its specification, as siphash is a keyed hash function. So if you use siphash as specified, using the key length that it mandates, you get its documented (in siphash.pdf) behavior and it passes the corresponding siphash test suite.

...

Given these differences, is the user able to have any sort of guarantee on consistent behavior of the constructors of various algorithms? If not, why provide the constructor at all?

The constructor does what it's specified to do in the hash algorithm requirements. Yes, the user can rely on that.

Tom Kent

11 Dec 11 Dec

1:55 a.m.

On Sat, Dec 7, 2024 at 7:41 AM Matt Borland via Boost <boost@lists.boost.org> wrote:

...

Dear all,

The review of Hash2 by Peter Dimov and Christian Mazakas begins today Saturday, December 7th through Sunday December 15th, 2024.

Code: https://github.com/pdimov/hash2 Docs: https://pdimov.github.io/hash2/

Please provide feedback on the following general topics:

- What is your evaluation of the design?

This looks to be a good API and hits the goals desired. Seems like it would be simple and nice to use.

...

- What is your evaluation of the implementation?

- What is your evaluation of the documentation?

...

- What is your evaluation of the potential usefulness of the library? Do you already use it in industry?

There are several hashes given in this library that are used in security critical scenarios (e.g. SHA-2). Additionally, the explicit goal of the library is to give the user tools they need to build their own hashes. I do not see any indication that any kind of cryptographic assessment was done on this library. There is not even a mention that the security of these hashes has been thought about nor in what scenarios they could be used. This is absolutely essential for any library providing cryptographic primitives. There are a *lot* of potential pitfalls in the implementation of a cryptographic primitive, even in cases where the "correctness" of the end result aren't in question. For example, there have been timing attacks against SHA-2/HMAC where the difference in the amount of time processing takes can leak information about the secret key. https://dl.acm.org/doi/10.1007/978-3-030-89915-8_2 These side-channel type attacks can be extremely insidious, and are not things that people who are otherwise experts in C++/C/Assembly/etc would ever think about. For cryptography, we need a different type of expert than the type we typically grow in this community. **If Boost wants to take on providing cryptographic primitives, we need to hold those libraries to a higher standard, including an evaluation by outside cryptopgophers before we release anything to the public.** I would be much happier if the choice was made to use a thoroughly vetted library (e.g. OpenSSL, but more compact options would probably be better) behind the scenes, with a nice C++ wrapper on the front. I am tempted to suggest an option where we put big banners on the library saying "not for use in scenarios that require cryptographic security", but there is simply too much danger in users not understanding *when* they require security (with dozens of examples of this happening in the literature) that they would accidentally use this library in scenarios that undermine their security.

...

- Did you try to use the library? With which compiler(s)? Did you have any problems?

I did not.

...

- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

Spent a good amount of time reading the documentation. Browsed through some of the code.

...

- Are you knowledgeable about the problem domain?

Moderately, work in a security critical industry. Ensure to explicitly include with your review: ACCEPT, REJECT, or

...

CONDITIONAL ACCEPT (with acceptance conditions).

REJECT Sincerely, Thomas Kent

Matt Borland

2:09 p.m.

...

REJECT

...

Sincerely, Thomas Kent

Thank you Thomas for your review and concerns. Matt

Peter Dimov

5:43 p.m.

Tom Kent wrote:

...

For example, there have been timing attacks against SHA-2/HMAC where the difference in the amount of time processing takes can leak information about the secret key. https://dl.acm.org/doi/10.1007/978-3-030-89915-8_2

This is an attack against a hardware implementation of HMAC-SHA2-256. The paper doesn't seem to be freely available, but I _think_ that it relies on data leaks via changes in power consumption. This doesn't seem applicable to software implementations, and I wasn't able to find attacks against software HMAC-SHA2-256. (Admittedly, after a not that exhaustive search attempt.) Either way, if you think that not having a SHA-2 implementation in Boost will increase the resistance of the collective body of C++ code against side channel attacks, I can only say that I believe you are very much mistaken.

Ruben Perez

12 Dec 12 Dec

11:15 a.m.

On Wed, 11 Dec 2024, 02:55 Tom Kent via Boost, <boost@lists.boost.org> wrote:

...

There are several hashes given in this library that are used in security critical scenarios (e.g. SHA-2). Additionally, the explicit goal of the library is to give the user tools they need to build their own hashes.

I do not see any indication that any kind of cryptographic assessment was done on this library. There is not even a mention that the security of these hashes has been thought about nor in what scenarios they could be used. This is absolutely essential for any library providing cryptographic primitives. There are a *lot* of potential pitfalls in the implementation of a cryptographic primitive, even in cases where the "correctness" of the end result aren't in question.

For example, there have been timing attacks against SHA-2/HMAC where the difference in the amount of time processing takes can leak information about the secret key. https://dl.acm.org/doi/10.1007/978-3-030-89915-8_2

These side-channel type attacks can be extremely insidious, and are not things that people who are otherwise experts in C++/C/Assembly/etc would ever think about. For cryptography, we need a different type of expert than the type we typically grow in this community.

**If Boost wants to take on providing cryptographic primitives, we need to hold those libraries to a higher standard, including an evaluation by outside cryptopgophers before we release anything to the public.**

As a potential user mainly interested in the "hashing untyped byte sequences" use case (involving SHA2), do you think migrating from OpenSSL to Boost.Hash2 would be detrimental for security at this point? If the answer is yes, is there a way to remediate this (even after the library gets accepted)? Or is this just not the main use case of the library? The use case involves generating digests for a network protocol (MySQL). I'd like to know both Tom's and Peter's opinions. Thanks, Ruben.

Peter Dimov

1:27 p.m.

Ruben Perez wrote:

...

As a potential user mainly interested in the "hashing untyped byte sequences" use case (involving SHA2), do you think migrating from OpenSSL to Boost.Hash2 would be detrimental for security at this point? If the answer is yes, is there a way to remediate this (even after the library gets accepted)? Or is this just not the main use case of the library?

The use case involves generating digests for a network protocol (MySQL).

I'd like to know both Tom's and Peter's opinions.

Can you please point me to the source code portions in Boost.MySQL that implement SHA-2 authentication?

Ruben Perez

8:01 p.m.

On Thu, 12 Dec 2024 at 14:27, Peter Dimov <pdimov@gmail.com> wrote:

...

Ruben Perez wrote:

...
As a potential user mainly interested in the "hashing untyped byte sequences" use case (involving SHA2), do you think migrating from OpenSSL to Boost.Hash2 would be detrimental for security at this point? If the answer is yes, is there a way to remediate this (even after the library gets accepted)? Or is this just not the main use case of the library?

The use case involves generating digests for a network protocol (MySQL).

I'd like to know both Tom's and Peter's opinions.

Can you please point me to the source code portions in Boost.MySQL that implement SHA-2 authentication?

Current code (using OpenSSL): https://github.com/boostorg/mysql/blob/c438f26731e36c2db6457705ec5dbb9f7657d... Code using the proposed library: https://github.com/boostorg/mysql/pull/389/files#diff-1ce941e5f315c38f0eb53e... Protocol docs: https://dev.mysql.com/doc/dev/mysql-server/8.4.3/page_caching_sha2_authentic... It's somehow similar in spirit to SCRAM-SHA256, but built in-house by MySQL.

Peter Dimov

8:17 p.m.

Ruben Perez wrote:

...

On Thu, 12 Dec 2024 at 14:27, Peter Dimov <pdimov@gmail.com> wrote:

...
Ruben Perez wrote:

...
As a potential user mainly interested in the "hashing untyped byte

sequences"

...
...
use case (involving SHA2), do you think migrating from OpenSSL to Boost.Hash2 would be detrimental for security at this point? If the answer is yes, is there a way to remediate this (even after the library gets accepted)? Or is this just not the main use case of the library?

The use case involves generating digests for a network protocol (MySQL).

I'd like to know both Tom's and Peter's opinions.

Can you please point me to the source code portions in Boost.MySQL that implement SHA-2 authentication?

Current code (using OpenSSL): https://github.com/boostorg/mysql/blob/c438f26731e36c2db6457705ec5dbb9f7657d... Code using the proposed library: https://github.com/boostorg/mysql/pull/389/files#diff-1ce941e5f315c38f0eb53e... Protocol docs: https://dev.mysql.com/doc/dev/mysql-server/8.4.3/page_caching_sha2_authentic...

It's somehow similar in spirit to SCRAM-SHA256, but built in-house by MySQL.

In this specific case, if we assume that Hash2 is accepted into Boost, I'd say that using OpenSSL is much more susceptible to supply chain attacks. The user acquires both Boost.MySQL and Boost.Hash2 through Boost, whereas the typical practice of acquiring OpenSSL under Windows until very recently was "web search and download random binaries from somewhere on the Internet." Things are probably much better today because of vcpkg and conan, but the exact version of OpenSSL that the user will end up using is still an unknown variable. The actual SHA256 implementation in OpenSSL has been looked at much more than the one in Hash2 at this point, but the Hash2 code is easy to inspect and verify because it follows the reference implementation very closely at the moment (although this might change if we add SHA-NI optimizations in the future.) (I ignore here the C-style interface of OpenSSL's function, which has its own safety implications, but let's assume that the code in Boost.MySQL is 100% correct.)

Matt Borland

8:34 p.m.

...

The actual SHA256 implementation in OpenSSL has been looked at much more than the one in Hash2 at this point, but the Hash2 code is easy to inspect and verify because it follows the reference implementation very closely at the moment (although this might change if we add SHA-NI optimizations in the future.)

OpenSSL's SHA implementations (SHS and SHA3) are also certified under FIPS 140-2 [1] if this is a relevant property for your users Reuben. Matt [1] https://csrc.nist.gov/projects/cryptographic-module-validation-program/certi...

Ruben Perez

9:12 p.m.

On Thu, 12 Dec 2024, 21:34 Matt Borland, <matt@mattborland.com> wrote:

...

...
The actual SHA256 implementation in OpenSSL has been looked at much more than the one in Hash2 at this point, but the Hash2 code is easy to inspect and verify because it follows the reference implementation very closely at the moment (although this might change if we add SHA-NI optimizations in the future.)

OpenSSL's SHA implementations (SHS and SHA3) are also certified under FIPS 140-2 [1] if this is a relevant property for your users Reuben.

Matt

[1] https://csrc.nist.gov/projects/cryptographic-module-validation-program/certi...

Thanks for the info.

Ruben Perez

9:07 p.m.

On Thu, 12 Dec 2024, 21:17 Peter Dimov, <pdimov@gmail.com> wrote:

...

...
On Thu, 12 Dec 2024 at 14:27, Peter Dimov <pdimov@gmail.com> wrote:

...
Ruben Perez wrote:

...
As a potential user mainly interested in the "hashing untyped byte

sequences"

...
...
use case (involving SHA2), do you think migrating from OpenSSL to Boost.Hash2 would be detrimental for security at this point? If the answer is yes, is there a way to remediate this (even after the library gets accepted)? Or is this just not the main use case of the

Ruben Perez wrote: library?

...
...
...
The use case involves generating digests for a network protocol

(MySQL).

...
I'd like to know both Tom's and Peter's opinions.

Can you please point me to the source code portions in Boost.MySQL that implement SHA-2 authentication?

Current code (using OpenSSL):

https://github.com/boostorg/mysql/blob/c438f26731e36c2db6457705ec5dbb9f7657d...

...
Code using the proposed library:

https://github.com/boostorg/mysql/pull/389/files#diff-1ce941e5f315c38f0eb53e...

...
Protocol docs: https://dev.mysql.com/doc/dev/mysql-server/8.4.3/page_caching_sha2_authentic...

It's somehow similar in spirit to SCRAM-SHA256, but built in-house by MySQL.

In this specific case, if we assume that Hash2 is accepted into Boost, I'd say that using OpenSSL is much more susceptible to supply chain attacks.

The user acquires both Boost.MySQL and Boost.Hash2 through Boost, whereas the typical practice of acquiring OpenSSL under Windows until very recently was "web search and download random binaries from somewhere on the Internet."

Things are probably much better today because of vcpkg and conan, but the exact version of OpenSSL that the user will end up using is still an unknown variable.

The actual SHA256 implementation in OpenSSL has been looked at much more than the one in Hash2 at this point, but the Hash2 code is easy to inspect and verify because it follows the reference implementation very closely at the moment (although this might change if we add SHA-NI optimizations in the future.)

(I ignore here the C-style interface of OpenSSL's function, which has its own safety implications, but let's assume that the code in Boost.MySQL is 100% correct.)

Thanks for the clarifications, Peter.

...

Matt Borland

11 Dec 11 Dec

2:15 p.m.

...

Dear all,

...

The review of Hash2 by Peter Dimov and Christian Mazakas begins today Saturday, December 7th through Sunday December 15th, 2024.

...

Code: https://github.com/pdimov/hash2 Docs: https://pdimov.github.io/hash2/

...

Please provide feedback on the following general topics:

...

- What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? Do you already use it in industry? - Did you try to use the library? With which compiler(s)? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain?

...

Ensure to explicitly include with your review: ACCEPT, REJECT, or CONDITIONAL ACCEPT (with acceptance conditions).

...

There's already been a flurry of conversation so I thank you all for that. If you have any questions during this process feel free to reach out to me. Additionally, if you would like to submit your review privately, which I will anonymize for the review report, you may send it directly to me at: matt@mattborland.com.

...

Matt Borland Review Manager and C++Alliance Staff Engineer

All, We are now halfway through the review period, so if you are going to submit a review you only have a few days remaining. There's been a lot of good discussion which I'm sure Peter and Christian appreciate. Matt

Alexander Grund

4:03 p.m.

My review (a bit longer than intended)

...

- What is your evaluation of the design? The library is based on the paper which goes into great lengths discussing the design. To me this sounds well reasoned so the library design is solid. - What is your evaluation of the implementation? The whole code base looks very clean and well structured. The implementation of the provided hash functions follows their specification making them easy to verify. Reasonable use of trait-types and C++11/14 features and SFINAE-friendly methods make it look solid to me.

...

- What is your evaluation of the documentation? Well written and structured. However I find it odd, that it seems to emphasize the provided hashing algorithms and still calls itself "An Extensible Hashing Framework" For me it also dives too soon too deep into details. I'd like to have

The only point of criticism I have is the type used for "byte-ranges" in the HashAlgorithm constructor and update method(s). In the (seed-)ctor the bytes are of type `unsigned char const` while in `update` the are of type `void const`. If both methods are supposed to receive a byte sequence I'd expect them to use the same type and am confused by the difference. Given that the constexpr-enable concept uses `unsigned char const` I'd choose that. Use of a span-like type has been extensively discussed and I'll just mention my personal key takeaway: In the context of static analysis through compiler warnings I'd like to see support for both `update(ptr, size)` & `update(byte-span)` in the concept to give HashAlgorithm implementers the freedom to not provide a signature that triggers compiler warnings. Support for a span-like type at least constructible from a `unsigned char const*` and size to be passed to the algorithms update function seems reasonable to me. However I do see the advantage of the more flexible approach currently implemented. Especially due to the multitude of span classes available. So to me this is optional and at most a QoI improvement. the distinction more clear between: a) hashing (raw) byte sequences by using a HashAlgorithm b) hashing any C++ types/objects by using the framework As a (potential) user I'd really like to be told first how to use this framework with something existing (already implemented hash algorithm class) than having an in-depth explanation of the (class-)concepts and of a multitude of provided algorithms, a simple list with links to details at this point (the introduction) might suffice. To me the "major use case" the library addresses is not "hashing an untyped sequence of bytes". If I understood it correctly the algorithms provided are "just" examples of algorithms the framework can be used with. There are already implementations of such algorithms and some are seemingly much faster (compare the OpenSSL benchmarks posted on the list). The novelty to me here is a framework to allow the separation of types and hash algorithms, i.e. enable types to be hashed without knowing by which algorithm. So maybe that should be more prominent in the docs. I also find the description of the `result()` function questionable:

...

Note that result is non-const, because it changes the internal state. [...] subsequent calls [...] produce a pseudorandom sequence

Having this in the description of the HashAlgorithm concept seems to exclude any algorithm that doesn't has a meaningful "finalize" operation. Or those that "finalize" e.g. by padding and hence subsequent "finalize" calls are no-ops yielding the same result. If it isn't an intentional requirement for all algorithms to be used by this, I'd suggest to change the wording to e.g. "may change internal state." and "Some might producea pseudorandom sequence" or similar.

...

- What is your evaluation of the potential usefulness of the library? Do you already use it in industry? Very useful, especially as it is an implementation of a paper for a standards proposal which I think is very much in line of what Boost is/was/should be. I don't use it yet.

...

- Did you try to use the library? With which compiler(s)? Did you have any problems? Tried examples using GCC 13, no issues. - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? Something short of an in-depth study - Are you knowledgeable about the problem domain? Somewhat, yes. Ensure to explicitly include with your review: ACCEPT, REJECT, or CONDITIONAL ACCEPT (with acceptance conditions). CONDITIONAL ACCEPT I'd like to suggest to reconsider the concept of the `result` function (like the split into `result() const` and `finalize()` mentioned on the

In the context of cryptographic security mentioned by others I'd like to see a) an example how to wrap an existing, 3rd-party hash implementation (which might only have a C interface) into the HashAlgorithm Concept b) at least a statement on the security of the provided cryptographic hash algorithms in the documentation. list) or at least the wording, and the types for "byte-ranges" mentioned above unless changing that type severely affects the functionality (negatively). Having the concept defined on inclusion might make it hard to change later on when people rely on it so I'd rather like to have this "right" from the start, if in doubt be more permissive. This "condition" is intentionally very weak as I do like to see this library include, and can be met by simply stating that the values returned by multiple invocations of `result` might be the same.

Matt Borland

4:12 p.m.

...

My review (a bit longer than intended) CONDITIONAL ACCEPT

Alex, thank you for your review; your conditions are noted. Matt

Darryl Green

4:38 p.m.

...

- What is your evaluation of the design? Please see the documentation section - most of the comments there are about the design at least as much as how it is documented.

...

- What is your evaluation of the implementation? The implementation is unsurprisingly of high quality.

...

- What is your evaluation of the documentation?

The documentation is very comprehensive. Sufficiently so that it called out all the surprising features of what I'd expected was going to simply be a refined (given the author) implementation of the approach described in the paper "types don't know #". The documentation describes hash algorithm requirements early. I expected to be unsurprised and initially was (vs the hash algorithm requirements in the paper) but the fact that every hash algorithm must have a constructor taking an integer seed and one taking a byte sequence immediately caught my attention. I didn't expect that all of the listed hash algorithms would have either or both of those constructors. So I checked the reference docs. for the algorithms to find out what this requirement has added to the "usual" expectations for each algorithm. FNV-1a looked like a predictable, minimal "how to seed something that isn't a keyed hash i.e. just feed it a "key" as a prefix (by calling update). That did make me wonder why the concept was complicated with the constructors that aren't doing anything the user couldn't do post-construction by simply calling update(). But - xxhash algorithm specification (as opposed to the hash2 library) does define a (result-type sized) seed and sure enough - hash2 provides a constructor that operates as expected (with a twist that the seed is not of result_type so can be larger than seed the 32 bit algorithm supports). This discrepancy and the requirement that different seeds will give different results forces larger seed values to be used creatively. The approach taken is to, if the seed was not small enough to be converted to result type without overflow to use the low 32 bits to do the defined seeding then mix the high 32 bits into the state "somehow" (I checked the implementation, its as if the high 32 bits were copied to unsigned char b[4] then update(b, 4) called 4 times - I think? So - what happens when constructing from a byte sequence? In this case we fallback to default construction followed by update() - oh - the docs say "update() and a call to result(). That looks odd? Why would one call result()? After calling result() further update() calls are valid? Read some more docs. Oh - every hash algorithm is required to not only have these constructors, but also to be an extensible hash. Another surprise. It turns out, result() having a side effect of changing the state is required/expected. And being used here to "mix" the seed. I'm not 100% sure why - not that it's bad - but just how arbitrary is each algorithms seeding behaviour? Is it reasonable to be creating these seeding conventions specific to the library? The obvious answer is - if you don't like it, write a different algorithm implementation. It is true that for the purposes of generating hashes of unordered container keys any "reasonable" seeding convention is good enough - even where hash flooding protection is needed. Perhaps using the library to hash arbitrary C++ objects for any/all uses is getting into an area where the user needs to be a bit more aware of potential interoperability issues. Even so, the algorithm specification for each provided algorithm should be very explicit about just how it seeds - and that there is no interoperability of these features with any other implementation of the "same" algorithm. Unless of course there is - If an algorithm in hash2 follows a specification for seeding and/or extension this should be clearly documented/referenced. An implementation of https://keccak.team/kangarootwelve.html would make an interesting addition due to it having a comprehensive set of functions that might be expected to directly fulfil the concepts (among other nice features).

...

- What is your evaluation of the potential usefulness of the library? Do you already use it in industry?

I haven't used it yet. But I've used a hash algorithms for various things. Some cryptographic, some purely for high performance (relative to resources) unordered containers / data structures. I would definitely use this library for unordered key hashing, and the convenience of creating a hash for a user defined type independent of the algorithm choice is a valuable capability and the reason I haven't had anything muich to say about that aspect of the library is - I don't have any complaints or suggestions for improvement, except that I think that support for hashing unordered containers should be dropped. because there isn't a robust and performant generic solution (it seems - if there is, it's not in the library). I would think twice before using the library for any cryptographic purpose not because of any specific issues (other than the unordered container issue) beyond it being yet another implementation of the raw hash algorithm without any specific benefits and with a restrictive (for good reason) interface tailored to the uses envisioned in the paper.

...

- Did you try to use the library? With which compiler(s)? Did you have any problems?

I compiled and ran some tests using g++ 14.2 - It was very easy to do this. Modular boost is great!

...

- How much effort did you put into your evaluation?

A few hours.

...

A glance? A quick reading? In-depth study?

I took a glance, mistook it for a nice simple library to make unordered containers easier to use with arbitrary key types, and though I wouldn't dig deeper and probably didn't have a lot to add beyond voting to accept. But the extended scope of the algorithm concept, and also the aspirations to solve the problem of hashing multisets / unordered containers as key type made me take a bit more notice. I don't think the generalisations of the hash algorithm concept do any harm, but the potential for surprise and the potential to expect that the hash algorithms so defined/encapsulated should be used for "everything" is there. The ergonomics are a bit quirky - I do wonder if adopting names that are a bit more descriptive of the hash algorithm operation (if we were talking only about sponges Id say absorb(), squeeze() in place of update() and digest() ) might be in order to show just how different the algorithm concepts are, but I'm not sure that every hash algorithm should provide the latter. Perhaps it is more that everything should provide update(), an extensible hash should provide "stir()" and everything should provide a (const) result(). Surely the few places where result is called multiple times with the expectation that the state was changed can/should make this explicit? Im also not sure I like result types as array-like byte sequences or integers but perhaps optimisers can't quite make that choice irrelevant from a performance perspective. Im not sure the library should even try to solve the problem of extracting "good" bits from a hash that doesn't have great dispersion properties. There isn't anything "wrong" with a fast and less than uniform and certainly not preimage resistant hash function - there is something wrong with trying to "make it work" where it shouldn be being used in the first place.

...

- Are you knowledgeable about the problem domain? I know enough to be dangerous. And know enough to know I am (dangerous). This library is mostly a way to avoid footgun problems while easing the task for the average knowledge user like me.

...

Ensure to explicitly include with your review: ACCEPT, REJECT, or

CONDITIONAL ACCEPT (with acceptance conditions). I vote to ACCEPT the library. I don't think adding explicit conditions is necessary as I expect the author will address any issues.

Matt Borland

4:54 p.m.

On Wednesday, December 11th, 2024 at 11:38 AM, Darryl Green <darryl.green@gmail.com> wrote:

...

I vote to ACCEPT the library. I don't think adding explicit conditions is necessary as I expect the author will address any issues.

Thank you for the review Darryl. Matt

Samuel Neves

12 Dec 12 Dec

6:31 p.m.

On Sat, Dec 7, 2024 at 1:47 PM Matt Borland via Boost <boost@lists.boost.org> wrote:

...

Dear all,

The review of Hash2 by Peter Dimov and Christian Mazakas begins today Saturday, December 7th through Sunday December 15th, 2024.

Code: https://github.com/pdimov/hash2 Docs: https://pdimov.github.io/hash2/

Since I've already barged in the topic earlier, I might as well leave a review.

...

Please provide feedback on the following general topics:

- What is your evaluation of the design?

One of the things that puzzles me about this library is its intended purpose. "Types don't know #" was clearly about hash types for use in hash tables, which explained why keying the hashes was mandatory. But it is unclear to me why MD5, SHA-1, and other cryptographic hashes are present here; they are way too inefficient to be of much use in hash tables, so there is little reason to include them. If, on the other hand, this is meant as a more general hashing library, I have strong objections as detailed below. I believe the HashAlgorithm concept is too overloaded. It _requires_ that hashes be both keyable and extendable-output, which are things that many common hash functions are not capable of. MD5, for example, is not capable of either keying or extendable-output, and as such will require an ad hoc _mode of operation_ to achieve these goals, which is fraught with danger. A user that wants to use this library with a hash that is not included here is going to be pushed to include functionality the hash does not provide and that they might not know how to achieve. Despite being very useful functionality, this is a source of vulnerabilities waiting to happen. Additionally, one of the constructors for keying is std::uint64_t. While this is mostly OK for the hash table DoS-prevention use case, it is woefully small for cryptographic purposes. This constructor should not exist in this case. Furthermore, the multiple result() calls API to achieve extendable output, besides being confusing and incompatible with just about every implementation out there, is suboptimal from a performance standpoint. If I want to generate a long output, calling result() repeatedly will make vectorization much more annoying to implement (and requires an unnecessary large internal buffer) than an API such as digest(output, length). This is the same rationale for proposals such as P1068 for more efficient output on the standard <random> generators. I would suggest splitting HashAlgorithm concepts into KeyedHashAlgorithm/PRF and FixedOutputAlgorithm/VariableOutputAlgorithm, or something along these lines. It should not be necessary to shoehorn this functionality into primitives that do not support it. Note that this is not a clean hierarchy; some keyed algorithms cannot be safely used as unkeyed hashes, so KeyedHashAlgorithm is not a subset of HashAlgorithm and vice-versa.

...

- What is your evaluation of the implementation?

The implementation of the cryptographic hash functions highlight the issues with the above design. I will talk about MD5 here, but it applies all the same to SHA-1, SHA-256, etc. Firstly, it is strange that an implementation of MD5 is keyed, since MD5 does not provide this functionality. But the keying here simply prepends the key, padded to block length, to the message. This enables the so-called length-extension attack: if I have MD5(k || m), I can easily compute MD5(k || m || other stuff) without knowing the key, which can be used to bypass authenticators (Flickr was a famous example of this [1]). This is why HMAC is has been drilled into developers whenever keyed hashing comes up. Furthermore, the way extended output works breaks pseudorandomness. Suppose I have a keyed MD5 instance and want to generate several blocks of output. The expectation here is that _all_ of the output is indistinguishable from a random string of the same length. But that is not the case here. What we have instead is first_block = MD5(k || m || padding) second_block = MD5(k || m || padding || more padding) ... An attacker who observes this can easily distinguish this by taking first_block, which consists of the internal MD5 state, hashing the extra padding, and checking whether the output is equal to second_block. If I were to use this scheme as a stream cipher to encrypt data, knowing the first 16 bytes of plaintext (e.g., a known/predictable header) would immediately enable the recovery of the rest. This is broken as an XOF. The way variable-length output in Merkle-Damgard hashes such as MD5 historically has been handled is to use MGF1 [2, 10.2.1] or something along these lines. But once again this is a _mode of operation_ on top of MD5, similarly to HMAC. I have already mentioned this elsewhere, but the unordered hash approach currently implemented cannot possibly be secure without enormous parameters. A minor question I have regards the constants used in get_result_multiplier(). The documentation states that this is used to produce a uniform output on the target range, but it's unclear from the documentation or source code the rationale or criteria for the method or constants used here. This seems to get into integer hash territory, but for example the 4 -> 1 case consists of x -> (x*0x7f7f7f7f) >> 24, for which easy differentials exist, e.g., (x, x^0xc0800000) collide with probability ~1/4. The implementations are otherwise clean and easy to read, which simplified spotting the issues above. [1] https://vnhacker.blogspot.com/2009/09/flickrs-api-signature-forgery.html [2] https://www.ietf.org/rfc/rfc2437.txt

...

- What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? Do you already use it in industry?

A framework to easily hash custom types without having to repeat the same std::hash or similar boilerplate over and over is something I would definitely use, and have sort of reinvented kludgy versions of it multiple times before. But apart from easily enabling types to be usable in hash tables I would not have much use for this library.

...

- Did you try to use the library? With which compiler(s)? Did you have any problems?

Yes. Clang 19 on Linux. It worked as expected.

...

- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

A few hours. Mostly skimming the source code and documentation.

...

- Are you knowledgeable about the problem domain?

I have been involved in the design and analysis of hash functions.

...

Ensure to explicitly include with your review: ACCEPT, REJECT, or CONDITIONAL ACCEPT (with acceptance conditions).

I would rather leave this field empty, since I'm not too familiar with the procedure here, but if pressed I would say reject in its current state.

Matt Borland

6:45 p.m.

...

...
Ensure to explicitly include with your review: ACCEPT, REJECT, or CONDITIONAL ACCEPT (with acceptance conditions).

...

I would rather leave this field empty, since I'm not too familiar with the procedure here, but if pressed I would say reject in its current state.

Samuel, There is no particular procedure with casting your vote besides emailing this list. It is purely based on your analysis of the library do you think it should be accepted, rejected, or accepted at a later time once certain conditions have been fulfilled? As review manager I compile all the votes, analysis, and will provide a final ruling typically within a few days of the review period closing. For further information: https://www.boost.org/community/reviews.html. Please let me know if you have any additional questions. Thank you for your time and review. Matt

Vinnie Falco

9:37 p.m.

On Thu, Dec 12, 2024 at 10:45 AM Matt Borland via Boost < boost@lists.boost.org> wrote:

...

I compile all the votes

Reviews are not votes. Thanks

Peter Dimov

6:48 p.m.

Samuel Neves wrote:

...

Furthermore, the way extended output works breaks pseudorandomness. Suppose I have a keyed MD5 instance and want to generate several blocks of output. The expectation here is that _all_ of the output is indistinguishable from a random string of the same length. But that is not the case here. What we have instead is

first_block = MD5(k || m || padding) second_block = MD5(k || m || padding || more padding) ...

An attacker who observes this can easily distinguish this by taking first_block, which consists of the internal MD5 state, hashing the extra padding, and checking whether the output is equal to second_block.

This only works if the message length is known to the attacker, because the "more padding" part includes the message length, which is incremented by each call to result().

Peter Dimov

6:59 p.m.

Samuel Neves wrote:

...

A minor question I have regards the constants used in get_result_multiplier(). The documentation states that this is used to produce a uniform output on the target range, but it's unclear from the documentation or source code the rationale or criteria for the method or constants used here. This seems to get into integer hash territory, but for example the 4 -> 1 case consists of x -> (x*0x7f7f7f7f) >> 24, for which easy differentials exist, e.g., (x, x^0xc0800000) collide with probability ~1/4.

It's pretty hard to come up with "good" multipliers here because it's not possible to quantify the "good" part. There are basically two cases, one where the 32 bit input is uniformly distributed, and one in which it isn't. And when it isn't, that is, it comes from a "bad" hash, it's not really possible to optimize the multiplier without a known input distribution, and we don't have that. So I came up with some ad hoc criteria here https://github.com/pdimov/hash2/blob/develop/test/get_integral_result_4.cpp and tried to make the multipliers work for them.

Peter Dimov

7:24 p.m.

Samuel Neves wrote:

...

Additionally, one of the constructors for keying is std::uint64_t. While this is mostly OK for the hash table DoS-prevention use case, it is woefully small for cryptographic purposes. This constructor should not exist in this case.

While it's trivially true that uint64_t is not enough seed for cryptographic purposes, MD5 seeded with an unknown uint64_t seed is still incrementally much more cryptographic than plain MD5. The use case here is generic code. The user has something templated on the hash algorithm that is currently using e.g. xxhash_64 seeded with uint64_t. But this turns out to not work well in practice because collisions happen. So they just change the template parameter from xxhash_64 to md5_128 and carry on. For real crypto, you'd have the code using HMAC-SHA2-256, and while hmac_sha2_256 doesn't much need the uint64_t constructor, I very much doubt that anyone would use it by mistake. The rationale for adding a byte seed constructor to everything, even though the functions aren't necessarily designed for it, is the same; a hash function seeded with something unknown to the attacker is incrementally much more secure than the same hash function not seeded with anything.

Alexander Grund

13 Dec 13 Dec

8:20 a.m.

...

One of the things that puzzles me about this library is its intended purpose. "Types don't know #" was clearly about hash types for use in hash tables, which explained why keying the hashes was mandatory. But it is unclear to me why MD5, SHA-1, and other cryptographic hashes are present here; [...] I believe the HashAlgorithm concept is too overloaded. It _requires_ that hashes be both keyable and extendable-output, [...] A user that wants to use this library with a hash that is not included here is going to be pushed to include functionality the hash does not provide and that they might not know how to achieve.

That is actually a good point: Why are those requirements on the HashAlgorithm concept at all? The library is (mostly) about allowing to hash types, i.e. the framework surrounding `hash_append`. I understand that a specific way (currently being discussed here on the list) of construction is required for implementing `hash_append` for unordered containers. But besides that, why does the library require specific constructors at all? The HashAlgorithm is provided by the user as an instance. How the users constructed that isn't relevant for the purpose of the library, is it? To my understanding having support for seeds could be highlighted as best practice for specific use cases, but not as a general requirement. I understand that the library provides ways to extract a hash out of an Algorithm. So including `result_type` and `result` makes sense, although given that this is C++11 the former seems a bit outdated as it can be determined by the latter. But besides that there are no requirements on HashAlgorithm relied upon by the library, are there? For user convenience I can see `SeedableHashAlgorithm` as a subset of `HashAlgorithm` as a useful, additional concept to include in the library. The library could provide traits to check a given type for a specific concept (when lacking C++20 concepts) This would give users and hash authors more freedom in which algorithm is used. Of course it could be stated, that all included algorithms conform to the `SeedableHashAlgorithm` concept, if they (meaningfully) are.

Peter Dimov

10:02 a.m.

Alexander Grund wrote:

...

That is actually a good point: Why are those requirements on the HashAlgorithm concept at all?

The library is (mostly) about allowing to hash types, i.e. the framework surrounding `hash_append`. I understand that a specific way (currently being discussed here on the list) of construction is required for implementing `hash_append` for unordered containers.

But besides that, why does the library require specific constructors at all?

Simplified, this asks "if we ignore the use cases that require seeding, why does the library require that the hash algorithms support seeding?"

...

For user convenience I can see `SeedableHashAlgorithm` as a subset of `HashAlgorithm` as a useful, additional concept to include in the library.

I've already answered this a few times. It's more ergonomic if all hash algorithms support a consistent interface. This allows writing generic code that can work with any algorithm. It allows switching from one hash algorithm to any other without breakage. It also allows switching from unseeded use to seeded use, or from use with uint64_t seed to use with a byte sequence seed, without breakage regardless of the chosen hash algorithm. The idea here is to make switching from a less secure use to a more secure use easy and painless.

Peter Dimov

10:22 a.m.

Samuel Neves wrote:

...

One of the things that puzzles me about this library is its intended purpose. "Types don't know #" was clearly about hash types for use in hash tables, which explained why keying the hashes was mandatory. But it is unclear to me why MD5, SHA-1, and other cryptographic hashes are present here; they are way too inefficient to be of much use in hash tables, so there is little reason to include them. If, on the other hand, this is meant as a more general hashing library, I have strong objections as detailed below.

There exist intermediate use cases between "key in a hash table" and "cryptographically secure hashing of a binary blob" that are also served by the library. Some of them are: - if you have a program with some program state S, represented by a C++ object, which the user can save to disk by means of e.g. Ctrl+S, you often want to know whether the current state has been saved or not (to display an indicator, or to decide whether to autosave on exit.) This is implemented by keeping a hash of the last save, and comparing the current hash to the last saved one. - if you have something like Boost.Compute, which needs to compile shaders, you can keep a cache of already compiled shader binaries, along with the hashes of the source used to create them, and then skip the compilation if the source matches a hash of an already compiled binary. https://github.com/boostorg/compute/blob/cf7907574d6159cd43e6cf687a7b656278c... - if you need to send complex C++ objects over the network, you can first ask the remote endpoint "do you already have the object with this hash?" and if so, skip sending the object. And there are many other cases isomorphic to the ones above. In all of these cases, you want a very good quality hash function, one for which you can ideally assume that collisions never happen, for some practical definition of never (less than cosmic rays or random RAM bit flips, for instance). And because of that, you want to be able to use a cryptographic, or ex-cryptographic, hash function, even when there is no external attacker, because non-cryptographic hash functions are generally optimized for speed and do not produce a sufficiently long digest size. (But if a non-cryptographic hash works for you, you also want to be able to use _that_, because more speed is always better than less speed, other things being equal.) In short, having a reliable framework for hashing arbitrary C++ objects with any user-selected hash algorithm opens up a lot of possibilities that one might not realize are there, and "how do I implement std::hash for my type" is only a small fraction of them.

Ruben Perez

2:31 p.m.

On Fri, 13 Dec 2024, 11:22 Peter Dimov via Boost, <boost@lists.boost.org> wrote:

...

...
One of the things that puzzles me about this library is its intended

Samuel Neves wrote: purpose.

...
"Types don't know #" was clearly about hash types for use in hash tables, which explained why keying the hashes was mandatory. But it is unclear to me why MD5, SHA-1, and other cryptographic hashes are present here; they are way too inefficient to be of much use in hash tables, so there is little reason to include them. If, on the other hand, this is meant as a more general hashing library, I have strong objections as detailed below.

There exist intermediate use cases between "key in a hash table" and "cryptographically secure hashing of a binary blob" that are also served by the library. Some of them are:

- if you have a program with some program state S, represented by a C++ object, which the user can save to disk by means of e.g. Ctrl+S, you often want to know whether the current state has been saved or not (to display an indicator, or to decide whether to autosave on exit.) This is implemented by keeping a hash of the last save, and comparing the current hash to the last saved one.

- if you have something like Boost.Compute, which needs to compile shaders, you can keep a cache of already compiled shader binaries, along with the hashes of the source used to create them, and then skip the compilation if the source matches a hash of an already compiled binary.

https://github.com/boostorg/compute/blob/cf7907574d6159cd43e6cf687a7b656278c...

- if you need to send complex C++ objects over the network, you can first ask the remote endpoint "do you already have the object with this hash?" and if so, skip sending the object.

These three have been pretty clarifying for me regarding some design decisions in the library, thanks for providing them I'd encourage you to mention them in the docs. Thanks, Ruben.

Matt Borland

5:29 p.m.

All, I received the following review directly from Ivan Matek, because the review is written as a PDF that exceeds the allowable size of the ML. The document can be accessed here: https://drive.proton.me/urls/A6NCZVTK58#yuGnqqnzYlhA Matt

Vinnie Falco

5:54 p.m.

On Fri, Dec 13, 2024 at 9:29 AM Matt Borland via Boost < boost@lists.boost.org> wrote:

...

I received the following review directly from Ivan Matek, because the review is written as a PDF that exceeds the allowable size of the ML. The document can be accessed here: https://drive.proton.me/urls/A6NCZVTK58#yuGnqqnzYlhA

Please post this at a minimum as an HTML email. Preferably reformat to be text-only. Our future plan for the new boost.org is that we will collect and index reviews in the site database and cross-reference them from library pages. Reviews submitted as nicely formatted PDFs unfortunately will not be picked up by this. It would be best if all reviews were posted to the mailing list so they can be accessed by tooling. Thanks

Matt Borland

6:20 p.m.

...

...
I received the following review directly from Ivan Matek, because the review is written as a PDF that exceeds the allowable size of the ML. The document can be accessed here: https://drive.proton.me/urls/A6NCZVTK58#yuGnqqnzYlhA

...

Please post this at a minimum as an HTML email. Preferably reformat to be text-only.

...

Our future plan for the new boost.org is that we will collect and index reviews in the site database and cross-reference them from library pages. Reviews submitted as nicely formatted PDFs unfortunately will not be picked up by this. It would be best if all reviews were posted to the mailing list so they can be accessed by tooling.

Below is my attempt at converting this to text for posterity. Boost.Hash2 Review - Ivan Matek December 2024 About my Preferences/Experience/Expectations I am not a Boost developer. I am a longtime C++ developer. I know of issues with std::hash, not familiar with Boost.ContainerHash. I am pretty sure I used hash_combine years ago, but not sure. I prefer modern C++, and dislike features that are easy to use wrong. I do not like WG21 ABI/keyword(e.g. co_await) decisions as I feel they are ruining the language. In terms of safety vs performance my opinion is that it depends. In general I feel underlying code can use unsafe constructs as those components are much more used/tested than user facing API, when it comes to user code I prefer safer, harder to misuse constructs, unless we are talking low latency or HPC domains. When first hearing the name Hash2 I originally expected that the library would help many C++ developers that are frustrated with the fact C++23 std does not know how to hash std::pair or std::vector. That is still the main use case I am considering in this review, but during discussions on the mailing list and reading the documentation I have learned that it is good to also consider Hash2 as a general framework for decoupling type value representation and hashing algorithms. My knowledge of cryptography is basic. Bias I have not interacted in person with the primary author. But over years he has replied many times to my questions on the mailing list and I have noticed he is active on reddit where he often provides very good comments/answers, I often use the named argument trick I learned from his blog. Boost.Mp11 is one of my favorite Boost libraries. Beside that I know the author did a lot of work in Boost, e.g. fast hash maps and Boost.Describe. Due to all things above I consider him a world class C++ expert. With regards to the second author if I remember correctly we have never interacted. As for authors of the original old WG21 proposal I know that Howard is the author of std::chrono, provided many great answers on SO, and IIRC was author of original versions of libc++ so I also consider him a world class C++ expert. Limitations I did not follow any other discussion except the mailing list. Design I think the general C++ design of the library is great, so I will not spend much time on that, beside saying that I feel that like STL separates iterators and algorithms it separates hash algorithms and object representations. Beside that not hashing size when size of type is constant is a smart optimization. C++11 With regard to the decision to support C++11 I think new Boost libraries should not support C++11, but that is a topic that has been discussed at length, and in any case it is not a reason for library acceptance or rejection. It is possible that with C++17 or C++20 API could be nicer, but I can not judge that without actually seeing it, besides assuming that the concept for the algorithm could give nicer error messages. HashAlgorithm Concept Unfortunately I do not know enough about domain to judge it, and if it would be better that there are more than one concept. Describe With regards to dependency on Describe I feel Describe is a great library, although I have never used it in a production so this is a huge benefit, not a liability. I am very happy to see when Boost libraries are combined to produce nice code. I understand that people are worried for paying compilation of includes that they do not use, but I think benefits outweigh this unfortunate downside. Variant I do not understand why variant is not supported. There are three main areas where I think there is opportunity for improvements. Easy to use, hard to misuse wrappers for users More advanced avoidance of hashing size Cryptographic guarantees Easy to Use Hard to Misuse It is possible that I am misunderstanding the reasons for the signature of tag_invoke function, but it is not clear to me why current: template <class Hash, class Flavor> friend void tag_invoke(boost::hash2::hash_append_tag const&, Hash& h, Flavor const& f, X const& v) { boost::hash2::hash_append(h, f, v.a); boost::hash2::hash_append(h, f, v.b); } can not be template <class HashAppender> friend void tag_invoke(boost::hash2::hash_append_tag const&, HashAppender& ha, X const& v) { ha.hash_append(v.a); ha.hash_append(v.b); } (Flavor is encoded in type HashAppender). We remove from the user the ability to by mistake call update, result, pass wrong flavor(by passing {}). We also remove from them the ability to copy construct a local copy of h, but it is not clear to me if use cases for this would be common. Notice that this would remove access to update from most users. I mention this because there was a lot of discussion about span vs char*, len. In my opinion we can have the cake and eat most of it with wrapper helpers. Similar example is when it comes to mailing list suggestions to replace result() with finalize() and make further calls UB. I have no strong opinion here since I do not understand how realistic it is to repeatedly call result(), benefits it provides, if it is useful why (according to the mailing list) no other library supports that use so for sure here we are discussing more than just a naming and those design discussions are much more important than addition of simple wrapper… But I think this is another opportunity for a wrapper that uses result() internally, but only exposes finalize() to the user. Here wrapper may have small potential overhead compared to finalize() design since result() mutates the state while finalize() could rely on further calls being UB and not update state. It is possible that an optimizer might realize this even when wrapper is used, but I would guess it is too much to expect from the optimizer. Another case mentioned in the mailing list is hashing char* that is probably not what the user wants since it hashes pointer, not pointed to string. I know library must allow it, e.g. how would user cast char* in std::tuple<double, char*> in his code to integral value, sometimes raw pointer is key in a map… but it could be made explicit longer name function user must explicitly call, while common case hash_append does not allow pointer values. Another opportunity for wrappers is to have different wrappers when the user is concatenating bytes of a byte stream versus when the user is concatenating values. Here the helper makes the choice if to append size or not for types like std::span<char> or vector<uint8_t>, i.e. ranges of bytes clearer since it is in the helper name. Simple examples: simple_hash_values<fnv1a_32> h; std::string x{"ab"}; std::string y{"c"}; h.append(x); h.append(y); will produce different result than if we did simple_hash_values<fnv1a_32> h; simple_hash_values<fnv1a_32> h; std::string x{"a"}; std::string y{"bc"}; h.append(x); h.append(y); , but this two cases produce same result: simple_hash_bytes<fnv1a_32> h; std::string x{"ab"}; std::string y{"c"}; char z[2] = {'d', 'e'}; h.append(x); h.append(y); h.append(z); simple_hash_bytes<fnv1a_32> h; std::string_view x{"abcd"}; std::array y = {'e'}; h.append(x); h.append(y); It is not a requirement of a framework to produce wrappers, e.g. Boost.ASIO does not have to provide websocket code in it, it is fine for Beast to do it. But I wonder if in this case the potential for bugs is so large that it would be beneficial to add helpers to the “core” library. More Advanced Avoidance of Hashing Size Currently library smartly avoids hashing size for types whose representation size is fixed, e.g. int, std::array<char, 10>, etc… but it encodes size for variable representation size types, e.g. std::string_view, std::vector<float>. This is correct in general case since to know object ranges in serialization stream we can afford at most 1 object with variable size, but we do not know what objects future calls to append may hash, so the only safe general solution is to always hash size. But there is a specific case when we can avoid hashing the size of variable representation size object. When we know it is the only object of variable representation size we append between construction and destruction of the hasher. This for example happens when we hash the std::string key in unordered_map. We do not need to encode its size since it is the only object used to produce the hash. Here the library could provide a helper that understands this case, and as a bonus as in case of previous wrappers prevents wrong use(i.e. result() is called by wrapper, not user). This could be extended for more complex types, e.g. std::pair<int, std::string> since here we only have 1 type with variable representation size, but I presume the most common case is when the type used is a simple type. Experiments with performance gains regarding this can be seen in the Performance section. In any case this is not a big issue because users can write this kind of wrappers for their use case relatively easily assuming they avoid some mistakes such as not appending anything in case of empty string. Cryptographic Guarantees This is the biggest concern I have about the library. As I am writing this there is currently a discussion on the mailing list suggesting hashing unordered ranges degrades cryptographic properties of used hash function. I know it is unlikely that this will matter for most users but as far as I understand currently that is not mentioned in documentation or in source code. I think it should be explicitly mentioned in both places. Just to make this point clear: the user does not even have to explicitly call this function, due to the dispatch rules of hash_append. This may sound too nitpicky, but I really believe this is a serious issue, for example of simple decisions that can have a big impact you can check this $160 million github issue. Performance I have tried to benchmark hashing performance in a few scenarios. All done with Clang 19.1 on Linux with libstdc++. Describe Overhead I was curious if the performance of the described class(class that uses Boost.Describe to achieve reflection) is the same as when I manually call hash_append for each member. This was initially not the case, but the author told me it is probably due to Clang not inlining, when I increased Clang inline “budget” performance difference disappeared, so this is great proof that in itself Describe is zero overhead. FNV1a for Short String I was curious if not encoding size would speed up hashing in cases when we hash just 1 value and then call result, e.g. what happens in insert/find of hash map. I focused on FNV1a variants since they were the fastest. In some cases hashing was so fast I had to check online with some tool that produced results that were correct since Clang unrolled loops so hard that performance was insanely fast. Not much to say here except that this confirms that for this case library abstractions do not use any techniques that might make Clang optimizer fail. But back to the general goal of determining if encoding size matters or not… I tried it with some collection of short strings and the performance difference exists, but it is a bit complicated. For std::string performance difference was almost 2x because clang gave up on inlining in case we were hashing size. For boost_static_string performance difference was noticeable, but tiny, obviously depending on length of string. For very short boost_static_strings hashing without size was 15% faster, but as size increased the difference remained the same in absolute time, so the relative performance difference was dropping. To keep this short I will not repeat the details here, if you want to know more you can search for "What is 7 bytes among friends" in the mailing list(to satisfy the need to hash something we always hash dummy byte if we do not hash size). While this is micro optimization I believe there is a design space to exploit it, discussed in detail in the Design part of the document. Comparing to SIMD SHA256 As I am not a Rust expert so I thought it would be nice exercise to check how fast/slow is a simple program written in Rust using what I quickly googled to be default SHA2 implementation(point was to simulate quick research and coding, instead of doing long evaluation of alternatives). As Hash2 does not use SIMD I expected Rust to be multiple times faster and it turned out to be the case. On my 13700H machine this program runs 6x faster than Boost.Hash2 equivalent. fn main() { const NUM_INTS: usize = 42_000_000; const VALUE: u32 = 42; let data: Vec<u32> = vec![VALUE; NUM_INTS]; let byte_data = unsafe { std::slice::from_raw_parts(data.as_ptr() as *const u8, data.len() * std::mem::size_of::<u32>()) }; let start = Instant::now(); let mut hasher = Sha256::new(); hasher.update(byte_data); let hash_result = hasher.finalize(); let duration = start.elapsed(); println!("{} microseconds", duration.as_millis()); println!("{:?}", hash_result.as_slice()); } I picked SHA256 since I know it benefits from SIMD instructions, did not check other algorithms. Documentation. I find documentation thorough and very well written, but hard to read for non experts, partly because the problem solved is non trivial, fundamental differences from std::hash and partly because the library does not expose easy to use wrappers. Few examples follow. What is the relationship of this library with Boost.ContainerHash? If I am C++ beginner how do I decide which one I should use for my usecase? I know in general it is not nice to claim one library is better than another, but the primary author of Hash2 is maintainer of Boost.ContainerHash and Hash2 library depends on it so I think it will not be a problem. Biggest pitfall that users need to be aware is that passing ranges that do not have constant size, will encode their size. This is obvious to me in retrospect, but I never had to think about this in the old world of std::hash because std::hash does not know how to hash more than one value. In particular I believe users need to be told explicitly about following: hash_append and hash_append_range are not just the same logic with different arguments, as for example std::ranges::find(v.begin(), v.end(), 74656) and std::ranges::find(v, 74656). They encode different information, then as an example explain that if you want to hash chunks of byte stream using std::span<char> you can not call hash_append because it will encode size of each chunk. You must call hash_append_range. Currently above is possible to understand if you read the documentation carefully, but I do not feel average C++ developer will be able to read hash_append dispatch rules(that make sense, but are very long) and figure out what is happening when he passes std::span, so I think we need few more // WRONG! code snippets in documentation and mention of this in the introduction. I do think it is great to link the proposal in the documentation, but I would move it from introduction since I believe for most users reading proposals is too difficult. Regarding unordered_map hash example I understand that hash maps are not the focus of the documentation, but I presume many users will be desperate to get a performant transparent comparator for stringy types, so I think it would be good to add an example with transparent hasher and comparator that handles std::string, std::string_view, char* in a performant way. For example of how difficult it is to do this in C++20 you can see reddit example(make sure to read about ABI in comments). Regarding other examples I feel that using nonRAII fopen/fclose and char* will scare away some users. I understand the example is about Hash2, not modern C++ or Boost.Scope, but I feel this is one more argument that wrapper helpers would be nice, e.g. we could wrap our char array and read size in span. An example with Mp11 is amazing because it replaces the switch needed if implemented in a simple way, but here again I wonder if most users will be confused and not understand it. Potential fix would be to use simple switch code in documentation and link to more advanced examples in the examples folder(if documentation has a way of detecting dead links so that it does not become a liability). Performance I believe it should be mentioned that algorithms are not implemented using SIMD instruction. I fully understand this is not the task of a hashing framework, but I still believe users will be unhappy if they find out after using the library for months that their sha256 is 5x slower than it can be. For the average user explanation that Hash2 is a hashing framework, not a collection of SIMD algorithms for different architectures will probably not mean much. They will just remember that Boost is slow, potentially sharing that opinion with coworkers and/or social media. So I think mentioning that no SIMD/AVX2 implementations are used in docs is very important. General Notes About Review Process I personally found the review process quite interesting, although it can be a huge time sink if you follow a few rabbit holes. I know it is not the primary purpose of the review but I can say I learned a lot in a few days. I noticed that very few comments about documentation were about how approachable it is to beginners(or even the average C++ developer). I would suggest that Boost considers adding this question to the list of questions during the review process. I have benefited tremendously in my evaluation from Samuel’s response. I have noticed that he did not post very often on the mailing list so I presume it was just luck or somebody pinged him as a domain expert. I wonder if this should be encouraged while announcing a review since Boost has plenty of C++ experts, but probably not so many domain experts for every field in which Boost is used. This would not mean Review Manager mailing random people he found on the internet, more of an encouragement for mailing list members to reach out to people in their circle they know might be knowledgeable about the domain and explain they do not need to review the entire library, e.g. mathematician can offer immensely valuable help reviewing some part of hypothetical graph library even if his knowledge of C++ is basic. I think we should update the policy page not mention modem :) tl;dr What is your evaluation of the design? I think the C++ design of the library is great, although the average C++ developer would benefit from an easier to use higher level interface. Can not judge cryptographic design. What is your evaluation of the implementation? Did not spent a lot of time looking into it, everything I saw looks fine from C++ perspective, can not judge wrt cryptographic correctness so I defensively assume it is not suitable for cryptographic purposes. This may sound unfair, because it is, but as I have no skills to assess the library in this regard I must assume the worst. What is your evaluation of the documentation? Detailed, well written, but not for beginners(too many hard concepts like type traits, too few examples of what not to do) What is your evaluation of the potential usefulness of the library? Do you already use it in industry? I think the library is currently useful for algorithms that do not benefit from SIMD implementations. But I feel that in most companies using it would require wrapping it in higher level wrappers that make misuse harder. I have not used it in production, I just built a few simple benchmarks and simple wrappers with it. I would never use or suggesting using library for something that interacts with the “outside world” since I am quite careful and I think any potential issues are much more likely than in something popular like OpenSSL, not because OpenSSL is so great, but because it is much more popular/scrutinized. Did you try to use the library? With which compiler(s)? Did you have any problems? I used Clang 19.1, C++23. Problems were just me wrongly assuming how spans of chars are “concatenated” when hashed, that doubles are trivially hashable, etc. no problems related to library itself. How much effort did you put into your evaluation? I spent a few days playing around with the library, but a lot of this time was spent reading up on the domain, figuring out how to enable clang-tidy warnings etc., not a classic design or code review domain expert would do. Are you knowledgeable about the problem domain? I know a fair bit about C++, the same can not be said about cryptography. My evaluation is CONDITIONAL ACCEPT with following criteria: More beginner friendly documentation, no need to turn it into a tutorial for C++, but few specific common pitfalls need to be addressed. Hashing “ab”, “c” and expecting it is same as hashing “a”, “bc” Hashing char* when user wanted to hash string Expecting hash_append and hash_append_range to produce same result Beginner friendly comparison with Container.Hash to help beginners navigate libraries. If comparing libraries is against Boost policy my apologies for this suggestion. Explicit mention in documentation that hash_append is not preserving cryptographic properties of used algorithm in certain conditions(this is assuming current implementation for hashing of unordered ranges) Explicit mention in documentation that currently algorithms are not implemented using SIMD.

Darryl Green

14 Dec 14 Dec

4:36 a.m.

There has been a lot of discussion, in hash2 review, about non-idempotent result() / every hash2 hash algorithm is required to be an extensible hash function. I have questions: Isn't any non-idempotent result function surprising? As I said in my review, hash2 imposing the extensibility requirement on "raw" hash algorithms that are not XOFs (you can always do "something" to extend it but doing "something" does not always make a good extensible hash function) is surprising. Ideally hash2 should allow an XOF to be used but not by requiring every hash to pretend to be one. I did say some documentation would help but I feel that relying on docs to make it hard to "do the wrong thing" or even possible to do what should be a simple thing (use an existing hash algorithm/algorithm implementation) without that in itself being a silly mistake (NOT having result in some way stir the pot is a bug, now, unless one considers extension by repetition to be "good enough" - which absurd as it sounds, maybe it is - it's at least "honest"). What is the specification for a "hash2 XOF"? Does the author really mean to take on the role of specifying the properties required? If FIPS says that a function is an XOF it probably is. If FIPS doesn't say that, but Peter says it is one (it must be to be a "hash2 hash algorithm")... Is that ok? Who wins?

Peter Dimov

4 p.m.

Darryl Green wrote:

...

There has been a lot of discussion, in hash2 review, about non-idempotent result() / every hash2 hash algorithm is required to be an extensible hash function. I have questions:

Isn't any non-idempotent result function surprising?

I wouldn't say so. If you have a thing that exposes three operations, construct, update, result, your expectation should be that it supports the following sequence construct update update ... update result and any deviations put you into "read the documentation" land. Adding a second `result` call to the end can easily have one of the following effects: - return an unspecified value - return the same value as the previous call (idempotence) - fail (either by returning an error or throwing) - undefined behavior There's no a priori reason to expect idempotence here, unless the specification guarantees it, and if you look at the opinions expressed so far in this review, you'll find people who argue for each of the above being their preference, presumably the one who they wouldn't be surprised by. In addition, if you now add update update result to the end of that sequence, you have some further choices to make. I suspect that we'll also have a diversity of opinions here as to what the unsurprising effect of the above should be; my expectation would be for a person who considers idempotence the only unsurprising alternative to prefer the clean semantics of `result` always returning the hash of the partial message accumulated so far by the preceding sequence of update calls. But of course that's not at all undisputed, partly because it imposes a cost. As currently specified, you can obtain these semantics by doing the following: auto get_partial_result( Hash const& hash ) { return Hash( hash ).result(); } which is why I consider them derived semantics, and not a fundamental operation that the algorithm needs to provide. They are achievable from the outside, with no loss of functionality, and with the costs paid by the user who needs them.

...

As I said in my review, hash2 imposing the extensibility requirement on "raw" hash algorithms that are not XOFs (you can always do "something" to extend it but doing "something" does not always make a good extensible hash function) is surprising. Ideally hash2 should allow an XOF to be used but not by requiring every hash to pretend to be one. I did say some documentation would help but I feel that relying on docs to make it hard to "do the wrong thing" or even possible to do what should be a simple thing (use an existing hash algorithm/algorithm implementation) without that in itself being a silly mistake (NOT having result in some way stir the pot is a bug, now, unless one considers extension by repetition to be "good enough" - which absurd as it sounds, maybe it is - it's at least "honest").

In short, the argument here is that we shouldn't require from hash algorithm authors to provide result extension, because they might screw it up. That's a legitimate argument, in principle; if result extension was something uniquely hard to provide, or something particularly error-prone (measured against the baseline of implementing the rest of the hash algorithm), I would probably have agreed. But that's not the case. In fact, most of the time, result extension is free, in that if you just implement `result` in an entirely straightforward manner per the algorithm spec, you automatically get result extension. It's easier to implement it accidentally, than make a mistake. And it's not true that the rest of the hash algorithm is easier to implement, or less error prone. Seed construction and update are at least as tricky, if not more. In addition, the exact same argument applies against seeded construction; people can easily make mistakes when implementing it. But I consider this an acceptable cost to pay, because the functionality is necessary for the intended use of the hash algorithms, by the library and by the users. Why is result extension necessary? Well, consider the std::hash use case, where the wrapper object does this: std::size_t operator()( T const& v ) const { H h; boost::hash2::hash_append( h, {}, v ); return boost::hash2::get_integral_result<std::size_t>( h.result() ); } (ignoring seeding at the moment) If H::result_type is uint32_t, but std::size_t is 64 bit, get_integral_result needs to extend the uint32_t value into 64 bits. It currently does the equivalent of return h.result() * 0x7FFFFFFF'7FFFFFFF; but I'm planning to change it to take advantage of result extension and do this instead: return (uint64_t)h.result << 32 | h.result(); There is no scenario in which the former produces better 64 bit hash values than the latter, unless, of course, the implementation of result extension is majorly screwed up. For another example, consider hash_append_unordered_range, which currently does w += hash2::get_integral_result<std::uint64_t>( h2.result() ); As we already established, a 64 bit accumulator w isn't going to cut it; so I'm planning to change w from std::uint64_t w = 0; to std::uint64_t w[ 4 ] = {}; This would require the ability to obtain 256 bits of hash from the hash algorithm. Again, there is no way to do this from the outside, without support from the hash algorithm, and achieve a better result distribution. In the general case, when you need to do nontrivial things with the hash algorithm that aren't just a simple sequence of `update` calls, result extension is simply necessary. I suppose I can, in principle, implement the above use cases without result extension, by using an MGF-like scheme. E.g. instead of return (uint64_t)h.result << 32 | h.result(); do this H h1( h ); h1.update( '\x01', 1 ); auto r1 = h1.result(); H h2( h ); h2.update( '\x02', 1 ); auto r2 = h2.result(); return (uint64_t)r1 << 32 | r2; but this imposes completely unnecessary performance costs.

...

What is the specification for a "hash2 XOF"? Does the author really mean to take on the role of specifying the properties required? If FIPS says that a function is an XOF it probably is. If FIPS doesn't say that, but Peter says it is one (it must be to be a "hash2 hash algorithm")... Is that ok? Who wins?

If FIPS doesn't say that a hash function is an XOF, it isn't. Result extension doesn't automatically turn any hash function into an XOF, similarly to how byte sequence seeding doesn't turn any hash function into a MAC. You can use the function as XOF or MAC, but this doesn't make it meet the stricter cryptographic requirements for XOF or MAC. It's probably my own fault for not making that point in the documentation. I did make an explicit note in the constructor description that even though it makes all algorithms usable as MACs, an established MAC algorithm such as HMAC should be used unless the algorithm has been specifically designed to be used as a MAC (as SipHash was.) But that's not what I said when describing `result`, and I should have. To sum up with a general point, the hash algorithm requirements as specified reflect (my idea of) what a modern hash algorithm (one designed today) would provide as an API. If we look at SipHash, which is relatively recent, we'll see that it's a very good match for these requirements; the only slight issue is that it doesn't support seed sequences of arbitrary length (which is a regression in usability compared to HMAC.) In particular, it supports byte sequence seeding, doesn't expose its entire internal state as the digest, and trivially supports result extension. There is actually a SipHash variant that produces a 128 bit digest, instead of 64 bit one, and it's almost exactly equivalent to a result-extended hash2::siphash_64. The only difference is that some xor constants have been changed so that the low 64 bits of the 128 bit variant do not match the value produced by the 64 bit variant. Thanks to everyone who read this far. We should gamify the list and have some sort of achievement be unlocked by that.

Ruben Perez

4:39 p.m.

...

Thanks to everyone who read this far. We should gamify the list and have some sort of achievement be unlocked by that.

One of the things I hate about MLs is that I can't tap "like" on messages like this.

Matt Borland

4:14 p.m.

...

Dear all,

...

The review of Hash2 by Peter Dimov and Christian Mazakas begins today Saturday, December 7th through Sunday December 15th, 2024.

...

Code: https://github.com/pdimov/hash2 Docs: https://pdimov.github.io/hash2/

...

Please provide feedback on the following general topics:

...

- What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? Do you already use it in industry? - Did you try to use the library? With which compiler(s)? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain?

...

Ensure to explicitly include with your review: ACCEPT, REJECT, or CONDITIONAL ACCEPT (with acceptance conditions).

...

There's already been a flurry of conversation so I thank you all for that. If you have any questions during this process feel free to reach out to me. Additionally, if you would like to submit your review privately, which I will anonymize for the review report, you may send it directly to me at: matt@mattborland.com.

...

Matt Borland Review Manager and C++Alliance Staff Engineer

All, Reminder that the review for Hash2 ends tomorrow (15 December) night. If you have not already submitted a review of the library please consider doing so. Matt

Дмитрий Архипов

15 Dec 15 Dec

4:30 p.m.

сб, 14 дек. 2024 г. в 19:15, Matt Borland via Boost <boost@lists.boost.org>:

...

...
The review of Hash2 by Peter Dimov and Christian Mazakas begins today Saturday, December 7th through Sunday December 15th, 2024.

This is my review of the proposed Hash2 library. Before the review I want to state that I am affiliated with the C++ Alliance.

...

- What is your evaluation of the design?

I read the paper on which the library is based on years ago and was very fond of the design outlined in it. It's great to see it finally realised. The library has a few innovations with regards to the original paper and after reading all of the review I consider those innovations useful.

...

- What is your evaluation of the implementation?

I only glanced at the implementation.

...

- What is your evaluation of the documentation?

I think, the documentation should be much more explicit about the implications of the design choices it made. After reading other people's comments and reviews it is clear to me that many people have not considered that the implications of the requirements of the HashAlgorithm concept are deliberate and not a mistake. Also, it will probably be good to explicitly state that no vetting by crypto experts or certifications by relevant agencies was done for any of the hashers provided by the library.

...

- What is your evaluation of the potential usefulness of the library? Do you already use it in industry?

I think that the library achieves what it tried to achieve (see below). And I consider that very useful.

...

- Did you try to use the library? With which compiler(s)? Did you have any problems?

In order to review Hash2 I experimented with integrating it into Boost.JSON. Hashing is used in 2 places in Boost.JSON. One of the uses is boost:: and std::hash support for JSON containers. Rewriting the existing code with Hash2 was very straightforward, as it is extremely similar to the existing usage of ContainerHash. The benefit of using Hash2 is exactly the one outlined in the original paper: ease of switching hashing algorithms. The second use of hashing in Boost.JSON is hashing of object keys. Rewriting the existing hashing function to become a HashAlgorithm was very straightforward (although, this could be because our algorithm is so simple). After that I experimented with changing the hashing algorithm with several of the ones provided by Hash2. This was extremely easy: changing an include line and changing the hash object type. This ability to trivially change hashing algorithms is as far as I can tell the main objective of the library. That is the reason for the requirements of HashAlgorithm. While those requirements may exclude many of the algorithms provided by the existing libraries today, I hope that maintainers of those libraries will see the utility of these additional operations and will provide them.

...

- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

With reading documentation, all reviews and comments, and trying the library out, I've spent around 5 hours.

...

- Are you knowledgeable about the problem domain?

I'm not an expert on either hashing or cryptography. I suggest we ACCEPT the library into Boost.

Peter Dimov

4:42 p.m.

Дмитрий Архипов wrote:

...

This ability to trivially change hashing algorithms is as far as I can tell the main objective of the library. That is the reason for the requirements of HashAlgorithm.

That is exactly correct. Thanks Dmitry.

Matt Borland

5:13 p.m.

...

This is my review of the proposed Hash2 library. Before the review I want to state that I am affiliated with the C++ Alliance.

...

I suggest we ACCEPT the library into Boost.

Thank you Dmitry for the review. Matt

Matt Borland

5:15 p.m.

...

Dear all,

...

The review of Hash2 by Peter Dimov and Christian Mazakas begins today Saturday, December 7th through Sunday December 15th, 2024.

...

Code: https://github.com/pdimov/hash2 Docs: https://pdimov.github.io/hash2/

...

Please provide feedback on the following general topics:

...

- What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? Do you already use it in industry? - Did you try to use the library? With which compiler(s)? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain?

...

Ensure to explicitly include with your review: ACCEPT, REJECT, or CONDITIONAL ACCEPT (with acceptance conditions).

...

There's already been a flurry of conversation so I thank you all for that. If you have any questions during this process feel free to reach out to me. Additionally, if you would like to submit your review privately, which I will anonymize for the review report, you may send it directly to me at: matt@mattborland.com.

...

Matt Borland Review Manager and C++Alliance Staff Engineer

All, This is the final reminder that the review period for Hash2 ends tonight. Matt

Alan de Freitas

7:24 p.m.

Here's my review. I want to thank Peter Dimov and Christian Mazakas for their work.

...

Does this library bring real benefit to C++ developers for real world use-case?

Yes. Improving the state of hash functions for unordered containers is a worthy goal, as is easily changing the hash function being used. Choosing a specific hash algorithm or plugin external hash algorithms is very useful. I like the choice of hash algorithms. The list can be expanded if people clamor for a new algorithm.

...

Do you have an application for this library?

Yes. We always use unordered containers, which is the primary use case here.

...

Does the API match with current best practices?

Yes. It's a significant improvement over "Boost.Hash1". Maybe it could just be "Boost.Hash" since there's no "Boost.Hash1" and this library is not only about containers. I was initially concerned that `result()` was not idempotent, but Peter's arguments made sense to me. The authors could change the name to some verb, but that's up to them. It is the same with the provided hashers being as cryptographically safe as they could be, but that seems like an unfair criterion for a library focused on containers and aiming to be an improvement over "Boost.Hash1". However, in both cases, expectations have changed over time. Big admonitions in the documentation are worth it. hash_append_sized_range could become something like hash_append_range_and_size, to avoid confusion with std::ranges::sized_range.

...

Is the documentation helpful and clear?

Yes. The documentation is very complete and precise. It's driven by goals and examples, which is excellent. Since the implementations are not supposed to be the fastest, the documentation could include some benchmarks. This could be the first example since unordered containers are the most common use case. Then some content about wrapping external hashing algorithms could be included, since the library is about generalizing that. I wouldn't say I like single-page documentation very much, but it does the job.

...

Did you try to use it? What problems or surprises did you encounter?

Yes. I compiled the code and ran the examples.

...

What is your evaluation of the implementation?

The code is very well organized and adequately documented. I hope the authors can figure out an implementation for hash_append_unordered_range that doesn't lose entropy with cryptographically safe hash functions. Or maybe they could provide alternatives and document their trade-offs. Or there could be some hash_aggregator concept to let it be customized.

...

Are you knowledgeable about the problem domain?

Yes. I have been using various hash functions in multiple applications for a long time.

...

Please explicitly state that you either *accept* or *reject* the inclusion of this library into boost.

...

Also please indicate the time & effort spent on the evaluation and give

I recommend accepting the library. the reasons for your decision. I read the documentation and the source code. I compiled the library and ran the examples. I spent time on and off over the last week evaluating the library. In total, I spent about two days evaluating it.

...

Disclaimer

I should also mention my affiliation with the C++ Alliance as a Staff Engineer. Em dom., 15 de dez. de 2024 às 14:15, Matt Borland via Boost < boost@lists.boost.org> escreveu:

...

...
Dear all,

...
The review of Hash2 by Peter Dimov and Christian Mazakas begins today Saturday, December 7th through Sunday December 15th, 2024.

...
Code: https://github.com/pdimov/hash2 Docs: https://pdimov.github.io/hash2/

...
Please provide feedback on the following general topics:

...
- What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? Do you already use it in industry? - Did you try to use the library? With which compiler(s)? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain?

...
Ensure to explicitly include with your review: ACCEPT, REJECT, or CONDITIONAL ACCEPT (with acceptance conditions).

...
There's already been a flurry of conversation so I thank you all for that. If you have any questions during this process feel free to reach out to me. Additionally, if you would like to submit your review privately, which I will anonymize for the review report, you may send it directly to me at: matt@mattborland.com.

...
Matt Borland Review Manager and C++Alliance Staff Engineer

All,

This is the final reminder that the review period for Hash2 ends tonight.

Matt _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

-- Alan Freitas https://alandefreitas.github.io/alandefreitas/ <https://github.com/alandefreitas>

Matt Borland

16 Dec 16 Dec

2:33 p.m.

...

Here's my review....

...

I recommend accepting the library.

Thank you Alan for the review. Matt

Matt Borland

2:35 p.m.

...

Dear all,

...

The review of Hash2 by Peter Dimov and Christian Mazakas begins today Saturday, December 7th through Sunday December 15th, 2024.

...

Code: https://github.com/pdimov/hash2 Docs: https://pdimov.github.io/hash2/

...

Please provide feedback on the following general topics:

...

- What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? Do you already use it in industry? - Did you try to use the library? With which compiler(s)? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain?

...

Ensure to explicitly include with your review: ACCEPT, REJECT, or CONDITIONAL ACCEPT (with acceptance conditions).

...

There's already been a flurry of conversation so I thank you all for that. If you have any questions during this process feel free to reach out to me. Additionally, if you would like to submit your review privately, which I will anonymize for the review report, you may send it directly to me at: matt@mattborland.com.

...

Matt Borland Review Manager and C++Alliance Staff Engineer

Dear all, The review period for Hash2 is now closed. Thank you to everyone who participated by writing a review, and generating lively discussion. I will publish the report and results in short order. Matt

211

Age (days ago)

220

Last active (days ago)

List overview

Download

94 comments

20 participants

participants (20)

Alan de Freitas
Alexander Grund
Andrey Semashev
Christian Mazakas
Christopher Kormanyos
Darryl Green
Ion Gaztañaga
Ivan Matek
Joaquín M López Muñoz
Klemens Morgenstern
Matt Borland
Peter Dimov
Peter Turcan
Richard Hodges
Ruben Perez
Samuel Neves
Tom Kent
Vinnie Falco
Zach Laine
Дмитрий Архипов