
- What is your evaluation of the design?
The design is good, but there is a major issue with security/safety in the default constructors of the hash algorithms.

## The HashDOS issue

The issue is related to HashDOS attacks. The default seed must not be 0, because with 0 an attacker can easily find strings of the same size that produce the same trailing bits of the hash result. On those strings, all operations on an unordered container degrade to O(N) per string, giving O(N*N) complexity for a set of strings. The issue affects any parsing whose result is stored in an unordered container: HTTP headers, HTTP cookies, JSON keys, XML...

Our in-house experiments show that a single laptop can easily take down a few hundred servers just by sending the same precomputed HTTP request. The servers start to use 100% CPU and stop responding to other requests. The problem is so critical that for some time now we have been writing HashDOS tests to make sure that our data structures are not affected: https://github.com/userver-framework/userver/blob/develop/universal/src/http...

With some math knowledge the attack becomes even simpler; see https://orlp.net/blog/breaking-hash-functions/

A quick fix for HashDOS is to generate a random number at program start and use it to initialize all the hashes on construction. This still leaves a theoretical possibility of attack if there is only one instance of the application, or if the attacker generates a string set for each instance and sends them all in one go. We were not patient enough to build that exploit, which does not mean that it is impossible.

A more mature fix is to generate a random seed for each request or container. We used that approach for quite some time: https://github.com/userver-framework/userver/blob/d09b427b42fc80dcd66b8161db...

We also checked some other popular frameworks for the HashDOS issue. Too many of them were affected (we reported the problems to the authors, but not all of them reacted).
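The "quick fix" described above can be sketched in plain standard C++. The names here (`process_seed`, `seeded_string_hash`) are illustrative, not Boost.Hash2 API:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>
#include <string>

// One process-wide seed, drawn from OS entropy the first time it is needed.
// Precomputed colliding inputs then stop colliding across program runs.
inline std::uint64_t process_seed()
{
    static const std::uint64_t seed = [] {
        std::random_device rd;
        return (std::uint64_t(rd()) << 32) | rd(); // 64 bits of entropy
    }();
    return seed;
}

// Same interface as std::hash<std::string>, but the output differs from
// run to run because the process-wide seed is mixed into the result.
struct seeded_string_hash
{
    std::size_t operator()(const std::string& s) const
    {
        std::size_t h = std::hash<std::string>{}(s);
        // hash_combine-style mixing of the seed into the unseeded hash
        h ^= static_cast<std::size_t>(
            process_seed() + 0x9e3779b97f4a7c15ull + (h << 6) + (h >> 2));
        return h;
    }
};
```

Within one process the hasher stays deterministic, so it can serve as the `Hash` parameter of an unordered container; as noted above, a single process-wide seed only raises the bar rather than eliminating the attack.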
All of the above leads to the following conclusion: developers are either not aware of this critical issue or do not take it seriously. To avoid provoking vulnerabilities, hashing must be safe by default.

## tag_invoke is dreadful

tag_invoke is overcomplicated. It may make sense if there is a need to specialize the behavior not just for the user-defined type, but also to specialize the behavior of the user-defined type depending on the hashing algorithm and/or flavor. Such functionality does not seem to be required here. So instead of

```
template<class Hash, class Flavor>
friend void tag_invoke( boost::hash2::hash_append_tag const&, Hash& h,
    Flavor const& f, X const& v )
{
    boost::hash2::hash_append(h, f, v.a);
    boost::hash2::hash_append(h, f, v.b);
}
```

how about using a simpler approach:

```
template<class Visitor>
friend constexpr void hash_visit(Visitor visitor, X const& v)
{
    visitor(v.a);
    visitor(v.b);
}
```

where Visitor could be

```
[&h, f](const auto& value) { boost::hash2::hash_append(h, f, value); }
```

## std::hash and hash_value

I'd appreciate functionality to reuse boost::hash2 with std::hash and hash_value. It would be ideal to have something like:

```
template <class T> requires boost::hash2::supports_v<T>
struct std::hash<T>
{
    std::size_t operator()(const T& x) const
    {
        boost::hash2::default_hasher h{boost::hash2::on_program_start_seed};
        boost::hash2::hash_append(h, {}, x);
        return boost::hash2::get_integral_result<std::size_t>( h.result() );
    }
};

template <class T> requires boost::hash2::supports_v<T>
std::size_t hash_value(const T& x) // put into Boost.Hash
{
    return std::hash<T>{}(x);
}
```

so that the user only needs to provide a boost::hash2 customization for their type, after which it works with standard library hashing and the old Boost hash.
- What is your evaluation of the implementation?
Very good.
- What is your evaluation of the documentation?
Good.
- What is your evaluation of the potential usefulness of the library? Do you already use it in industry?
Not used it. The library is very useful because it explores constexpr hashing, proposes hash combining, and attempts to solve other std::hash issues.
- Did you try to use the library? With which compiler(s)? Did you have any problems?
Not used it.
- How much effort did you put into your evaluation?
A quick reading of sources and docs.
- Are you knowledgeable about the problem domain?
Yes, but not an expert.
Please explicitly include with your review: ACCEPT, REJECT, or CONDITIONAL ACCEPT (with acceptance conditions).
CONDITIONAL ACCEPT. The behavior of the default constructor must be secure. `tag_invoke` seems to be overkill.

Antony Polukhin wrote:
CONDITIONAL ACCEPT. The behavior of the default constructor must be secure.
Hi Antony, thanks for the review.

I of course agree that HashDOS is a problem. The library documentation points this out in several places; that's why SipHash is included, almost no other non-crypto hashes are, and its use is recommended.

However, I don't agree that the default constructor of the hash algorithm _concept_ is the correct place to attempt to make things secure (or in practice, less insecure than before.)

First, this would place the responsibility on hash algorithm authors, which doesn't seem correct, or reliable.

Second, hash algorithms aren't only used in hash tables, and injecting nondeterminism in their default constructors seems user-hostile. The hash algorithm adaptor, once it's supplied by the library, would be a better place to do that, if we decide to do it.

Third, using a single process-wide seed is not good practice (as you yourself observe.) It makes things more secure compared to unseeded use, but not really secure. The correct approach is to use a random seed (preferably of size 192 bits or more) that varies per connection, or per container.

One might in principle make the case that a default constructor shouldn't be included, but I think that this, combined with the other requirements that are already causing pushback, is way too much innovation for 2025.
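The "seed per container" approach mentioned above can be sketched with a hasher that draws its own random seed at construction. This is an illustrative standard-C++ sketch, not the library's adaptor (and it uses only 64 seed bits for brevity, fewer than the 192+ bits recommended above):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>
#include <string>
#include <unordered_map>

// Each hasher instance draws its own random seed when constructed, so every
// container gets an independent hash function. An attacker can no longer
// precompute one colliding set that works for every container and process.
struct per_container_hash
{
    std::uint64_t seed_;

    per_container_hash()
    {
        std::random_device rd;
        seed_ = (std::uint64_t(rd()) << 32) | rd();
    }

    std::size_t operator()(const std::string& s) const
    {
        std::size_t h = std::hash<std::string>{}(s);
        h ^= static_cast<std::size_t>(
            seed_ + 0x9e3779b97f4a7c15ull + (h << 6) + (h >> 2));
        return h;
    }
};

// The container copies the hasher at construction, so the seed stays
// fixed for the container's whole lifetime:
using seeded_map = std::unordered_map<std::string, int, per_container_hash>;
```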
`tag_invoke` seems to be overkill.
Consider the second example in https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_objects_user_defi...

```
class Str
{
private:

    static constexpr std::size_t N = 32;

    std::uint8_t size_ = 0;
    char data_[ N ] = {};

    // ...
};
```

and let N be something bigger, say 256.

Using your proposed hash_visit approach, there's no way to include the range [data_, data_ + size_) efficiently in the message; you'd have to use a per-character loop. This isn't good for performance.

To fix this, you'd need to extend the visitor interface so that it allows the equivalent of hash_append_range, and then you'll arrive at something isomorphic to the current approach, e.g.

```
visitor( hash_append_range_tag{}, data_, data_ + size_ );
```

In addition, this has the usual drawback of reserving the name "hash_visit" in all namespaces, which is what the tag_invoke pattern is meant to solve. ...
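The extension described above can be sketched in plain C++; all names here (`range_tag`, `toy_hash`, `byte_visitor`, the `hash_visit` signature) are illustrative rather than Boost.Hash2's. A tag overload lets the visitor consume a contiguous byte range in one call instead of a per-character loop:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct range_tag {}; // marks a "bulk bytes" visit, like hash_append_range

// A toy FNV-1a hasher standing in for a real hash algorithm.
struct toy_hash
{
    std::uint64_t state = 14695981039346656037ull;

    void update(const void* p, std::size_t n)
    {
        auto b = static_cast<const unsigned char*>(p);
        for (std::size_t i = 0; i < n; ++i)
            state = (state ^ b[i]) * 1099511628211ull;
    }
};

struct byte_visitor
{
    toy_hash& h;

    template<class T>
    void operator()(const T& v) const // single trivial value
    {
        h.update(&v, sizeof(v));
    }

    void operator()(range_tag, const void* p, std::size_t n) const
    {
        h.update(p, n); // one bulk call, no per-element loop
    }
};

// The Str example from the documentation (members public here for brevity);
// only the first size_ bytes of data_ belong to the message.
struct Str
{
    static constexpr std::size_t N = 256;

    std::uint8_t size_ = 0;
    char data_[N] = {};

    template<class V>
    friend void hash_visit(V v, const Str& s)
    {
        v(s.size_);
        v(range_tag{}, s.data_, s.size_); // efficient range append
    }
};
```

Once the tag overload exists, the interface is indeed isomorphic to tag_invoke plus hash_append_range, which is the point being made above.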
With some math knowledge the attack becomes even more simple see https://orlp.net/blog/breaking-hash-functions/
All non-crypto hash functions except SipHash should be assumed broken.

On Fri, Dec 13, 2024 at 8:55 AM Peter Dimov via Boost <boost@lists.boost.org> wrote:
injecting nondeterminism in their default constructors seems user-hostile ... One might in principle make the case that a default constructor shouldn't be included
It is quite natural and expected that one can adapt an existing cryptographic hash algorithm to work with types that are Hash2-enabled. The default constructor has to exist, and should not have any additional behavior beyond what is prescribed by the actual hash algorithm.

For example, a default constructed SHA256 hash algorithm should have the state prescribed in rfc6234 for when the algorithm is initialized and before any message data has been provided. Seeding or deleting the default constructor works against this.

Thanks

On Fri, Dec 13, 2024, 19:55 Peter Dimov via Boost <boost@lists.boost.org> wrote:
Antony Polukhin wrote:
CONDITIONAL ACCEPT. The behavior of the default constructor must be secure.
Hi Antony, thanks for the review.
I of course agree that HashDOS is a problem. The library documentation points this out in several places, that's why SipHash is included and almost no other non-crypto hashes are, and its use is recommended.
However, I don't agree that the default constructor of the hash algorithm _concept_ is the correct place to attempt to make things secure (or in practice, less insecure than before.)
First, this would place the responsibility on hash algorithm authors, which doesn't seem correct, or reliable.
Second, hash algorithms aren't only used in hash tables, and injecting nondeterminism in their default constructors seems user-hostile. The hash algorithm adaptor, once it's supplied by the library, would be a better place to do that, if we decide to do it.
After Vinnie's comment I've realised my source of confusion. The Boost.Hash library is about hashing for unordered containers, but Boost.Hash2 is about hashing algorithms. Hash2 is not a replacement for Boost.Hash; it has different goals. Does it make sense to change the library name to highlight that? Boost.HashAlgorithms? Or is there a plan to go further and provide adapters for using hash algorithms with unordered containers?

Third, using a single process-wide seed is not good practice (as you yourself observe.) It makes things more secure compared to unseeded use, but not really secure. The correct approach is to use a random seed (preferably of size 192 bits or more) that varies per connection, or per container.

Those 192 bits seem to be related to a particular hashing algorithm. It looks like it makes sense for the hashing algorithm implementer to provide some information on seed length. Is there any plan to get that information from the algorithm?

Antony Polukhin wrote:
Or there is a plan to go further and provide adapters for hash algorithms usage with unordered containers?
There is, yes. Several are currently shown as examples https://pdimov.github.io/hash2/doc/html/hash2.html#example_use_with_unordere... and one will eventually be added to the library proper once we're confident it's the right one.
Third, using a single process-wide seed is not good practice (as you yourself observe.) It makes things more secure compared to unseeded use, but not really secure. The correct approach is to use a random seed (preferably of size 192 bits or more) that varies per connection, or per container.
Those 192 bits seem to be related to a particular hashing algorithm. It looks like it makes sense for the hashing algorithm implementer to provide some information on seed length. Is there any plan to get that information from the algorithm?
No, the current approach is to make all hash algorithms be able to consume arbitrary seeds, so that user code doesn't need to change when the algorithm is changed. 192 bits isn't related to a particular hash algorithm, it's just the minimum amount of entropy you'd need today to be reasonably confident of security. Use 256 bits of entropy to be sure.
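Collecting that much entropy is straightforward with the standard library. This sketch produces a 256-bit seed as a byte array; feeding it to a hash algorithm's byte-sequence seed constructor is shown only as a hypothetical comment (the constructor shape is an assumption here, check the library documentation):

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <random>

// Draw 256 bits (32 bytes) of entropy from the OS via std::random_device.
inline std::array<unsigned char, 32> make_seed()
{
    std::array<unsigned char, 32> seed{};
    std::random_device rd; // nondeterministic on mainstream platforms
    for (std::size_t i = 0; i < seed.size(); i += 4)
    {
        std::uint32_t r = rd(); // 32 random bits per call
        seed[i]     = static_cast<unsigned char>(r);
        seed[i + 1] = static_cast<unsigned char>(r >> 8);
        seed[i + 2] = static_cast<unsigned char>(r >> 16);
        seed[i + 3] = static_cast<unsigned char>(r >> 24);
    }
    return seed;
}

// Hypothetical use with a byte-seed constructor (shape assumed, see docs):
//   auto seed = make_seed();
//   boost::hash2::siphash_64 h( seed.data(), seed.size() );
```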
participants (4)
- Antony Polukhin
- Matt Borland
- Peter Dimov
- Vinnie Falco