
- What is your evaluation of the design?
The design is good, but there is a major issue with security/safety in
default constructors of the hash algorithms.
## HashDOS issue
The issue is related to HashDOS attacks. The default seed must not be 0,
because with a seed of 0 an attacker can easily find strings of the same
size that produce the same trailing bits of the hash result. On such
strings all the operations on unordered containers degrade to O(N) per
string, giving O(N*N) complexity for the whole set of strings.
The issue affects any parsing where the result is stored into an unordered
container: HTTP headers, HTTP cookies, JSON keys, XML...
Our in-house experiments show that a single laptop can easily take down a
few hundred servers just by sending the same precomputed HTTP request. The
servers start to use 100% CPU and do not respond to other requests.
The problem is so critical that for some time now we have been writing
HashDOS tests to make sure that our data structures are not affected:
https://github.com/userver-framework/userver/blob/develop/universal/src/http...
With some math knowledge the attack becomes even simpler; see
https://orlp.net/blog/breaking-hash-functions/
A quick fix for HashDOS is to generate a random number at program start
and use it to initialize all the hashes on construction.
This still leaves a theoretical possibility of an attack if there is only
one instance of the application, or if the attacker generates a string set
for each instance and sends them all in one go. We were not patient enough
to build such an exploit, which does not mean that it is impossible.
A more mature fix is to generate a random seed for each request or
container. We used that approach for quite some time:
https://github.com/userver-framework/userver/blob/d09b427b42fc80dcd66b8161db...
We also checked some other popular frameworks for the HashDOS issue. Too
many of them were affected (we reported the issue to the authors, but not
all of them reacted to the problem).
All of the above leads to the following conclusion: developers are either
not aware of this critical issue or do not take it seriously.
To avoid provoking vulnerabilities, hashing must be safe by default.
## tag_invoke is dreadful
tag_invoke is overcomplicated. It may make sense if there is a need to
specialize behavior not just for the user-defined type, but also to
specialize the behavior of the user-defined type depending on the hashing
algorithm and/or flavor. Such functionality does not seem to be required
here.
So instead of
```
template
```
- What is your evaluation of the implementation?
Very good.
- What is your evaluation of the documentation?
Good.
- What is your evaluation of the potential usefulness of the library? Do you already use it in industry?
Not used it. The library is very useful because it explores constexpr hashing, proposes hash combining, and attempts to solve other std::hash issues.
- Did you try to use the library? With which compiler(s)? Did you have any problems?
Not used it.
- How much effort did you put into your evaluation?
A quick reading of sources and docs.
- Are you knowledgeable about the problem domain?
Yes, but not an expert.
Ensure to explicitly include with your review: ACCEPT, REJECT, or CONDITIONAL ACCEPT (with acceptance conditions).
CONDITIONAL ACCEPT. The behavior of the default constructor must be secure. `tag_invoke` seems to be overkill.

Antony Polukhin wrote:
CONDITIONAL ACCEPT. The behavior of the default constructor must be secure.
Hi Antony, thanks for the review. I of course agree that HashDOS is a problem. The library documentation points this out in several places, that's why SipHash is included and almost no other non-crypto hashes are, and its use is recommended.

However, I don't agree that the default constructor of the hash algorithm _concept_ is the correct place to attempt to make things secure (or in practice, less insecure than before.)

First, this would place the responsibility on hash algorithm authors, which doesn't seem correct, or reliable.

Second, hash algorithms aren't only used in hash tables, and injecting nondeterminism in their default constructors seems user-hostile. The hash algorithm adaptor, once it's supplied by the library, would be a better place to do that, if we decide to do it.

Third, using a single process-wide seed is not good practice (as you yourself observe.) It makes things more secure compared to unseeded use, but not really secure. The correct approach is to use a random seed (preferably of size 192 bits or more) that varies per connection, or per container.

One might in principle make the case that a default constructor shouldn't be included, but I think that this, combined with the other requirements that are already causing pushback, is way too much innovation for 2025.
`tag_invoke` seems to be overkill.
Consider the second example in https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_objects_user_defi...

```
class Str
{
private:

    static constexpr std::size_t N = 32;

    std::uint8_t size_ = 0;
    char data_[ N ] = {};

    // ...
};
```

and let N be something bigger, say 256. Using your proposed hash_visit approach, there's no way to include the range [data_, data_ + size_) efficiently in the message; you'd have to use a per-character loop. This isn't good for performance.

To fix this, you'd need to extend the visitor interface so that it allows the equivalent of hash_append_range, and then you'll arrive at something isomorphic to the current approach, e.g.

```
visitor( hash_append_range_tag{}, data_, data_ + size_ );
```

In addition, this has the usual drawback of reserving the name "hash_visit" in all namespaces, which is what the tag_invoke pattern is meant to solve. ...
With some math knowledge the attack becomes even more simple see https://orlp.net/blog/breaking-hash-functions/
All non-crypto hash functions except SipHash should be assumed broken.

On Fri, Dec 13, 2024 at 8:55 AM Peter Dimov via Boost
injecting nondeterminism in their default constructors seems user-hostile ... One might in principle make the case that a default constructor shouldn't be included
It is quite natural and expected that one can adapt an existing cryptographic hash algorithm to work with types that are Hash2-enabled. The default constructor has to exist, and should not have any additional behavior beyond what is prescribed by the actual hash algorithm.

For example, a default constructed SHA256 hash algorithm should have the state prescribed in RFC 6234 for when the algorithm is initialized and before any message data has been provided. Seeding or deleting the default constructor works against this.

Thanks

On Fri, Dec 13, 2024, 19:55 Peter Dimov via Boost
Antony Polukhin wrote:
CONDITIONAL ACCEPT. The behavior of the default constructor must be secure.
Hi Antony, thanks for the review.
I of course agree that HashDOS is a problem. The library documentation points this out in several places, that's why SipHash is included and almost no other non-crypto hashes are, and its use is recommended.
However, I don't agree that the default constructor of the hash algorithm _concept_ is the correct place to attempt to make things secure (or in practice, less insecure than before.)
First, this would place the responsibility on hash algorithm authors, which doesn't seem correct, or reliable.
Second, hash algorithms aren't only used in hash tables, and injecting nondeterminism in their default constructors seems user-hostile. The hash algorithm adaptor, once it's supplied by the library, would be a better place to do that, if we decide to do it.
After Vinnie's comment I've realised my source of confusion: the Boost.Hash library is about hashing for unordered containers, but Boost.Hash2 is about hashing algorithms. Hash2 is not a replacement for Boost.Hash; it has different goals. Does it make sense to change the library name to highlight that? Boost.HashAlgorithms? Or is there a plan to go further and provide adapters for using the hash algorithms with unordered containers?

Third, using a single process-wide seed is not good practice (as you yourself observe.) It makes things more secure compared to unseeded use, but not really secure. The correct approach is to use a random seed (preferably of size 192 bits or more) that varies per connection, or per container.
Those 192 bits seem to be related to a particular hashing algorithm. It looks like it makes sense for the hashing algorithm implementor to provide some information on the seed length. Is there any plan to get that information from the algorithm?

Antony Polukhin wrote:
Or there is a plan to go further and provide adapters for hash algorithms usage with unordered containers?
There is, yes. Several are currently shown as examples https://pdimov.github.io/hash2/doc/html/hash2.html#example_use_with_unordere... and one will eventually be added to the library proper once we're confident it's the right one.
Third, using a single process-wide seed is not good practice (as you yourself observe.) It makes things more secure compared to unseeded use, but not really secure. The correct approach is to use a random seed (preferably of size 192 bits or more) that varies per connection, or per container.
Those 192 bits seem to be related to a particular hashing algorithm. It looks like it makes sense for the hashing algorithm implementor to provide some information on the seed length. Is there any plan to get that information from the algorithm?
No, the current approach is to make all hash algorithms be able to consume arbitrary seeds, so that user code doesn't need to change when the algorithm is changed. 192 bits isn't related to a particular hash algorithm, it's just the minimum amount of entropy you'd need today to be reasonably confident of security. Use 256 bits of entropy to be sure.
participants (4)
- Antony Polukhin
- Matt Borland
- Peter Dimov
- Vinnie Falco