Boost 1.68.0 - boost hashing changed ? - Boost-users


Shailja Prasad

22 Oct 2018

5:53 p.m.

Hello there,

I am trying to upgrade from Boost 1.53.0 to Boost 1.68.0, but it looks like the hash computation has changed: the following lines give two different hash codes for the same string input.

    boost::hash<string> namehash;
    size_t hashCode = namehash(name);

Here name is a string, e.g. "abcdefg". With Boost 1.53.0 the hashCode is 168904, and with Boost 1.68.0 it is 69530. These are just sample numbers; the point is that the two values are never equal even though everything else stays the same. The only change is the Boost version, on an EL6 Linux machine with the GCC 4.9 compiler.

This is blocking me from upgrading to a newer version of Boost. I would appreciate it if anyone has some info on this.

Thanks


Miguel Ojeda

23 Oct 2018

6:45 a.m.

On Mon, Oct 22, 2018 at 7:57 PM Shailja Prasad via Boost-users <boost-users@lists.boost.org> wrote:

> I was trying to upgrade boost 1.53.0 to boost 1.68.0. But, it looks like hashing code generation has changed, since the following line gives two different hashcode for same string input.

Hm... why would you expect the hash to always be the same between releases, compilers, etc.? With a quick look at the Boost.Hash docs, I cannot find anything that guarantees that. If it is like std::hash, then it is only guaranteed to remain equal for the duration of the program. In other words, you cannot rely on saving it, nor on comparing it to hashes from other vendors, platforms, architectures, compiler releases, etc.

Also, taking a quick look at the repository, there were several changes between 1.53 and 1.68, e.g.:

https://github.com/boostorg/container_hash/commit/bb2a91bf47354bfce7378394bc...
https://github.com/boostorg/container_hash/commit/309d17f38722b7bd15b804e55d...

Cheers,
Miguel

degski

8:19 a.m.

On Tue, 23 Oct 2018 at 08:45, Miguel Ojeda via Boost-users <boost-users@lists.boost.org> wrote:

> On Mon, Oct 22, 2018 at 7:57 PM Shailja Prasad via Boost-users <boost-users@lists.boost.org> wrote:
>
>> I was trying to upgrade boost 1.53.0 to boost 1.68.0. But, it looks like hashing code generation has changed, since the following line gives two different hashcode for same string input.
>
> Hm... why would you expect the hash to be always the same between releases, compilers, etc.?

Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.

> I cannot find it with a quick look at Boost.Hash's docs anything regarding a guarantee of that. If it is like std::hash, then it is only guaranteed to remain equal for the duration of the program.

Sort of: "Hash functions are only required to produce the same result for the same input within a single execution of a program". The standard states a minimum requirement [with an intended [narrow] use case in mind, std::unordered_map's]. I'm not sure that is a great one, and by the time we might [would like to] have a constexpr std::unordered_map it may not even be tenable.

> In other words, you cannot rely on saving it nor comparing them to other hashes from other vendors, platforms, architectures, compiler releases, etc.

In my view this is an omission; the option to have exactly that should [have been] available.

degski
-- 
"If something cannot go on forever, it will stop" - Herbert Stein

Miguel Ojeda

9:25 a.m.

On Tue, Oct 23, 2018 at 10:19 AM degski <degski@gmail.com> wrote:

> On Tue, 23 Oct 2018 at 08:45, Miguel Ojeda via Boost-users <boost-users@lists.boost.org> wrote:
>
>> On Mon, Oct 22, 2018 at 7:57 PM Shailja Prasad via Boost-users <boost-users@lists.boost.org> wrote:
>>
>>> I was trying to upgrade boost 1.53.0 to boost 1.68.0. But, it looks like hashing code generation has changed, since the following line gives two different hashcode for same string input.
>>
>> Hm... why would you expect the hash to be always the same between releases, compilers, etc.?
>
> Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.

No, sorry, that is a completely different use case. Crypto hashes are used, among other things, in network communications, persistent storage, etc. They need to be "fixed" functions, and their standards provide the exact definition. That is not the case at all with std::hash or Boost.Hash.

>> I cannot find it with a quick look at Boost.Hash's docs anything regarding a guarantee of that. If it is like std::hash, then it is only guaranteed to remain equal for the duration of the program.
>
> Sort of: "Hash functions are only required to produce the same result for the same input within a single execution of a program". The standard states a minimum requirement [with an intended [narrow] use case in mind, std::unordered_map's].

Not sure what you mean. That is what I said.

>> In other words, you cannot rely on saving it nor comparing them to other hashes from other vendors, platforms, architectures, compiler releases, etc.
>
> In my view this is an omission, the option to have exactly that should [have been] available.

Not really. You could argue, for instance, that precisely because std::hash (and Boost.Hash) is meant to be used in maps/hash tables/..., you should not be able to guess the values of the hash in advance, in order to prevent collision attacks. In other words, the implementation even has the freedom to provide a different hash function on every run of your program.

Not only that, but stating that the hash should remain constant across C++/Boost releases is basically stating that the hash function should be fixed forever. That removes all freedom for improvement when future hash functions with better properties are discovered or implemented (which is what happened in the commits I linked).

In summary: the hashes provided by Boost or the standard are not intended to be fixed functions; i.e. you shouldn't rely on the actual values returned, only on the properties of the function. Namely, this one:

"For two different values t1 and t2, the probability that h(t1) and h(t2) compare equal should be very small, approaching 1.0 / numeric_limits<size_t>::max()."

Cheers,
Miguel

degski

10:36 a.m.

On Tue, 23 Oct 2018 at 11:25, Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> wrote:

>> Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.
>
> No, sorry, that is a completely different use case. Crypto hashes are used, among other things, in network communications, persistent storage, etc. They need to be "fixed" functions, and their standards provide the exact definition. That is not the case at all with std::hash or Boost.Hash.

For debugging purposes, a fixed function seems quite useful to me.

degski

Miguel Ojeda

2:11 p.m.

On Tue, Oct 23, 2018 at 12:36 PM degski <degski@gmail.com> wrote:

> On Tue, 23 Oct 2018 at 11:25, Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> wrote:
>
>>> Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.
>>
>> No, sorry, that is a completely different use case. Crypto hashes are used, among other things, in network communications, persistent storage, etc. They need to be "fixed" functions, and their standards provide the exact definition. That is not the case at all with std::hash or Boost.Hash.
>
> For debugging purposes, a fixed function seems quite useful to me.

Indeed, that is a good point! An std implementation (and Boost.Hash too) could provide the means to fix the function for debugging (e.g. through a #define).

Cheers,
Miguel

Shailja Prasad

24 Oct 2018

10:07 a.m.

On Oct 23, 2018, at 10:11 AM, Miguel Ojeda via Boost-users <boost-users@lists.boost.org> wrote:

> On Tue, Oct 23, 2018 at 12:36 PM degski <degski@gmail.com> wrote:
>
>> On Tue, 23 Oct 2018 at 11:25, Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> wrote:
>>
>>>> Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.
>>>
>>> No, sorry, that is a completely different use case. Crypto hashes are used, among other things, in network communications, persistent storage, etc. They need to be "fixed" functions, and their standards provide the exact definition. That is not the case at all with std::hash or Boost.Hash.
>>
>> For debugging purposes, a fixed function seems quite useful to me.
>
> Indeed, that is a good point! An std implementation (and Boost.Hash too) could provide the means to fix the function for debugging (e.g. through a #define).

OK, I tried a few approaches to get the same hashCode with Boost 1.68.0 as the one produced by Boost 1.53.0 (same machine and compiler), looking into the changes you linked. Thank you, Miguel!

> Cheers,
> Miguel
> _______________________________________________
> Boost-users mailing list
> Boost-users@lists.boost.org
> https://lists.boost.org/mailman/listinfo.cgi/boost-users

Miguel Ojeda

4:22 p.m.

On Wed, Oct 24, 2018 at 12:07 PM Shailja Prasad <shalja.rudra@gmail.com> wrote:

> Ok, tried few approach to get the same hashCode using boost 1.68.0 which is coming from using boost 1.53.0 ( same machine and compiler) looking into your suggested change link. Thank you, Miguel !

You're welcome Shailja! I am glad it was useful :-)

Cheers,
Miguel

Richard Hodges

11:13 a.m.

On Tue, 23 Oct 2018 at 12:36, degski via Boost-users <boost-users@lists.boost.org> wrote:

> On Tue, 23 Oct 2018 at 11:25, Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> wrote:
>
>>> Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.
>>
>> No, sorry, that is a completely different use case. Crypto hashes are used, among other things, in network communications, persistent storage, etc. They need to be "fixed" functions, and their standards provide the exact definition. That is not the case at all with std::hash or Boost.Hash.
>
> For debugging purposes, a fixed function seems quite useful to me.

It's already difficult enough to teach new programmers not to serialise the result of std/boost hash. Providing a means to ensure that it's predictable would strengthen the illusion that it's predictable across compilers and architectures. This would be a grave error.

I would argue the opposite. std::hash should work hard to ensure that for any two runs of the same program, the results of a hash will be wildly different. This would make it easier to spot incorrect uses of it.

R


james

4:31 p.m.

On Wed, Oct 24, 2018 at 12:14 PM Richard Hodges via Boost-users <boost-users@lists.boost.org> wrote:

> On Tue, 23 Oct 2018 at 12:36, degski via Boost-users <boost-users@lists.boost.org> wrote:
>
>> On Tue, 23 Oct 2018 at 11:25, Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> wrote:
>>
>>>> Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.
>>>
>>> No, sorry, that is a completely different use case. Crypto hashes are used, among other things, in network communications, persistent storage, etc. They need to be "fixed" functions, and their standards provide the exact definition. That is not the case at all with std::hash or Boost.Hash.
>>
>> For debugging purposes, a fixed function seems quite useful to me.
>
> It's already difficult enough to teach new programmers not to serialise the result of std/boost hash.
>
> Providing a means to ensure that it's predictable would strengthen the illusion that it's predictable across compilers and architectures. This would be a grave error.
>
> I would argue the opposite. std::hash should work hard to ensure that for any two runs of the same program, the results of a hash will be wildly different.
>
> This would make it easier to spot incorrect uses of it.

What about maps in shared memory? You're suggesting that it's OK that one process built with version X of Boost should have no expectation of being able to operate correctly with a process built with version X+1. Insanity.

Bjorn Reese

28 Oct 2018

6:05 p.m.

On 10/23/18 11:25 AM, Miguel Ojeda via Boost-users wrote:

> Not only that, but stating that the hash should remain constant across C++/Boost releases is basically stating the hash function should be fixed forever. That removes all the freedom for improvements when future hash functions are discovered or implemented, with better properties (which is what happened in the commits I linked).

While I do not disagree with your arguments, we have a special situation because the algorithm for boost::hash_combine was actually documented in older Boost releases, including 1.53 that the OP is upgrading from, so it would have been reasonable to assume that it stayed fixed. It is not documented in newer releases though:

https://lists.boost.org/Archives/boost/2014/07/215577.php

The best way to ensure that it is unchanged is to copy the old boost::hash_combine into your own code.

10 comments

6 participants


Bjorn Reese
degski
james
Miguel Ojeda
Richard Hodges
Shailja Prasad