Hello there,
I was trying to upgrade Boost 1.53.0 to Boost 1.68.0, but it looks like hash code generation has changed, since the following lines give two different hash codes for the same string input:
    boost::hash<string> namehash;
    size_t hashCode = namehash(name);
Here name is a string, e.g. “abcdefg”. With Boost 1.53.0 the hashCode is 168904, and with Boost 1.68.0 it is 69530. These are just sample numbers; the point is that the two values are never equal when everything else remains the same and only the Boost version changes (EL6 Linux machine, GCC 4.9 compiler).
This is blocking me from upgrading to a newer version of Boost. I would appreciate any info on this.
Thanks
On Mon, Oct 22, 2018 at 7:57 PM Shailja Prasad via Boost-users wrote:
I was trying to upgrade Boost 1.53.0 to Boost 1.68.0, but it looks like hash code generation has changed, since the following lines give two different hash codes for the same string input.
Hm... why would you expect the hash to be always the same between releases, compilers, etc.?
From a quick look at Boost.Hash's docs, I cannot find anything regarding such a guarantee. If it is like std::hash, then it is only guaranteed to remain equal for the duration of the program. In other words, you cannot rely on saving it or on comparing it with hashes from other vendors, platforms, architectures, compiler releases, etc.
Also, taking a quick look at the repository, there were several changes between 1.53 and 1.68, e.g.:
https://github.com/boostorg/container_hash/commit/bb2a91bf47354bfce7378394bc...
https://github.com/boostorg/container_hash/commit/309d17f38722b7bd15b804e55d...
Cheers,
Miguel
On Tue, 23 Oct 2018 at 08:45, Miguel Ojeda via Boost-users <boost-users@lists.boost.org> wrote:
On Mon, Oct 22, 2018 at 7:57 PM Shailja Prasad via Boost-users wrote:
I was trying to upgrade Boost 1.53.0 to Boost 1.68.0, but it looks like hash code generation has changed, since the following lines give two different hash codes for the same string input.
Hm... why would you expect the hash to be always the same between releases, compilers, etc.?
Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.
From a quick look at Boost.Hash's docs, I cannot find anything regarding such a guarantee. If it is like std::hash, then it is only guaranteed to remain equal for the duration of the program.
Sort of: "Hash functions are only required to produce the same result for the same input within a single execution of a program". The standard states a minimum requirement, with a narrow intended use case in mind (std::unordered_map). I'm not sure that is a great one, and by the time we might want a constexpr std::unordered_map, it may not even be tenable.
In other words, you cannot rely on saving it or on comparing it with hashes from other vendors, platforms, architectures, compiler releases, etc.
In my view this is an omission; the option to have exactly that should have been available.
degski
--
*“If something cannot go on forever, it will stop” - Herbert Stein*
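For readers who do need values that stay the same across releases, compilers and platforms, one option is to use a fixed, publicly specified non-cryptographic function instead of boost::hash or std::hash. The sketch below uses 64-bit FNV-1a purely as an illustration; it is not something Boost.Hash provides, and the name fnv1a_64 is hypothetical.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a: a fixed, publicly specified, non-cryptographic hash.
// Its result depends only on the input bytes, not on the library version.
// The name fnv1a_64 is illustrative; it is not part of Boost or the standard library.
inline std::uint64_t fnv1a_64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char c : s) {
        h ^= c;                  // mix in one byte
        h *= 1099511628211ULL;   // FNV 64-bit prime; wraps modulo 2^64
    }
    return h;
}
```

Because the constants and the byte order of the loop are fixed, a value such as fnv1a_64(name) can be stored or compared between builds, which is exactly what the library hashes do not promise.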
On Tue, Oct 23, 2018 at 10:19 AM degski wrote:
On Tue, 23 Oct 2018 at 08:45, Miguel Ojeda via Boost-users wrote:
On Mon, Oct 22, 2018 at 7:57 PM Shailja Prasad via Boost-users wrote:
I was trying to upgrade Boost 1.53.0 to Boost 1.68.0, but it looks like hash code generation has changed, since the following lines give two different hash codes for the same string input.
Hm... why would you expect the hash to be always the same between releases, compilers, etc.?
Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.
No, sorry, that is a completely different use case. Crypto hashes are used, among other things, in network communications, persistent storage, etc. They need to be "fixed" functions, and their standards provide the exact definition. That is not the case at all with std::hash or Boost.Hash.
From a quick look at Boost.Hash's docs, I cannot find anything regarding such a guarantee. If it is like std::hash, then it is only guaranteed to remain equal for the duration of the program.
Sort of: "Hash functions are only required to produce the same result for the same input within a single execution of a program". The standard states a minimum requirement, with a narrow intended use case in mind (std::unordered_map).
Not sure what you mean. That is what I said.
In other words, you cannot rely on saving it or on comparing it with hashes from other vendors, platforms, architectures, compiler releases, etc.
In my view this is an omission; the option to have exactly that should have been available.
Not really. You could argue, for instance, that precisely because
std::hash (and Boost.Hash) is meant to be used in maps/hash
tables/..., you should not be able to guess the values of the hash in
advance, in order to prevent collision attacks. In other words, the
implementation has even the freedom to provide a different hash
function every run of your program.
Not only that, but stating that the hash should remain constant across
C++/Boost releases is basically stating the hash function should be
fixed forever. That removes all the freedom for improvements when
future hash functions are discovered or implemented, with better
properties (which is what happened in the commits I linked).
In summary: the hashes provided by Boost or the standard are not
intended to be fixed functions; i.e. you shouldn't rely on the actual
values returned, only on the properties of the function. Namely, this
one: "For two different values t1 and t2, the probability that h(t1)
and h(t2) compare equal should be very small, approaching 1.0 /
numeric_limits<size_t>::max()."
On Tue, 23 Oct 2018 at 11:25, Miguel Ojeda wrote:
Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.
No, sorry, that is a completely different use case. Crypto hashes are used, among other things, in network communications, persistent storage, etc. They need to be "fixed" functions, and their standards provide the exact definition. That is not the case at all with std::hash or Boost.Hash.
For debugging purposes, a fixed function seems quite useful to me.
degski
On Tue, Oct 23, 2018 at 12:36 PM degski wrote:
On Tue, 23 Oct 2018 at 11:25, Miguel Ojeda wrote:
Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.
No, sorry, that is a completely different use case. Crypto hashes are used, among other things, in network communications, persistent storage, etc. They need to be "fixed" functions, and their standards provide the exact definition. That is not the case at all with std::hash or Boost.Hash.
For debugging purposes, a fixed function seems quite useful to me.
Indeed, that is a good point! An std implementation (and Boost.Hash too) could provide the means to fix the function for debugging (e.g. through a #define). Cheers, Miguel
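As a rough sketch of what such a switch could look like in user code (neither Boost.Hash nor any standard library is known to offer this), the hasher below draws a seed once per process unless a macro pins it. The names MYAPP_DEBUG_FIXED_HASH, run_seed and DebuggableHash are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <random>
#include <string>

// MYAPP_DEBUG_FIXED_HASH is a hypothetical macro, not a Boost or standard one.
inline std::size_t run_seed() {
#ifdef MYAPP_DEBUG_FIXED_HASH
    return 0;  // debugging build: the same seed on every run
#else
    // Drawn once per process. Note that std::random_device is allowed to be
    // deterministic on some platforms.
    static const std::size_t seed = std::random_device{}();
    return seed;
#endif
}

// Hasher usable as the third template argument of std::unordered_map.
struct DebuggableHash {
    std::size_t operator()(const std::string& key) const {
        // Equal keys still hash equally within one run, which is all a
        // hash table needs; the values themselves may change between runs.
        return std::hash<std::string>{}(key) ^ run_seed();
    }
};
```

Even with the macro defined, the values are only as reproducible as std::hash is for the toolchain in use, so this helps with debugging a single build and does not make the values safe to store. It is also not a defence against collision attacks, which would need a keyed hash function.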
On Oct 23, 2018, at 10:11 AM, Miguel Ojeda via Boost-users wrote:
On Tue, Oct 23, 2018 at 12:36 PM degski wrote:
On Tue, 23 Oct 2018 at 11:25, Miguel Ojeda wrote:
Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.
No, sorry, that is a completely different use case. Crypto hashes are used, among other things, in network communications, persistent storage, etc. They need to be "fixed" functions, and their standards provide the exact definition. That is not the case at all with std::hash or Boost.Hash.
For debugging purposes, a fixed function seems quite useful to me.
Indeed, that is a good point! An std implementation (and Boost.Hash too) could provide the means to fix the function for debugging (e.g. through a #define).
OK, after looking into the changes you linked, I tried a few approaches to get the same hashCode with Boost 1.68.0 that I was getting with Boost 1.53.0 (same machine and compiler). Thank you, Miguel!
Cheers, Miguel
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
https://lists.boost.org/mailman/listinfo.cgi/boost-users
On Wed, Oct 24, 2018 at 12:07 PM Shailja Prasad wrote:
OK, after looking into the changes you linked, I tried a few approaches to get the same hashCode with Boost 1.68.0 that I was getting with Boost 1.53.0 (same machine and compiler). Thank you, Miguel!
You're welcome Shailja! I am glad it was useful :-) Cheers, Miguel
On Tue, 23 Oct 2018 at 12:36, degski via Boost-users < boost-users@lists.boost.org> wrote:
On Tue, 23 Oct 2018 at 11:25, Miguel Ojeda < miguel.ojeda.sandonis@gmail.com> wrote:
Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.
No, sorry, that is a completely different use case. Crypto hashes are used, among other things, in network communications, persistent storage, etc. They need to be "fixed" functions, and their standards provide the exact definition. That is not the case at all with std::hash or Boost.Hash.
For debugging purposes, a fixed function seems quite useful to me.
It's already difficult enough to teach new programmers not to serialise the result of std/boost hash. Providing a means to ensure that it's predictable would strengthen the illusion that it's predictable across compilers and architectures. This would be a grave error.
I would argue the opposite. std::hash should work hard to ensure that for any two runs of the same program, the results of a hash will be wildly different. This would make it easier to spot incorrect uses of it.
R
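The advice not to serialise hash results can be illustrated with a minimal sketch: persist the keys and values themselves, and let whichever build loads the data rehash the keys. The names Table, save and load are hypothetical, and the text format assumes keys contain no whitespace.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>

using Table = std::unordered_map<std::string, int>;

// Write one "key value" pair per line. No hash value leaves the process.
// Assumption: keys contain no whitespace.
inline std::string save(const Table& t) {
    std::ostringstream out;
    for (const auto& kv : t) {
        out << kv.first << ' ' << kv.second << '\n';
    }
    return out.str();
}

// Rebuild the table from the saved text. Keys are rehashed with whatever
// std::hash the loading build provides, so the hash function may differ
// from the one used when the data was saved.
inline Table load(const std::string& data) {
    Table t;
    std::istringstream in(data);
    std::string key;
    int value = 0;
    while (in >> key >> value) {
        t[key] = value;
    }
    return t;
}
```

With this layout, a change in the hash function between library versions only affects the in-memory bucket placement, not the stored data.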
On Wed, Oct 24, 2018 at 12:14 PM Richard Hodges via Boost-users < boost-users@lists.boost.org> wrote:
On Tue, 23 Oct 2018 at 12:36, degski via Boost-users < boost-users@lists.boost.org> wrote:
On Tue, 23 Oct 2018 at 11:25, Miguel Ojeda < miguel.ojeda.sandonis@gmail.com> wrote:
Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.
No, sorry, that is a completely different use case. Crypto hashes are used, among other things, in network communications, persistent storage, etc. They need to be "fixed" functions, and their standards provide the exact definition. That is not the case at all with std::hash or Boost.Hash.
For debugging purposes, a fixed function seems quite useful to me.
It's already difficult enough to teach new programmers not to serialise the result of std/boost hash.
Providing a means to ensure that it's predictable would strengthen the illusion that it's predictable across compilers and architectures. This would be a grave error.
I would argue the opposite. std::hash should work hard to ensure that for any two runs of the same program, the results of a hash will be wildly different.
This would make it easier to spot incorrect uses of it.
What about maps in shared memory? You're suggesting that it's OK that one process built with version X of Boost should have no expectation of being able to operate correctly with a process built with version X+1. Insanity.
On 10/23/18 11:25 AM, Miguel Ojeda via Boost-users wrote:
Not only that, but stating that the hash should remain constant across C++/Boost releases is basically stating the hash function should be fixed forever. That removes all the freedom for improvements when future hash functions are discovered or implemented, with better properties (which is what happened in the commits I linked).
While I do not disagree with your arguments, we have a special situation because the algorithm for boost::hash_combine was actually documented in older Boost releases, including 1.53, which the OP is upgrading from, so it would have been reasonable to assume that it stayed fixed. It is not documented in newer releases though:
https://lists.boost.org/Archives/boost/2014/07/215577.php
The best way to ensure that it is unchanged is to copy the old boost::hash_combine into your own code.
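A minimal sketch of that suggestion follows. The combine step uses the formula that older Boost documentation gave for boost::hash_combine; the string wrapper assumes that the old string hash folded each character, hashed to its integer value, into a zero seed. That assumption, and the hypothetical names legacy_hash_combine and legacy_hash_string, should be checked against the 1.53 sources and against values produced by the old build before relying on them.

```cpp
#include <cstddef>
#include <string>

// Combine step as documented for boost::hash_combine in older releases:
//   seed ^= hash_value(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
inline void legacy_hash_combine(std::size_t& seed, std::size_t value) {
    seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// ASSUMPTION: the old string hash started from a zero seed and combined the
// characters in order, with each character hashing to its integer value.
// Verify against the Boost 1.53 headers before depending on this.
inline std::size_t legacy_hash_string(const std::string& s) {
    std::size_t seed = 0;
    for (char c : s) {
        legacy_hash_combine(seed, static_cast<std::size_t>(c));
    }
    return seed;
}
```

Keeping the function in your own code means later Boost upgrades cannot change it, although the result still depends on the width of std::size_t on the target platform.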
participants (6): Bjorn Reese, degski, james, Miguel Ojeda, Richard Hodges, Shailja Prasad