
Hi boosters, I believe this question should already have been raised, but I can't find any info on it: why is boost::hash not specialized for 64-bit types by default? Is there a good reason for that? Thanks a lot for the answer and for Boost itself. -- Alexei Alexandrov

Alexei Alexandrov wrote:
Hi boosters,
I believe this question should already have been raised, but I can't find any info on it: why is boost::hash not specialized for 64-bit types by default? Is there a good reason for that?
Thanks a lot for the answer and for Boost itself.
By 64-bit types I guess you mean 'long long'? Mainly because it wasn't specified in Peter's original paper. But there is a problem with implementing it. The integer hashes are defined as:

std::size_t hash_value(val) { return val; }

This is fine for most types, but since long long typically has a much larger range of values than std::size_t, it will work very badly for long long. The reasoning behind explicitly defining the function in the specification is that it should have predictable quality on different platforms and should give equal hash values for equivalent values of different types.

I think it's possible to change the definition to one that remains close to the spirit of the original specification and works well for the larger integer types. I'll be looking into doing this for Boost 1.35.

Daniel

Hi Daniel James, you wrote:
This is fine for most types, but since long long typically has a much larger range of values than std::size_t, it will work very badly for long long.
Well, the idea of hashing is that the developer knows (to some extent) the distribution of the data values, and it's his/her responsibility not to do silly things. For example, Boost does provide a hash function for vector, even though a vector has far more possible combinations than can fit into a size_t. A long long is just a vector of 2 longs, after all... :-)
I think it's possible to change the definition to one that remains close to the spirit of the original specification and works well for the larger integer types. I'll be looking into doing this for Boost 1.35.
Looking forward to it. Thanks. -- Alexei Alexandrov

Daniel James wrote:
The integer hashes are defined as:
std::size_t hash_value(val) { return val; }
This is fine for most types, but since long long typically has a much larger range of values than std::size_t, it will work very badly for long long.
For larger integer types, we probably need something along the lines of:

std::size_t hash_value( val )
{
    size_t r = val;
    while( val >>= size_t_bits ) { hash_combine( r, (size_t)val ); }
    return r;
}

It has the property of preserving the hash_value of all _values_ that can fit into a size_t, even though the underlying type is capable of holding bigger integers.

Peter Dimov wrote:
For larger integer types, we probably need something along the lines of
std::size_t hash_value( val ) { size_t r = val;
while( val >>= size_t_bits ) { hash_combine( r, (size_t)val ); }
return r; }
It has the property of preserving the hash_value of all _values_ that can fit into a size_t, even though the underlying type is capable of holding bigger integers.
I don't think it does. It needs to be done in the reverse order, and needs to use a different combine function which hashes 0's to 0. For example, for 10, that would do:

r = 10; hash_combine(r, 0);

which would not give 10.

Negative numbers would be a problem, so I was thinking of inverting the higher parts (i.e. a 64-bit -2 would convert to the 32-bit values FF, FE; then I'd invert the first one, to get 00, FE, and combine it to make FE), but that'd be a little slow as it requires a comparison. So I was going to try to come up with something better.

Sorry if that isn't very clear. I wanted to spend a bit of time writing this up, and I've been thinking about other possibilities as well.

Daniel

Daniel James wrote:
Peter Dimov wrote:
For larger integer types, we probably need something along the lines of
std::size_t hash_value( val ) { size_t r = val;
while( val >>= size_t_bits ) { hash_combine( r, (size_t)val ); }
return r; }
It has the property of preserving the hash_value of all _values_ that can fit into a size_t, even though the underlying type is capable of holding bigger integers.
I don't think it does.
Actually, it does for positive numbers. Sorry, I misread it. What you were doing clicked a couple of seconds after I clicked 'send'. But I'm not sure it works for negative numbers. Daniel

Daniel James wrote:
Daniel James wrote:
Peter Dimov wrote:
For larger integer types, we probably need something along the lines of
std::size_t hash_value( val ) { size_t r = val;
while( val >>= size_t_bits ) { hash_combine( r, (size_t)val ); }
return r; }
It has the property of preserving the hash_value of all _values_ that can fit into a size_t, even though the underlying type is capable of holding bigger integers.
I don't think it does.
Actually, it does for positive numbers. Sorry, I misread it. What you were doing clicked a couple of seconds after I clicked 'send'. But I'm not sure it works for negative numbers.
You're right, it doesn't. It'd need a check against (size_t)-1 instead of 0 in the while loop:

std::size_t hash_value( signed T val )
{
    unsigned T v2( val );
    size_t r = v2;
    size_t s = val >= 0? 0: (size_t)-1;
    while( (size_t)(v2 >>= size_t_bits) != s ) { hash_combine( r, (size_t)v2 ); }
    return r;
}

or something like that. Now that I think about it, I'm not sure whether under the existing implementation -1 and -1L are guaranteed to hash to the same value, so it might not matter.

Daniel James wrote:
Actually, it does for positive number. Sorry I misread it. What you were doing clicked a couple of seconds after I clicked on 'send'. But I'm not sure it works for negative numbers.
Okay, probably not perfect, but here's a mostly working version:

template <class T> size_t integer_hash_value(T val)
{
    const int size_t_bits = std::numeric_limits<size_t>::digits;

    size_t seed = val;

    // Two's complement for negative numbers, but only after using
    // the initial value, so that: hash_value(-2) != hash_value(1)
    val = val < 0 ? -1 - val : val;

    while( val >>= size_t_bits ) {
        hash_combine(seed, (size_t) val);
    }

    return seed;
}

My idea was something roughly like this:

template <class T> size_t hash_value2(T val)
{
    const int size_t_bits = std::numeric_limits<size_t>::digits;

    // ceiling(std::numeric_limits<T>::digits / size_t_bits) - 1
    const int length = (std::numeric_limits<T>::digits - 1) / size_t_bits;

    size_t seed = 0;
    T positive = val < 0 ? -1 - val : val;

    // Hopefully, this loop can be unrolled.
    for(unsigned int i = length * size_t_bits; i > 0; i -= size_t_bits)
    {
        seed ^= (size_t) (positive >> i) + (seed<<6) + (seed>>2);
    }
    seed ^= (size_t) val + (seed<<6) + (seed>>2);

    return seed;
}

Which looks more complicated, but if the compiler optimizes it well (or I optimize the code), it could be faster, as there are fewer comparisons. Although maybe this is a premature optimization, since long long is unlikely to have more than twice the number of bits of std::size_t.

I'm not suggesting the specification should spell out an algorithm, but it could describe requirements that both those algorithms would meet. This would give library authors some latitude to deal with the strengths of the processor they're targeting and their own imagination.

Daniel
participants (3)
- Alexei Alexandrov
- Daniel James
- Peter Dimov