Re: [boost] Re: [review] hash functions

In-Reply-To: <d0pe8q$tdi$1@sea.gmane.org> daniel@calamity.org.uk (Daniel James) wrote (abridged):
But it will be easy to make mistakes for more complicated (possibly recursive) types. Especially since hash_combine is not associative or transitive. I could see that leading to mistakes.
I can see that it's possible to call the function with the wrong arguments, but that kind of problem can be addressed with documentation and/or by careful naming. I don't think that's worth distorting the interface and introducing side effects and spurious variables and inefficiency to fix it.
And it would be quite easy to forget to call hash_value (because many types are implicitly convertible to std::size_t)
Since that conversion is all hash_value does, why does it matter? That said, I agree the second argument should not be hash value. If we care about the first argument, perhaps the problem could be identified with an overload or specialisation. template < typename T, typename S> size_t hash_combine( S seed, const T &v ) { BOOST_STATIC_ASSERT( false ); } template <typename T, typename S> size_t hash_combine<>( size_t seed, const T &v ) { const size_t c = 0x12345678; // Or some other value. return seed ^ (hash_value(v) + c + (seed<<6) + (seed>>2)); } (I'm not so hot on templates so the above may be wrong, but I hope it conveys the idea.)
size_t my_hash_function(int x[4]) { using namespace boost;
return hash_combine( hash_combine(hash_value(x[0]), hash_value(x[1])), hash_combine(hash_value(x[2]), hash_value(x[3]))); }
I agree this is wrong. I think: return hash_combine( hash_combine( hash_combine( hash_value( x[0] ), x[1] ), x[2] ), x[3] ); is reasonable and ideally would be supported.
An alternative would be to implement hash_combine as an object. Something like:
struct hash_seed { hash_seed(); template <class T> void combine(T const&); std::size_t value() const; };
Then it could initialise itself to whatever it wants.
Do you mean the seed should be an object? I can see advantages to that, although it seems like overkill. Adding an optional seed argument to hash_range and/or hash_combine sounds like a good idea, though. -- Dave Harris, Nottingham, UK

Dave Harris wrote:
In-Reply-To: <d0pe8q$tdi$1@sea.gmane.org> daniel@calamity.org.uk (Daniel James) wrote (abridged):
But it will be easy to make mistakes for more complicated (possibly recursive) types. Especially since hash_combine is not associative or transitive. I could see that leading to mistakes.
I can see that it's possible to call the function with the wrong arguments, but that kind of problem can be addressed with documentation and/or by careful naming. I don't think that's worth distorting the interface and introducing side effects and spurious variables and inefficiency to fix it.
Strong words. "Distorting" the interface improves its usability by removing a potential source of mistakes. The mere presence of side effects is not, in itself, an argument. No "spurious" variables are introduced; naming the seed variable, in the cases where it's not already named, can improve readability. As for the inefficiency, perhaps you have the numbers to prove it.
size_t my_hash_function(int x[4]) { using namespace boost;
return hash_combine( hash_combine(hash_value(x[0]), hash_value(x[1])), hash_combine(hash_value(x[2]), hash_value(x[3]))); }
I agree this is wrong. I think:
return hash_combine( hash_combine( hash_combine( hash_value( x[0] ), x[1] ), x[2] ), x[3] );
is reasonable and ideally would be supported.
The idea was to make the wrong version difficult to write. It is true that this also makes the "reasonable" version difficult to write. ;-)

Dave Harris wrote:
In-Reply-To: <d0pe8q$tdi$1@sea.gmane.org> daniel@calamity.org.uk (Daniel James) wrote (abridged):
But it will be easy to make mistakes for more complicated (possibly recursive) types. Especially since hash_combine is not associative or transitive. I could see that leading to mistakes.
I can see that it's possible to call the function with the wrong arguments, but that kind of problem can be addressed with documentation and/or by careful naming. I don't think that's worth distorting the interface and introducing side effects and spurious variables and inefficiency to fix it.
I think it's better if less documentation is required. And I don't see it as a distortion. I've found it quite a good way to represent what it does.
And it would be quite easy to forget to call hash_value (because many types are implicitly convertible to std::size_t)
Since that conversion is all hash_value does, why does it matter?
Because that's not always true. There may be other types which have a conversion to std::size_t, but have a different hash value. And for built in types, the definition of the function might change in future versions.
size_t my_hash_function(int x[4]) { using namespace boost;
return hash_combine( hash_combine(hash_value(x[0]), hash_value(x[1])), hash_combine(hash_value(x[2]), hash_value(x[3]))); }
I agree this is wrong. I think:
return hash_combine( hash_combine( hash_combine( hash_value( x[0] ), x[1] ), x[2] ), x[3] );
is reasonable and ideally would be supported.
Well, maybe. The problem is that occasionally it's the right thing to do. I believe that with the existing version, it's easier to get it right. By the way, with the current implementation, you hash_value shouldn't be called directly, as it need to have the right ADL setup. Instead, an instance of boost::hash<T>, hash_range or hash_combine should be called. I'll have to make this clearer in the documentation. Maybe we should consider something similar to what's currently being proposed for Boost.Range (I haven't been following that very closely, so I'll have to go back over the thread). Daniel
participants (3)
-
brangdon@cix.compulink.co.uk
-
Daniel James
-
Peter Dimov