[boost] Re: [review] hash functions

12 Mar 2005

      David Abrahams wrote:
...
Alberto Barbati <abarbati@iaanus.com> writes:
...
Hi,
I have a doubt about the proposed implementation of hash_value() for 
pointers, which is:
template <class T> inline std::size_t hash_value(T* v)
   {
       return static_cast<std::size_t>(v - (T*)0);
   }
this code calls for undefined behaviour according to §5.7/6, each time v 
is a non-null pointer.
Furthermore, you can't subtract addresses that aren't in the same
array.  0 isn't in the same array as anything.
So we agree that this code has to be fixed in some way. I don't think 
Boost could accept the code as it is.
...
...
Unfortunately, there's one more thing... even if p == q, 
reinterpret_cast<uintmax_t>(p) is not guaranteed to be equal to 
reinterpret_cast<uintmax_t>(q) on implementations that have multiple 
internal representations for the same address. I don't think much can be 
done (in a portable way) for such platforms.
Once you're into reinterpret_cast you've left the land of strict
portability anyway.
I agree with that literally, but I don't fully understand what do you 
mean with such remark, so I will re-formulate my thought hoping to make 
it clearer. Probably we are saying the same thing.

Apart from the very special case described above, using 
reinterpret_cast<> as I describe will provide a viable hash function. 
The fact that its values are implementation-defined is irrelevant, I 
think (but... see below).

However, *in that special case above*, the basic requirement of hash 
functions "equal arguments yield the same result" is not satisfied so 
even reinterpret_cast<> does not provide a viable hash function. As the 
standard does not provide any other portable mean to map generic 
pointers to integers, there's *no way* we can obtain a hash for pointers 
in this case.

Ah! And there's one more case in which we are helpless, although I'm 
curious to see an implementation that exploits such latitude: the 
wording used in the standard does not guarantee that the mapping from 
pointers to integers always produce the same value over time. In other 
words:

   char* p = new char;
   size_t i1 = reinterpret_cast<size_t>(p);
   size_t i2 = reinterpret_cast<size_t>(p);
   assert(i1 == i2); // ???
   char* q1 = reinterpret_cast<char*>(i1);
   char* q2 = reinterpret_cast<char*>(i2);
   assert(p == q1 && q1 == q2);

The standard guarantees that the second assert passes, but says nothing 
about the first assert.

Coming back to the hash library, I think we have only three choices:

1) use reinterpret_cast<> as I proposed, but documenting that it will 
produce a viable hash function only if the underlying implementation 
satisfy certain requirements (each specific implementation /should/ 
document the behaviour of reinterpret_cast anyway, so the user may able 
to check)

2) provide the hash function (with reintepret_cast<>) only for those 
platforms that we are able to check in advance that will satisfy those 
requirements and make any use of such function produce a static assert 
for all other platforms

3) do not provide a hash function for pointers at all

I would vote for 1, but 2 and 3 also make sense. Is there a fourth choice?

Alberto