functional/hash: hash_value with context

Hi,

A newbie question regarding functional/hash.

I would like to use a TR1 unordered_map with a custom 2D point data type. For this, I use hash_value from Boost, but I would like to pass a context to this function: here, the width and height of the domain containing my 2D points. Currently, I declare these width and height static. My code (in my cpp file) looks like:

static int P_width;
static int P_height;

std::size_t hash_value(point const& p)
{
    return ((size_t)p.x()<<1)*P_width + (size_t)(p.y()<<1);
}

But this solution is not satisfactory, as I cannot have several instances of my class containing the unordered_map, i.e., several domains of different width and height. Is there a straightforward solution that I have overlooked?

Regards,
Boris.

Boris Mansencal wrote:
Hi,
A newbie question regarding functional/hash.
I would like to use a TR1 unordered_map with a custom 2D point data type. For this, I use hash_value from Boost, but I would like to pass a context to this function: here, the width and height of the domain containing my 2D points. Currently, I declare these width and height static. My code (in my cpp file) looks like:

static int P_width;
static int P_height;

std::size_t hash_value(point const& p)
{
    return ((size_t)p.x()<<1)*P_width + (size_t)(p.y()<<1);
}

But this solution is not satisfactory, as I cannot have several instances of my class containing the unordered_map, i.e., several domains of different width and height.
Is there a straightforward solution that I have overlooked?
Is the context-free solution

std::size_t hash_value(point const& p)
{
    std::size_t seed( 0 );
    hash_combine( seed, p.x() );
    hash_combine( seed, p.y() );
    return seed;
}

not working for you?

http://boost.org/doc/html/hash/combine.html
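To make this concrete, here is a minimal, self-contained sketch (not from the thread) of how such a free hash_value is picked up by boost::hash via argument-dependent lookup and plugged into a TR1 container. The point class, its accessors, and the use of GCC's <tr1/unordered_map> header are assumptions for illustration only:

#include <cstddef>
#include <tr1/unordered_map>
#include <boost/functional/hash.hpp>

// Hypothetical 2D point type, standing in for Boris's own class.
class point
{
    int x_, y_;
public:
    point(int x, int y) : x_(x), y_(y) {}
    int x() const { return x_; }
    int y() const { return y_; }
};

// Needed by the container's default key-equality predicate.
bool operator==(point const& a, point const& b)
{
    return a.x() == b.x() && a.y() == b.y();
}

// Context-free hash, as suggested above; boost::hash<point> finds it by ADL.
std::size_t hash_value(point const& p)
{
    std::size_t seed = 0;
    boost::hash_combine(seed, p.x());
    boost::hash_combine(seed, p.y());
    return seed;
}

int main()
{
    std::tr1::unordered_map<point, int, boost::hash<point> > m;
    m[point(3, 4)] = 42;
    return 0;
}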

Peter Dimov wrote:
Boris Mansencal wrote:
I would like to use a TR1 unordered_map with a custom 2D point data type. For this, I use hash_value from Boost, but I would like to pass a context to this function: here, the width and height of the domain containing my 2D points. Currently, I declare these width and height static. My code (in my cpp file) looks like:

static int P_width;
static int P_height;

std::size_t hash_value(point const& p)
{
    return ((size_t)p.x()<<1)*P_width + (size_t)(p.y()<<1);
}

But this solution is not satisfactory, as I cannot have several instances of my class containing the unordered_map, i.e., several domains of different width and height.
Is there a straightforward solution that I have overlooked?
Is the context-free solution
std::size_t hash_value(point const& p)
{
    std::size_t seed( 0 );
    hash_combine( seed, p.x() );
    hash_combine( seed, p.y() );
    return seed;
}
not working for you?
It is working... but my tests show better performance when I use my own hash function (using width & height) instead of this hash_combine solution. Is there no solution to my question?

(I am not a hash function specialist, so if you have a better hash function for 2D points in a width*height domain, context-free or not, I would be happy to know...)

Boris.

Boris Mansencal wrote:
It is working... but my tests show better performance when I use my own hash function (using width & height) instead of this hash_combine solution.
return ((size_t)p.x()<<1)*P_width + (size_t)(p.y()<<1);
What type does p.x() have in your case, and why do you shift it left by one? In principle, this discards one bit of entropy, if the bucket count is even (but not when it's prime). Why do you multiply p.x() by the width? Isn't x() supposed to be within 0..width and y() within 0..height? It seems that you need to either multiply y() by width or x() by height.
Is there no solution to my question?
You can't have a context-dependent hash_value, but you can pass a 'Hasher' function object to unordered_map that stores your context. Depending on the typical values of your width, it may also be possible to just use a fixed width of, say, 65536.

Unfortunately, there are no "context-free" rules regarding hash functions; you have to find the one that works best in your particular situation, even if it doesn't make much sense. :-) One problem with this approach is that it can tie you to a particular unordered_map implementation. We've tried to make hash_combine work adequately for as many cases as possible, but this makes it a bit slower to compute.
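A minimal sketch of what such a stateful hasher could look like (not from the thread; the name ptHasher, the decision to store only the height, and the point type with x()/y() accessors are assumptions for illustration):

#include <cstddef>
#include <functional>

// Assumed 2D point type with x()/y() accessors, as in the earlier sketch.
struct point
{
    int x_, y_;
    point(int x, int y) : x_(x), y_(y) {}
    int x() const { return x_; }
    int y() const { return y_; }
};

// Hash function object carrying the domain height as context.
class ptHasher
    : public std::unary_function<point, std::size_t>
{
    int m_height; // context captured at construction
public:
    explicit ptHasher(int height) : m_height(height) {}

    std::size_t operator()(point const& p) const
    {
        // Injective for 0 <= p.y() < m_height.
        return static_cast<std::size_t>(p.x()) * m_height + p.y();
    }
};

Because the container copies and stores the hasher it is given, each unordered_map instance can carry its own width/height this way, which is what Boris does further down with map_point_to_myType(10, ptHasher(height)).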

Peter Dimov wrote:
Boris Mansencal wrote:
return ((size_t)p.x()<<1)*P_width + (size_t)(p.y()<<1);
What type does p.x() have in your case, and why do you shift it left by one? In principle, this discards one bit of entropy, if the bucket count is even (but not when it's prime).
Why do you multiply p.x() by the width? Isn't x() supposed to be within 0..width and y() within 0..height? It seems that you need to either multiply y() by width or x() by height.
indeed, return (size_t)(p.x()*P_height + p.y()) seems to give slightly better results... (but actually I still do not really grasp why...)
Is there no solution to my question?
You can't have a context-dependent hash_value, but you can pass a 'Hasher' function object to unordered_map that stores your context.
Exactly. This is the solution I was looking for. I can do:

class ptHasher
    : public std::unary_function<point, std::size_t>
{ /* stores the height and implements operator()(point const&) */ };
Depending on the typical values of your width, it may also be possible to just use a fixed width of, say, 65536.
yes.
Unfortunately, there are no "context-free" rules regarding hash functions; you have to find the one that works best in your particular situation, even if it doesn't make much sense. :-) One problem with this approach is that it can tie you to a particular unordered_map implementation. We've tried to make hash_combine work adequately for as many cases as possible, but this makes it a bit slower to compute.
Thanks a lot,
Boris.

Boris Mansencal wrote:
std::tr1::unordered_map<point, myType, ptHasher>
    map_point_to_myType(10, ptHasher(height));

Is there any drawback to doing things like that? Is there any portability issue?
I can't think of any; this seems to be a good way. For p.y() in 0 .. height-1, your hash function is now "perfect", that is, hash( p1 ) != hash( p2 ) when p1 != p2.
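To see why (a small standalone check, not part of the thread): as long as 0 <= y < height, both coordinates can be recovered from x*height + y by division and modulus, so two distinct points can never produce the same value.

#include <cassert>
#include <cstddef>

int main()
{
    const int width = 7, height = 5; // arbitrary small domain, for illustration
    for (int x = 0; x < width; ++x)
        for (int y = 0; y < height; ++y)
        {
            std::size_t h = static_cast<std::size_t>(x) * height + y;
            assert(static_cast<int>(h / height) == x); // x is recoverable
            assert(static_cast<int>(h % height) == y); // y is recoverable
        }
    return 0;
}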

Boris Mansencal wrote:
indeed, return (size_t)(p.x()*P_height + p.y()) seems to give slightly better results... (but actually I still do not really grasp why...)
If P_height happens to be a multiple of the number of buckets in the container, and the container uses a plain modulus of the hash value, then the x value will have no effect on the chosen bucket. If you want your program to be portable to different container implementations, this could be a problem. A possible way to work around this is to apply something like Thomas Wang's integer hash function to the result, although the function is specific to the number of bits:

http://www.cris.com/~Ttwang/tech/inthash.htm

But I'm surprised that boost::hash is slow. With full optimization, I'd expect it to work okay. Maybe a simpler hash_value might be better:

std::size_t hash_value(point const& p)
{
    std::size_t seed = p.x();
    seed ^= (std::size_t) p.y() + (seed<<6) + (seed>>2);
    return seed;
}

If possible, I'd be interested in looking at your tests (you can email me off list about them).

Daniel
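For reference, one commonly reproduced 32-bit variant of Wang's mixing function looks roughly like the following (an illustration based on the widely circulated form, not a verbatim copy of the linked page); it could be applied to the combined x*height + y value before returning it:

#include <boost/cstdint.hpp>

// 32-bit integer mix in the style of Thomas Wang's hash. Spreading the bits
// this way keeps the bucket choice from depending only on the low-order
// part of x*height + y.
inline boost::uint32_t wang_mix32(boost::uint32_t key)
{
    key = ~key + (key << 15); // key = (key << 15) - key - 1
    key =  key ^ (key >> 12);
    key =  key + (key << 2);
    key =  key ^ (key >> 4);
    key =  key * 2057;        // key = (key + (key << 3)) + (key << 11)
    key =  key ^ (key >> 16);
    return key;
}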
participants (3)
- Boris Mansencal
- Daniel James
- Peter Dimov