[hash] hashing of smart pointer types - prevent conversion to bool?

Hi Boosters, I'm sure I'm not the first person to be caught out by this. If you try to use Boost.Functional/Hash on a smart pointer type such as boost::shared_ptr, the results may be, well, underwhelming: typedef boost::shared_ptr<int> IPtr; IPtr i1 = IPtr(new int(1234)); IPtr i2 = IPtr(new int(4321)); std::cout << std::hex << "hash(i1): " << boost::hash<IPtr>()(i1) << "\n" << "hash(i2): " << boost::hash<IPtr>()(i2) << "\n"; which produces: hash(i1): 1 hash(i2): 1 This is because the shared_ptr is being silently converted to a boolean type before being hashed. Now maybe this is all documented and so forth but I would argue that it's not really expected behaviour. Or to put it another way, it's very easy to go wrong and use smart pointers in a hashed container such as a boost::unordered_set. The performance of such a set obviously degrades considerably, when compared to a more intelligent hash function. Here's a simple snippet to show the problem: typedef boost::shared_ptr<int> IPtr; template <typename T> struct PtrHash : public std::unary_function<boost::shared_ptr<T>, std::size_t> { std::size_t operator() (boost::shared_ptr<T> const& ptr) const { return boost::hash<T*>()(ptr.get()); } }; template <typename Set> double TimeInserts(int i) { Set s; boost::timer t; while (i--) s.insert(IPtr(new int(i))); return t.elapsed(); } int main() { typedef boost::unordered_set<IPtr> IPtrSetDefaultHash; typedef boost::unordered_set<IPtr, PtrHash<int> > IPtrSetPtrHash; const int iterations = 10000; std::cout << std::dec << "IPtrSetDefaultHash: " << TimeInserts<IPtrSetDefaultHash>(iterations) << "s\n" << "IPtrSetPtrHash: " << TimeInserts<IPtrSetPtrHash>(iterations) << "s\n"; return 0; } Which outputs: IPtrSetDefaultHash: 2.14526s IPtrSetPtrHash: 0.006855s ... a noticable performance difference. OK so obviously it is reasonable to want to question the wisdom of using a smart pointer type in a hashed container like this. But I want to ignore that question for now, and instead ask whether it is possible to *prevent* the conversion of smart pointers to bool for the purposes of calculating hash values. The context for this question is a fairly large codebase with extensive use of hashed containers. Not everyone on the development team is aware of this particular limitation of smart pointers. So not only do I want to detect the use of "dangerous" hashing functions but also prevent them being used in the future. Instead I really want to either fix or prevent the conversion to bool for smart pointers when used in a hashing function (either through the boost:: or tr1:: namespace). One idea is to patch the boost headers to add a hash function for each of the smart pointer types (as per the example above) but there must be a better answer than this? Does this problem affect anyone else? How do people deal with this?

One idea is to patch the boost headers to add a hash function for each of the smart pointer types (as per the example above) but there must be a better answer than this?
Technically, you can define hash functions in your own headers, in boost namespace (that's what I do). I'm not sure about legal aspects of this though.

On 6 July 2010 13:52, Igor R <boost.lists@gmail.com> wrote:
One idea is to patch the boost headers to add a hash function for each of the smart pointer types (as per the example above) but there must be a better answer than this?
Technically, you can define hash functions in your own headers, in boost namespace (that's what I do). I'm not sure about legal aspects of this though.
It's open source, so really it's up to you. Although I'd suggest that if you have adapted a boost class, it'd be a good idea to submit a patch to the appropriate maintainer. boost::shared_ptr should be pretty simple, something like (untested): namespace boost { template <typename T> std::size_t hash_value(boost::shared_ptr<T> const& x) { boost::hash<T*> hash_ptr; return hash_ptr(x.get()); } } Daniel

On 6 July 2010 13:15, Alastair Rankine <arsptr@internode.on.net> wrote:
I'm sure I'm not the first person to be caught out by this.
You're the first person to report it.
The context for this question is a fairly large codebase with extensive use of hashed containers. Not everyone on the development team is aware of this particular limitation of smart pointers. So not only do I want to detect the use of "dangerous" hashing functions but also prevent them being used in the future. Instead I really want to either fix or prevent the conversion to bool for smart pointers when used in a hashing function (either through the boost:: or tr1:: namespace).
It won't happen for a 'std' or 'std::tr1' implementation, as this issue is caused by the boost extensions to the standard. It also won't happen in boost if you disable the extensions by defining 'BOOST_HASH_NO_EXTENSIONS'. I do actually view this as a bug. Of the top of my header, I can think of a few possible fixes: 1. Remove 'hash_value(bool)', changing boost::hash<bool> to calculate the hash value itself - that would fix this issue, but there are still potential problems with other classes that have an implicit conversion to certain types. I'd prefer a general solution. 2. Prevent implicit casts by adding a templated 'hash_value' containing a static assertion. I like this, it's simple and not very disruptive. There might be code out there that requires on 'hash_value' picking up implicit castsm but if there is, it's probably fairly dubious. 3. Move 'hash_value' for built in types to a different namespace, or rename them. Would have to add more specialisation to boost::hash, so that it deals with all the appropriate types. Also have to add a macro to allow anyone using the existing functions to continue to use them if they wish. This would have been a good idea if it was done from the start, but I don't like the idea of removing public functions which have been available for a while. Maybe some other possibilities. I'll think about it some more. I'm not sure if I'll be able to get a fix ready for the next release. Daniel

On 2010-07-07 09:01:32 +1000, Daniel James said:
On 6 July 2010 13:15, Alastair Rankine <arsptr@internode.on.net> wrote:
I'm sure I'm not the first person to be caught out by this.
You're the first person to report it.
Maybe everyone else who is hashing smart pointers isn't willing to own up to it? :)
It won't happen for a 'std' or 'std::tr1' implementation, as this issue is caused by the boost extensions to the standard. It also won't happen in boost if you disable the extensions by defining 'BOOST_HASH_NO_EXTENSIONS'.
Ah good, that will probably work nicely for me. Thanks.
I do actually view this as a bug. Of the top of my header, I can think of a few possible fixes:
FWIW I like option 2 but I'm happy with any of these fixes. Look forward to seeing this in an upcoming boost, if not the next one. Thanks for your response.
participants (3)
-
Alastair Rankine
-
Daniel James
-
Igor R