Hi Boosters,
I'm sure I'm not the first person to be caught out by this. If you try
to use Boost.Functional/Hash on a smart pointer type such as
boost::shared_ptr, the results may be, well, underwhelming:
typedef boost::shared_ptr<int> IPtr;
IPtr i1 = IPtr(new int(1234));
IPtr i2 = IPtr(new int(4321));
std::cout << std::hex
<< "hash(i1): " << boost::hash<IPtr>()(i1) << "\n"
<< "hash(i2): " << boost::hash<IPtr>()(i2) << "\n";
which produces:
hash(i1): 1
hash(i2): 1
This is because the shared_ptr is being silently converted to a boolean
type before being hashed. Now maybe this is all documented and so forth
but I would argue that it's not really expected behaviour. Or to put it
another way, it's very easy to go wrong and use smart pointers in a
hashed container such as a boost::unordered_set. The performance of
such a set obviously degrades considerably, when compared to a more
intelligent hash function. Here's a simple snippet to show the problem:
typedef boost::shared_ptr<int> IPtr;
template <typename T>
struct PtrHash
: public std::unary_function
{
std::size_t operator() (boost::shared_ptr<T> const& ptr) const
{
return boost::hash()(ptr.get());
}
};
template <typename Set>
double TimeInserts(int i)
{
Set s;
boost::timer t;
while (i--)
s.insert(IPtr(new int(i)));
return t.elapsed();
}
int main()
{
typedef boost::unordered_set<IPtr> IPtrSetDefaultHash;
typedef boost::unordered_set IPtrSetPtrHash;
const int iterations = 10000;
std::cout << std::dec
<< "IPtrSetDefaultHash: " <<
TimeInserts<IPtrSetDefaultHash>(iterations) << "s\n"
<< "IPtrSetPtrHash: " <<
TimeInserts<IPtrSetPtrHash>(iterations) << "s\n";
return 0;
}
Which outputs:
IPtrSetDefaultHash: 2.14526s
IPtrSetPtrHash: 0.006855s
... a noticable performance difference.
OK so obviously it is reasonable to want to question the wisdom of
using a smart pointer type in a hashed container like this. But I want
to ignore that question for now, and instead ask whether it is possible
to *prevent* the conversion of smart pointers to bool for the purposes
of calculating hash values.
The context for this question is a fairly large codebase with extensive
use of hashed containers. Not everyone on the development team is aware
of this particular limitation of smart pointers. So not only do I want
to detect the use of "dangerous" hashing functions but also prevent
them being used in the future. Instead I really want to either fix or
prevent the conversion to bool for smart pointers when used in a
hashing function (either through the boost:: or tr1:: namespace).
One idea is to patch the boost headers to add a hash function for each
of the smart pointer types (as per the example above) but there must be
a better answer than this?
Does this problem affect anyone else? How do people deal with this?