[Boost-users] [hash] hashing of smart pointer types - prevent conversion to bool?

6 Jul 2010

      Hi Boosters,

I'm sure I'm not the first person to be caught out by this. If you try 
to use Boost.Functional/Hash on a smart pointer type such as 
boost::shared_ptr, the results may be, well, underwhelming:

    typedef boost::shared_ptr<int> IPtr;
    IPtr i1 = IPtr(new int(1234));
    IPtr i2 = IPtr(new int(4321));

    std::cout << std::hex
              << "hash(i1): " << boost::hash<IPtr>()(i1) << "\n"
              << "hash(i2): " << boost::hash<IPtr>()(i2) << "\n";

which produces:

    hash(i1): 1
    hash(i2): 1

This is because the shared_ptr is being silently converted to a boolean 
type before being hashed. Now maybe this is all documented and so forth 
but I would argue that it's not really expected behaviour. Or to put it 
another way, it's very easy to go wrong and use smart pointers in a 
hashed container such as a boost::unordered_set. The performance of 
such a set obviously degrades considerably, when compared to a more 
intelligent hash function. Here's a simple snippet to show the problem:

    typedef boost::shared_ptr<int> IPtr;

    template <typename T>
    struct PtrHash
    : public std::unary_function
    {
        std::size_t operator() (boost::shared_ptr<T> const& ptr) const
        {
            return boost::hash()(ptr.get());
        }
    };

    template <typename Set>
    double TimeInserts(int i)
    {
        Set s;
        boost::timer t;
        while (i--)
            s.insert(IPtr(new int(i)));
        return t.elapsed();
    }

    int main()
    {
        typedef boost::unordered_set<IPtr> IPtrSetDefaultHash;
        typedef boost::unordered_set IPtrSetPtrHash;
        const int iterations = 10000;
        std::cout << std::dec
                  << "IPtrSetDefaultHash: " << 
TimeInserts<IPtrSetDefaultHash>(iterations) << "s\n"
                  << "IPtrSetPtrHash: " << 
TimeInserts<IPtrSetPtrHash>(iterations) << "s\n";
        return 0;
    }

Which outputs:

    IPtrSetDefaultHash: 2.14526s
    IPtrSetPtrHash: 0.006855s

... a noticable performance difference.

OK so obviously it is reasonable to want to question the wisdom of 
using a smart pointer type in a hashed container like this. But I want 
to ignore that question for now, and instead ask whether it is possible 
to *prevent* the conversion of smart pointers to bool for the purposes 
of calculating hash values.

The context for this question is a fairly large codebase with extensive 
use of hashed containers. Not everyone on the development team is aware 
of this particular limitation of smart pointers. So not only do I want 
to detect the use of "dangerous" hashing functions but also prevent 
them being used in the future. Instead I really want to either fix or 
prevent the conversion to bool for smart pointers when used in a 
hashing function (either through the boost:: or tr1:: namespace).

One idea is to patch the boost headers to add a hash function for each 
of the smart pointer types (as per the example above) but there must be 
a better answer than this?

Does this problem affect anyone else? How do people deal with this?

[Boost-users] [hash] hashing of smart pointer types - prevent conversion to bool?

Alastair Rankine