[pool][Ticket #2359] Performance impact of bug fix

25 Jul 2009

      The bug fix implemented for this ticket has a significant (approx.
25-35%) performance impact when fast_pool_allocator is used as the
shared_ptr<> allocator, as in:

     static fast_pool_allocator<T> pool;
     shared_ptr<T> a(p, deleter, pool);  // p: a T*, deleter: a callable taking T*

Background:
Although http://svn.boost.org/trac/boost/ticket/2359 has an extensive
analysis and discussion of the problem, the key issue was that static
non-local fast_pool_allocators did *not* force the prior construction of
the underlying singleton_pool instance (class template static data members
have unordered initialization).

The fix implemented was to call

      singleton_pool<...>::is_from(0);

in the ctors of fast_pool_allocator to enforce the proper construction
order (for global ctors). Of course, this fix affects fast_pool_allocators
of every scope and lifetime, not only the static non-local ones.

Problem:
However, is_from() performs other, non-trivial work as well. From
singleton_pool.hpp,

template <...>
struct singleton_pool {
  .
  .
    static bool is_from(void * const ptr)
    {
      pool_type & p = singleton::instance();
      details::pool::guard<Mutex> g(p);
      return p.p.is_from(ptr);
    }

Although the impact at startup is negligible, the copy ctor of
fast_pool_allocator is called during both the construction and the
destruction of *every* shared_ptr<T> a(P,D,A), because the shared_ptr
needs to rebind the allocator from <T> to sp_counted_impl_pda<P,D,A>.

The net effect is that the pool is locked and accessed *twice* every time
a shared_ptr is constructed or destructed. Because of the locking
overhead, even in a single-threaded environment, shared_ptr<T>
a(P,D,fast_pool_allocator<T>()) does not outperform shared_ptr<T> a(P,D)!
The situation is especially painful in an MP environment with heavy lock
contention.
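
To see how often the allocator's ctors actually run, one can simply count
them. The following is only an illustrative sketch: it uses
std::shared_ptr as a stand-in for boost::shared_ptr, and
counting_allocator, int_deleter, g_ctor_calls and ctor_calls_per_cycle are
hypothetical names that are not part of Boost.Pool. Each counted call is
where the fixed fast_pool_allocator would lock the pool. The exact count
depends on the shared_ptr implementation, so none is claimed here.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical counter: each increment marks a spot where the fixed
// fast_pool_allocator ctor would call singleton_pool<...>::is_from(0).
static int g_ctor_calls = 0;

// Minimal allocator whose constructors only count their invocations.
template <typename T>
struct counting_allocator {
    typedef T value_type;

    counting_allocator() { ++g_ctor_calls; }
    counting_allocator(const counting_allocator&) { ++g_ctor_calls; }

    // Converting (rebind) ctor, used when shared_ptr rebinds the allocator
    // to its internal control-block type.
    template <typename U>
    counting_allocator(const counting_allocator<U>&) { ++g_ctor_calls; }

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) {
    return true;
}
template <typename T, typename U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) {
    return false;
}

struct int_deleter {
    void operator()(int* p) const { delete p; }
};

// Number of allocator ctor calls caused by constructing and then
// destroying one shared_ptr that uses the (pointer, deleter, allocator)
// constructor.
int ctor_calls_per_cycle() {
    counting_allocator<int> alloc;
    const int before = g_ctor_calls;
    {
        std::shared_ptr<int> sp(new int(42), int_deleter(), alloc);
    }   // sp is destroyed here, releasing the control block
    return g_ctor_calls - before;
}
```

Printing the result of ctor_calls_per_cycle() gives the number of pool
lock/unlock pairs the current fix would add per shared_ptr on a given
implementation.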

One solution (by no means the only one) would be to perform only the bare
minimum necessary to force prior construction of the singleton_pool
instance.  For example, in boost/pool/singleton_pool.hpp,

template <...>
struct singleton_pool {
  .
  .
    static void force_construction()
    {
       singleton::instance();
    }

and in boost/pool/pool_alloc.hpp

    fast_pool_allocator()
    {
      singleton_pool<fast_pool_allocator_tag, sizeof(T),
                 UserAllocator, Mutex, NextSize>::force_construction();
    }

    template <typename U>
    fast_pool_allocator(
        const fast_pool_allocator<U, UserAllocator, Mutex, NextSize> &)
    {
      singleton_pool<fast_pool_allocator_tag, sizeof(T),
                 UserAllocator, Mutex, NextSize>::force_construction();
    }
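
The difference between the two entry points can be modelled in isolation.
This is a simplified sketch, not the Boost.Pool source: model_pool,
model_singleton_pool, counting_mutex and lock_count() are hypothetical
names, and a function-local static stands in for whatever mechanism the
real singleton uses. It only shows that is_from() takes the lock on every
call while force_construction() merely touches the instance.

```cpp
// Hypothetical mutex that records how many times it has been locked.
struct counting_mutex {
    int locks = 0;
    void lock() { ++locks; }
    void unlock() {}
};

// Hypothetical stand-in for the underlying pool.
struct model_pool {
    counting_mutex mtx;
    bool is_from(void*) const { return false; }  // owns no memory in this model
};

struct model_singleton_pool {
    // Constructed on first use (assumption: function-local static).
    static model_pool& instance() {
        static model_pool p;
        return p;
    }

    // Mirrors the shape of singleton_pool::is_from: construct, lock, query.
    static bool is_from(void* ptr) {
        model_pool& p = instance();
        p.mtx.lock();
        const bool result = p.is_from(ptr);
        p.mtx.unlock();
        return result;
    }

    // Proposed alternative: force construction only, no lock, no query.
    static void force_construction() { instance(); }

    static int lock_count() { return instance().mtx.locks; }
};
```

In this model, calling force_construction() leaves lock_count() unchanged,
whereas each is_from() call increases it by one.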

Regards,
 Peter Hurley

PS - Also, casual profiling (gcc, x86, Windows) seems to indicate that
using boost::detail::spinlock from <boost/smart_ptr/detail/spinlock.hpp>
as the default lock would yield additional performance benefits over
details::pool::default_mutex. Faster yet would be a native
locked_compare_exchange spinlock similar to the
atomic_conditional_increment() implemented in the
<sp_counted_base_***.hpp> headers...
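
For illustration, a compare-exchange spinlock could look roughly like the
sketch below. It is written with std::atomic as a portable stand-in for a
native locked compare-exchange instruction; cas_spinlock is a hypothetical
name, and this is neither boost::detail::spinlock nor a measured
replacement for default_mutex.

```cpp
#include <atomic>

// Hypothetical spinlock: state 0 means unlocked, 1 means locked.
class cas_spinlock {
    std::atomic<int> state_{0};

public:
    // Single attempt: atomically change 0 -> 1; fails if already locked.
    bool try_lock() {
        int expected = 0;
        return state_.compare_exchange_strong(
            expected, 1,
            std::memory_order_acquire,   // on success: see prior holder's writes
            std::memory_order_relaxed);  // on failure: no ordering required
    }

    // Busy-wait until the lock is acquired.
    void lock() {
        while (!try_lock()) {
            // spin
        }
    }

    // Release the lock and publish writes made while holding it.
    void unlock() { state_.store(0, std::memory_order_release); }
};
```

Because it provides lock(), try_lock() and unlock(), such a class can be
used with std::lock_guard. Whether it actually beats a mutex under
contention would have to be profiled.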
