[pool][Ticket #2359] Performance impact of bug fix

The bug fix implemented for this ticket has a significant performance impact (approx. 25-35%) when fast_pool_allocator is used as the shared_ptr<> allocator, as in:

    static fast_pool_allocator<T> pool;
    shared_ptr<T> a(p, Tdestroyer(), pool);   // p is a T*

Background: http://svn.boost.org/trac/boost/ticket/2359 has an extensive analysis and discussion of the problem. The key issue was that static non-local fast_pool_allocators did *not* force prior construction of the underlying singleton_pool instance, because the initialization of static data members of class templates is unordered. The fix implemented was to call

    singleton_pool<...>::is_from(0);

in the constructors of fast_pool_allocator to enforce the proper construction order for global constructors. Of course, this fix affects fast_pool_allocators of every scope and lifetime, not only the global ones.

Problem: is_from() also performs other, non-trivial work. From singleton_pool.hpp:

    template <...>
    struct singleton_pool
    {
        ...
        static bool is_from(void * const ptr)
        {
            pool_type & p = singleton::instance();
            details::pool::guard<Mutex> g(p);
            return p.p.is_from(ptr);
        }
    };

Although the impact at startup is negligible, the copy (rebinding) constructor of fast_pool_allocator is called during the construction *and* the destruction of *every* shared_ptr<T> a(P,D,A), where P is the pointer, D the deleter and A the allocator. This happens because shared_ptr needs to rebind the allocator from <T> to sp_counted_impl_pda<P,D,A>. The net effect is that the pool is locked and accessed *twice* every time a shared_ptr is constructed or destroyed.

Because of this locking overhead, even in a single-threaded environment, shared_ptr<T> a(P, D, fast_pool_allocator<T>()) does not outperform shared_ptr<T> a(P, D)! The situation is especially painful in a contended multiprocessor environment.

One solution (by no means the only one) would be to perform only the bare minimum necessary to force prior construction of the singleton_pool instance. For example, in boost/pool/singleton_pool.hpp:

    template <...>
    struct singleton_pool
    {
        ...
        static void force_construction() { singleton::instance(); }
    };

and in boost/pool/pool_alloc.hpp:

    fast_pool_allocator()
    {
        singleton_pool<fast_pool_allocator_tag, sizeof(T),
            UserAllocator, Mutex, NextSize>::force_construction();
    }

    template <typename U>
    fast_pool_allocator(
        const fast_pool_allocator<U, UserAllocator, Mutex, NextSize> &)
    {
        singleton_pool<fast_pool_allocator_tag, sizeof(T),
            UserAllocator, Mutex, NextSize>::force_construction();
    }

Regards,
Peter Hurley

PS - Casual profiling (gcc, x86, Windows) also seems to indicate that using boost::detail::spinlock from <boost/smart_ptr/detail/spinlock.hpp> as the default lock would yield additional performance benefits over details::pool::default_mutex. Faster yet would be a native locked compare-exchange spinlock, similar to the atomic_conditional_increment() implemented in the <sp_counted_base_***.hpp> headers...