[thread] Customizing barrier for improved performance

Hi,

I notice that the thread barrier class is fairly large (128 bytes on Darwin with Intel 11.1):

    private:
        mutex m_mutex;
        condition_variable m_cond;
        unsigned int m_threshold;
        unsigned int m_count;
        unsigned int m_generation;

and sort of slow for my application (parallel iterative solvers of sparse linear systems). Many iterative algorithms have both serial and parallel sections within a single iteration and, for larger algorithms, this can result in numerous (on the order of 10) rendezvous points per iteration.

During cursory testing I've found that a barrier implemented with atomics is a bit faster than a mutex-based barrier (though I recognize that an atomic spin-based implementation can potentially hang if running on a single Intel core with hyper-threading enabled). I've attached a simple atomic-based implementation built on Intel TBB atomic, though it's easily convertible to Boost.Atomic when the time comes. This implementation just ping-pongs a counter, alternating between incrementing and decrementing it each time the barrier is called.

Does anyone know if there are plans to extend barrier so that a user could select a different implementation (like an atomic-based one)? For some applications this could be a very useful extension.

Thanks.

-- Noel
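For readers without the attachment, here is a minimal sketch of what an atomic spin barrier can look like. It is not the attached code: it assumes std::atomic (C++11) in place of TBB atomic, and it swaps the ping-pong counter for a single arrival counter that only ever increases, which keeps the wait condition simple. The names spin_barrier and count_barrier_violations are hypothetical and belong to no library. The wait loop yields on every spin, which is one possible way to soften the single-core hang mentioned above, at some cost in latency.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical class for illustration; not part of Boost.Thread or TBB.
class spin_barrier {
public:
    explicit spin_barrier(unsigned int threshold)
        : m_threshold(threshold), m_arrived(0) {
        assert(threshold > 0);
    }

    spin_barrier(const spin_barrier&) = delete;
    spin_barrier& operator=(const spin_barrier&) = delete;

    // Spins until m_threshold threads have called wait() in this phase.
    // Returns true for exactly one thread per phase (the last to arrive).
    bool wait() {
        // Each arrival takes a ticket; tickets k*N .. k*N+N-1 form phase k.
        const std::uint64_t ticket = m_arrived.fetch_add(1);
        const std::uint64_t target = (ticket / m_threshold + 1) * m_threshold;
        // The counter never decreases, so threads that race ahead into the
        // next phase cannot invalidate this condition for slower threads.
        while (m_arrived.load() < target) {
            std::this_thread::yield();
        }
        return ticket % m_threshold == m_threshold - 1;
    }

private:
    const std::uint64_t m_threshold;       // number of threads per phase
    std::atomic<std::uint64_t> m_arrived;  // total arrivals over all phases
};

// Hypothetical demo helper. Each thread writes the iteration number into its
// own slot, meets at the barrier, then checks that every slot holds that
// number. A second barrier keeps writers of the next iteration from racing
// with readers of this one. Returns how many stale slots were observed.
unsigned int count_barrier_violations(unsigned int num_threads,
                                      unsigned int iterations) {
    spin_barrier barrier(num_threads);
    std::vector<unsigned int> slots(num_threads, 0);
    std::atomic<unsigned int> violations(0);

    std::vector<std::thread> workers;
    for (unsigned int id = 0; id < num_threads; ++id) {
        workers.emplace_back([&, id] {
            for (unsigned int it = 1; it <= iterations; ++it) {
                slots[id] = it;   // "parallel section"
                barrier.wait();   // rendezvous
                for (unsigned int s : slots) {
                    if (s != it) {
                        violations.fetch_add(1);
                    }
                }
                barrier.wait();   // all reads done before the next writes
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return violations.load();
}
```

The object holds two 8-byte members, compared with the mutex and condition variable in the class above. All atomic operations use the default sequentially consistent ordering; weaker orderings may be possible but are not attempted in this sketch. Calling count_barrier_violations(4, 1000) exercises 2000 rendezvous points across four threads and should report zero stale slots.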
participants (1)
- Belcourt, Kenneth