[thread] Customizing barrier for improved performance

6 Jun 2010

Hi,

I notice that the thread barrier class is fairly large (128 bytes on  
Darwin with Intel 11.1).

     private:
         mutex m_mutex;
         condition_variable m_cond;
         unsigned int m_threshold;
         unsigned int m_count;
         unsigned int m_generation;

and sort of slow for my application (parallel iterative solvers of
sparse linear systems).  Many iterative algorithms have both serial
and parallel sections within a single iteration and, for larger
algorithms, this can result in numerous (on the order of 10) rendezvous
points per iteration.  In cursory testing I've found that
a barrier implemented with atomics is a bit faster than a mutex-based
barrier (though I recognize that an atomic spin-based implementation
can potentially hang if running on a single Intel core with
hyper-threading enabled).
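For comparison, here is a minimal sketch of how a barrier with the five members listed above typically works. It assumes C++11 std::mutex and std::condition_variable in place of the Boost types, and the names mutex_barrier and check_mutex_barrier are hypothetical; it is not a copy of the Boost.Thread source. Every arrival takes the lock, and all but the last thread sleep on the condition variable until the generation changes.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a mutex/condition-variable barrier.
class mutex_barrier {
public:
    explicit mutex_barrier(unsigned int threshold)
        : m_threshold(threshold), m_count(threshold), m_generation(0) {
        assert(threshold > 0);
    }

    // Returns true in exactly one thread per rendezvous (the last arrival).
    bool wait() {
        std::unique_lock<std::mutex> lock(m_mutex);
        const unsigned int gen = m_generation;
        if (--m_count == 0) {
            // Last arrival: start a new generation and wake the sleepers.
            ++m_generation;
            m_count = m_threshold;
            m_cond.notify_all();
            return true;
        }
        // Loop guards against spurious wakeups.
        while (gen == m_generation)
            m_cond.wait(lock);
        return false;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    unsigned int m_threshold;
    unsigned int m_count;
    unsigned int m_generation;
};

// Hypothetical check: no thread leaves a round before all have arrived,
// and exactly one thread per round is told it was the last to arrive.
bool check_mutex_barrier(unsigned int nthreads, unsigned int rounds) {
    mutex_barrier barrier(nthreads);
    std::atomic<unsigned int> arrived(0), elected(0), early(0);
    std::vector<std::thread> pool;
    for (unsigned int t = 0; t < nthreads; ++t) {
        pool.emplace_back([&] {
            for (unsigned int r = 0; r < rounds; ++r) {
                arrived.fetch_add(1);
                if (barrier.wait()) elected.fetch_add(1);
                if (arrived.load() < (r + 1) * nthreads) early.fetch_add(1);
            }
        });
    }
    for (std::thread& th : pool) th.join();
    return early.load() == 0 && elected.load() == rounds;
}
```

The thread that gets true back from wait() is a natural candidate to run the serial section of an iteration before the next rendezvous.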

I've attached a simple atomic-based implementation built on Intel TBB
atomics, though it's easily convertible to Boost.Atomic when the time
comes.  This implementation just ping-pongs a counter, alternating
between incrementing and decrementing it on successive calls.
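The attachment is not reproduced in the message text, so the following is only a rough sketch of a spin barrier built on atomics, not the attached code. It assumes C++11 std::atomic rather than TBB atomics, and it uses a generation counter to release the waiters instead of ping-ponging a single counter, since that variant is easier to show correct in a few lines. The names spin_barrier and check_spin_barrier are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical spin barrier; not part of Boost.Thread or TBB.
class spin_barrier {
public:
    explicit spin_barrier(unsigned int threshold)
        : m_threshold(threshold), m_count(threshold), m_generation(0) {
        assert(threshold > 0);
    }

    // Returns true in exactly one thread per rendezvous (the last arrival).
    bool wait() {
        // Read the generation before announcing arrival.
        const unsigned int gen = m_generation.load();
        if (m_count.fetch_sub(1) == 1) {
            // Last arrival: rearm the counter first, then release the
            // spinners by bumping the generation.
            m_count.store(m_threshold);
            m_generation.fetch_add(1);
            return true;
        }
        // Spin until the last arrival bumps the generation. Yielding
        // (an assumption of this sketch) lets other threads run when
        // several threads share one core.
        while (m_generation.load() == gen)
            std::this_thread::yield();
        return false;
    }

private:
    const unsigned int m_threshold;
    std::atomic<unsigned int> m_count;
    std::atomic<unsigned int> m_generation;
};

// Hypothetical check: no thread leaves a round before all have arrived,
// and exactly one thread per round is told it was the last to arrive.
bool check_spin_barrier(unsigned int nthreads, unsigned int rounds) {
    spin_barrier barrier(nthreads);
    std::atomic<unsigned int> arrived(0), elected(0), early(0);
    std::vector<std::thread> pool;
    for (unsigned int t = 0; t < nthreads; ++t) {
        pool.emplace_back([&] {
            for (unsigned int r = 0; r < rounds; ++r) {
                arrived.fetch_add(1);
                if (barrier.wait()) elected.fetch_add(1);
                if (arrived.load() < (r + 1) * nthreads) early.fetch_add(1);
            }
        });
    }
    for (std::thread& th : pool) th.join();
    return early.load() == 0 && elected.load() == rounds;
}
```

All operations use the default sequentially consistent ordering to keep the reasoning simple; weaker orderings might be faster but would need more care. The object holds two atomic words and one plain word, with no mutex or condition variable.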

Does anyone know if there are plans to extend barrier so that a user
could select a different implementation (such as an atomic-based one)?
For some applications this could be a very useful extension.

Thanks.

-- Noel

Belcourt, Kenneth
