
I have been doing some profiling with VTune recently to investigate the cause of some underperforming multi-threaded code. It's possible that I'm not interpreting the results correctly, but if I am, they're a little surprising. This isn't the best multi-threaded design in the world: it's basically a naive producer-consumer implementation that uses a deque as the production queue, a boost::mutex as the lock, and a boost::condition as the signal to wake up when there's stuff in the queue (cue comment about boost::circular_buffer, which this code really should be using instead). There is LOTS of context switching going on. Basically, I read a 4KB chunk of data from the filesystem and put it in the queue, the other thread takes the 4KB of data and does something with it, and this repeats for a really long time. Since it doesn't take much time to read 4KB of data from the disk, you can tell that there are going to be zillions of context switches and a lot of contention.
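For reference, the design described above can be sketched roughly as follows. This is not the actual code being profiled: the names ChunkQueue, Chunk, and run_pipeline are hypothetical, the file read is replaced by a dummy 4KB buffer, and std::mutex and std::condition_variable stand in for boost::mutex and boost::condition so the sketch builds without Boost.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical names throughout. std:: primitives are an assumption,
// standing in for boost::mutex / boost::condition from the post.
using Chunk = std::vector<char>;

class ChunkQueue {
public:
    // Producer side: enqueue one chunk and wake one waiting consumer.
    void push(Chunk chunk) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(chunk));
        }
        ready_.notify_one();
    }

    // Signal that no more chunks will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Consumer side: block until a chunk is available.
    // Returns false once the queue is closed and fully drained.
    bool pop(Chunk& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    std::deque<Chunk> queue_;
    std::mutex mutex_;
    std::condition_variable ready_;
    bool closed_ = false;
};

// Pushes `chunks` dummy 4KB blocks through the queue (in place of real
// file reads) and returns the number of bytes the consumer saw.
std::size_t run_pipeline(std::size_t chunks) {
    ChunkQueue queue;
    std::size_t consumed = 0;
    std::thread consumer([&] {
        Chunk c;
        while (queue.pop(c)) consumed += c.size();
    });
    for (std::size_t i = 0; i < chunks; ++i) queue.push(Chunk(4096, 'x'));
    queue.close();
    consumer.join();  // join() makes the consumer's writes visible here
    return consumed;
}
```

Calling run_pipeline(1000) moves 1000 chunks through the queue. Every push and pop takes the same mutex, and the consumer goes back to sleep on the condition whenever the queue runs empty, which is where the contention and context switches in a design like this come from.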
i would be curious, did you have a look at my lock-free data structures [1]? their main purpose is to provide a boost-style lock-free fifo, supporting multiple producer and consumer threads ... i am not sure whether it is feasible for your use case, but it should reduce context switches caused by the queue guards ...

best, tim

[1] http://tim.klingt.org/git?p=boost_lockfree.git;a=summary

--
tim@klingt.org
http://tim.klingt.org