
I have been doing some profiling using VTune recently to investigate the cause of some underperforming multi-threaded code, and it's possible that I'm not interpreting these results correctly, but if I am it's a little surprising. This isn't the best multi-threaded design in the world, but it's basically a naive implementation of a producer consumer that uses a deque as the production queue, a boost::mutex as the lock, and a boost::condition as a signal to wake up when there's stuff in the queue (queue comment about boost::circular_buffer, which this code really should be using instead). There is LOTS of context switching going on, basically I read a 4KB chunk of data from the filesystem, put it in the queue, the other threads takes the 4KB of data, does something with it, and this repeats for a really long time. Since it doesn't take a whole lot of time to read 4KB of data from the disk, you can tell that there are going to be zillions of context switches and contention going on. So on to the point. Obviously most of the time is spent in boost::detail::basic_timed_mutex::lock, but what's surprising is the breakdown of time inside the lock function: (Keep in mind this is not timed over a terribly long sample, hence the small numbers, but it is over thousands of actual locks, and ultimately this locking alone makes up a noticeable portion of the sample duration) boost::detail::basic_timed_mutex.lock Total Time: 1,032,486 microseconds (1.032 seconds) Total Wait Time: 505,884 microseconds (.505 seconds) It appears that only half the time is actually spent inside the lock, which is pretty crazy. The rest is spent doing computations, which doesn't seem right for a lock. I drilled down further and came up with this: boost::detail.get_system_time_sentinel Total Time: 266,657 microseconds Total Wait Time: 264 microseconds So that accounts for about half of the computational time. The rest is in a Windows system call somewhere that I was not able to drill into. I don't know about the details of date_time library, or much about the details of thread library either for that matter, but why is a time necessary here at all? This is eventually resolving down to a boost::detail::basic_timed_mutex, but there's nothing timed about the operations I've done. I simply did the following: boost::mutex m; boost::mutex::scoped_lock lock(m); //... This should ultimately result in an infinite timeout being specified to the system call, so all of this computation seems completely unnecessary. It seems like a scoped_lock should completely bypass anything having to do with date_time, and perhaps even normal timed locks should come up with a more efficient strategy. Thoughts?