
Hal Vaughan: ...
It's supposed to fill a buffer with data on one thread and remove data on the other thread. I would expect it to add data and remove at almost the same time but when I run it, it adds 10 items, then removes them, then adds ten again, and so on. It's never adding and removing in an "interleaved" fashion. I've also tried the same with simple counters where both threads count to ten, but it doesn't seem to make a difference. It seems no matter how I set it up, one thread either completes, or runs until it has to stop for a mutex, then the other thread runs. It seems that there is never a case of both threads running at the same time. I've never had trouble doing this in Perl with fork() so I'm not at all sure what could be going wrong.
There are several possible answers to this one, from worst to best: A1. Unlock the mutex before notify_one and then yield. A2. The threads are typically scheduled to run for a certain timesice; 10ms is one possible value. A context switch from one thread to another takes time, and the fact that the new thread doesn't have its data in the cache is an additional performance hit. Context switches are pure waste of time, so the less context switches your program performs, the faster it will appear to run. In your case, the pushing thread can do many iterations in its timeslice, so it runs until the buffer is full, then blocks. A3. In a "real" multithreaded program, you almost never want threads to run interleaved. You want them to run at the same time on different CPUs in order to achieve maximum throughput. Amdahl's law says that the time it takes for a program on K CPUs to finish is S + P / k, where S is the sequential portion and P is the parallelizable portion. The less S you have, the better. It follows that benchmarks that have P == 0, such as one in which the threads repeatedly fight for the same mutex, are generally useless; they would take much less time if rewritten to run single-threaded. You want your benchmark to measure something significant; for that you need something along the lines of: void writer() { for (int n = 0; n < ITERS; ++n) { // create an item "it", taking time buf.put( it ); } } void reader() { for (int x = 0; x < ITERS; ++x) { Item it = buf.get(); // process the item "it", taking time } } where the time it takes to create an item and to process an item is significant compared to the queue overhead (or a single thread creating and processing items will be faster). I see in your later post that an item is "created" by reading from a serial port. In this case, if you use blocking reads, you won't need to do anything; the "writer" thread will just block until there's something to read from the port, and the "reader" thread will block when there's no data in the queue.