
Howard Hinnant:
If two threads share a processor, one accessing M1 amount of memory and the other M2, isn't the case where M1 + M2 is less than the L1 cache size faster than the case where M1 and M2 each fit in L1 but M1 + M2 does not? I.e., can larger (or more scattered) memory use on one thread make cache misses more likely on another thread?
This is possible in principle (*), but I'm not sure it can happen in our particular case. At condition::wait, thread 1 enters the kernel and is blocked, and thread 2 is awakened. Only a few additional cache lines (one for the "checked release" code, more when using a hash map) are touched by the check in wait; it should be impossible to measure a difference. The checks occur only once per timeslice. Even in the case of a spurious immediate wakeup, when wait will be retried, there are no additional cache-line accesses, since the memory touched by the check is already in cache.

(*) M1 + M2 < L1.size seems a bit improbable, though; it means that two timeslices' worth of (possibly unrelated) code can stay entirely within the L1 cache. L2.size seems a more likely threshold.