
On Aug 26, 2007, at 4:49 PM, Peter Dimov wrote:
> Howard Hinnant:
>> I've been using "sizeof(condition)" as shorthand for "reducing L1 cache misses". Your technique of moving the checking into a static map<void*, void*> (or unordered_map) does indeed reduce sizeof(condition), but it does not reduce L1 cache misses.
>
> map<> is the "debug" variation; there is also a "checked release" one. They do reduce L1 cache misses on the only path where those misses can be measured: the path that does not call wait. Once you call wait, you enter a blocking kernel call and L1 misses cease to be of importance.
My knowledge is not what it should be in this area. If there are two threads sharing a processor, one accessing M1 amount of memory and the other accessing M2 amount of memory, and M1 + M2 is less than the L1 cache size, isn't that faster than the case where M1 and M2 each fit in the L1 cache but M1 + M2 does not? I.e., can larger (or more scattered) memory use on one thread make cache misses more likely on another thread?

-Howard