
shared_mutex is not designed for this scenario, since you have high contention. shared_mutex is designed for infrequent updates.
Some more observations. The same writer starvation occurs with only 2 reader threads, so it's not caused by the overcommit, and two reader threads on two cores is a common setup. The update frequency doesn't matter either: a lower update frequency would just scale the total time it takes to perform 1M updates; it would not change the average writer wait time.

Some writer wait times (2 readers + 1 writer, 2R+1W):

atomics: 7.673 us
lightweight_mutex (CRITICAL_SECTION): 3.069 us
shared_mutex: 760 us
rw_mutex (my implementation): 665 us (same starvation problem)
pthread_rwlock_t (pthreads-win32): 7.108 us
rw_mutex (Hinnant/Terekhov): 85.532 us

The last line uses my reimplementation of Howard Hinnant's read/write mutex, based on his description; Howard credits Alexander Terekhov with the original algorithm. It does stall the writer somewhat in exchange for optimal reader throughput, but it doesn't suffer from outright starvation. I've attached my (not production-quality) implementation of this rwlock.
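For readers unfamiliar with the Hinnant/Terekhov design, here is a minimal C++11 sketch based on Hinnant's published description. It is not the implementation attached to this post. The class name `two_gate_rw_mutex` and the `run_demo` harness are hypothetical. The idea uses two gates. A writer first passes gate 1 and sets a "writer entered" bit, which stops new readers at gate 1. It then waits at gate 2 only for the readers that already held the lock.

```cpp
#include <cassert>
#include <climits>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical class name. A sketch of the two-gate algorithm described by
// Howard Hinnant (credited to Alexander Terekhov); not the attached rw_mutex.
class two_gate_rw_mutex {
    std::mutex mut_;
    std::condition_variable gate1_;  // readers and writers wait here to enter
    std::condition_variable gate2_;  // an entered writer waits here for readers
    unsigned state_ = 0;             // top bit: writer entered; low bits: readers

    static constexpr unsigned write_entered_ = 1u << (sizeof(unsigned) * CHAR_BIT - 1);
    static constexpr unsigned n_readers_ = ~write_entered_;

public:
    void lock() {
        std::unique_lock<std::mutex> lk(mut_);
        // Gate 1: only one writer may be "entered" at a time.
        gate1_.wait(lk, [this] { return (state_ & write_entered_) == 0; });
        state_ |= write_entered_;  // from here on, new readers block at gate 1
        // Gate 2: wait only for the readers that were already inside.
        gate2_.wait(lk, [this] { return (state_ & n_readers_) == 0; });
    }

    void unlock() {
        std::lock_guard<std::mutex> lk(mut_);
        state_ = 0;
        gate1_.notify_all();  // wake everyone waiting at gate 1
    }

    void lock_shared() {
        std::unique_lock<std::mutex> lk(mut_);
        // New readers are held at gate 1 while a writer has entered
        // (or while the reader count is saturated).
        gate1_.wait(lk, [this] {
            return (state_ & write_entered_) == 0 &&
                   (state_ & n_readers_) != n_readers_;
        });
        ++state_;  // cannot carry into the writer bit: count < n_readers_
    }

    void unlock_shared() {
        std::lock_guard<std::mutex> lk(mut_);
        assert((state_ & n_readers_) != 0 && "unlock_shared without lock_shared");
        --state_;  // count >= 1, so only the low bits change
        const unsigned readers = state_ & n_readers_;
        if (state_ & write_entered_) {
            // A writer is parked at gate 2; the last reader out wakes it.
            if (readers == 0) gate2_.notify_one();
        } else if (readers == n_readers_ - 1) {
            // The count was saturated; one slot just opened at gate 1.
            gate1_.notify_one();
        }
    }
};

// Hypothetical 2R+1W demo: two readers spin on shared locks while one writer
// performs `updates` exclusive increments. Returns the final value.
long run_demo(int updates) {
    two_gate_rw_mutex m;
    long value = 0;     // guarded by m
    bool done = false;  // guarded by m
    std::vector<std::thread> readers;
    for (int r = 0; r < 2; ++r) {
        readers.emplace_back([&] {
            for (;;) {
                m.lock_shared();
                const bool stop = done;
                assert(value >= 0 && value <= updates);
                m.unlock_shared();
                if (stop) break;
            }
        });
    }
    std::thread writer([&] {
        for (int i = 0; i < updates; ++i) {
            m.lock();
            ++value;
            m.unlock();
        }
        m.lock();
        done = true;
        m.unlock();
    });
    writer.join();
    for (auto& t : readers) t.join();
    return value;
}
```

The writer at gate 2 waits only for readers that were inside when it set its bit. Because of that, a steady stream of new readers cannot keep it out indefinitely. The cost is that it still has to wait for that in-flight batch to drain, which matches the stall-but-no-starvation trade-off described above. After `unlock()`, `notify_all` wakes waiting readers and writers together, and which of them gets in next is left to the scheduler.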