Hi,

This is just a refinement of an idea that was implemented once and worked well. The idea is a shared spin lock. Here is some pseudo-code:

    class shared_spin_lock {
        atomic_max_uint_t state_;

        lock_shared() {
            do { // CAS loop
                auto snapshot = state_.load();
                if (snapshot.has_writer_bit_set()) {
                    reader_wait(); // waiting for the writer to finish
                    continue;
                }
                auto new_state = snapshot + 1;
            } while (!CAS(state_, snapshot, new_state));
        }

        unlock_shared() {
            --state_;
        }

        lock() {
            do { // CAS loop
                auto snapshot = state_.load();
                if (snapshot.has_writer_bit_set()) {
                    writer_wait(); // waiting for the other writer to finish
                    continue;
                }
                auto new_state = snapshot.set_writer_bit();
            } while (!CAS(state_, snapshot, new_state));

            // We have set the writer bit; now wait for the readers to finish.
            while (state_.readers_count)
                ; // busy loop, can we do better?

            writer_started();
        }

        unlock() {
            // reset_writer_bit() returns the previous state
            auto snapshot = state_.reset_writer_bit();
            writer_finished();
            ASSERT(snapshot.has_writer_bit_set());
        }
    };

When all the *_wait() and writer_*() functions do nothing, we get a shared spinlock.

Now, if we add a helper mutex and make all the *_wait() functions lock and then unlock that mutex, writer_started() lock the mutex, and writer_finished() unlock the mutex... we get a simplified shared_mutex that is (in theory) faster than the current implementation. (Compilable sketches of both variants are at the end of this message.)

Is there any interest in such a shared_spinlock_t? Any comments, suggestions?

--
Best regards,
Antony Polukhin
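
Here is a minimal compilable sketch of the spin-only variant (all hooks empty). The 32-bit state, the writer flag in the most significant bit, the memory orderings, and the names are illustrative assumptions layered on top of the pseudo-code, not a finished implementation:

    #include <atomic>
    #include <cstdint>

    // Spin-only variant of the pseudo-code above (assumed layout: writer
    // flag in the most significant bit, reader count in the lower bits).
    class shared_spin_lock {
        static constexpr std::uint32_t writer_bit = 1u << 31;
        std::atomic<std::uint32_t> state_{0};

    public:
        void lock_shared() {
            std::uint32_t snapshot = state_.load(std::memory_order_relaxed);
            for (;;) {
                if (snapshot & writer_bit) {
                    // reader_wait(): a writer owns or claims the lock; re-read
                    snapshot = state_.load(std::memory_order_relaxed);
                    continue;
                }
                // try to register one more reader
                if (state_.compare_exchange_weak(snapshot, snapshot + 1,
                                                 std::memory_order_acquire,
                                                 std::memory_order_relaxed))
                    return;
            }
        }

        void unlock_shared() {
            state_.fetch_sub(1, std::memory_order_release);
        }

        void lock() {
            std::uint32_t snapshot = state_.load(std::memory_order_relaxed);
            for (;;) {
                if (snapshot & writer_bit) {
                    // writer_wait(): another writer is ahead of us; re-read
                    snapshot = state_.load(std::memory_order_relaxed);
                    continue;
                }
                // claim the writer bit while keeping the current reader count
                if (state_.compare_exchange_weak(snapshot, snapshot | writer_bit,
                                                 std::memory_order_acquire,
                                                 std::memory_order_relaxed))
                    break;
            }
            // No new readers can enter now; wait for the existing ones.
            while (state_.load(std::memory_order_acquire) & ~writer_bit)
                ; // busy loop, can we do better?
        }

        void unlock() {
            // reset the writer bit; the reader count is zero at this point
            state_.fetch_and(~writer_bit, std::memory_order_release);
        }
    };

Because a reader always holds at least one count while inside the lock, the fetch_sub in unlock_shared() can never borrow into the writer bit, so the single atomic word is enough.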
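
And a sketch of the blocking variant with the helper mutex. Here the mutex (writer_gate_, an assumed name) is held for the whole time a writer owns the lock, so reader_wait()/writer_wait() become "block on the mutex" instead of pure spinning:

    #include <atomic>
    #include <cstdint>
    #include <mutex>

    // Blocking variant: the helper mutex is locked in writer_started() and
    // unlocked in writer_finished(); *_wait() lock and unlock it to block.
    class simple_shared_mutex {
        static constexpr std::uint32_t writer_bit = 1u << 31;
        std::atomic<std::uint32_t> state_{0};
        std::mutex writer_gate_; // the helper mutex

        void wait_for_writer() { // reader_wait() and writer_wait()
            std::lock_guard<std::mutex> guard(writer_gate_);
        }

    public:
        void lock_shared() {
            std::uint32_t snapshot = state_.load(std::memory_order_relaxed);
            for (;;) {
                if (snapshot & writer_bit) {
                    wait_for_writer(); // sleep until the writer is done
                    snapshot = state_.load(std::memory_order_relaxed);
                    continue;
                }
                if (state_.compare_exchange_weak(snapshot, snapshot + 1,
                                                 std::memory_order_acquire,
                                                 std::memory_order_relaxed))
                    return;
            }
        }

        void unlock_shared() {
            state_.fetch_sub(1, std::memory_order_release);
        }

        void lock() {
            std::uint32_t snapshot = state_.load(std::memory_order_relaxed);
            for (;;) {
                if (snapshot & writer_bit) {
                    wait_for_writer(); // sleep until the other writer is done
                    snapshot = state_.load(std::memory_order_relaxed);
                    continue;
                }
                if (state_.compare_exchange_weak(snapshot, snapshot | writer_bit,
                                                 std::memory_order_acquire,
                                                 std::memory_order_relaxed))
                    break;
            }
            // Short busy wait while the remaining readers drain.
            while (state_.load(std::memory_order_acquire) & ~writer_bit)
                ;
            writer_gate_.lock(); // writer_started()
        }

        void unlock() {
            state_.fetch_and(~writer_bit, std::memory_order_release);
            writer_gate_.unlock(); // writer_finished(): wakes the blocked threads
        }
    };

Note the simplification mentioned above: a thread that sees the writer bit before the writer has actually locked the gate will grab the free mutex, release it, and spin once more, so there is still a short spinning window around writer_started().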