Re: [boost] [lockfree] review

23 Aug 2011

      Dave Abrahams wrote:

[... memory model ...]
...
> It's not really different than locking.  If you want to write to shared
> data, you need some way of making it not-a-race.  It's just that when
> the data structure is small enough (like an int) you can make it atomic
> instead of putting a lock around it.

No. See:

http://www.cl.cam.ac.uk/~pes20/cppppc/

Note that the proposed MM is still incomplete: it does not (currently) 
support atomic RMW operations (load-reserve/store-conditional), 
which are essential for locking.
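
To illustrate why RMW operations matter for locking, here is a minimal
sketch of a test-and-set spinlock in standard C++11. The names spinlock
and locked_count are hypothetical and not part of any proposal or library
discussed here; the only point is that lock() cannot be built from plain
atomic loads and stores and needs an atomic read-modify-write, which on
POWER is typically implemented as a load-reserve/store-conditional loop.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical illustration: a test-and-set spinlock built on an atomic RMW.
class spinlock {
  std::atomic_flag m_flag = ATOMIC_FLAG_INIT;

public:
  void lock() {
    // test_and_set is an atomic read-modify-write: it reads the old value
    // and writes 'true' as one indivisible step.
    while (m_flag.test_and_set(std::memory_order_acquire)) {
      std::this_thread::yield(); // somebody else holds the lock
    }
  }

  void unlock() { m_flag.clear(std::memory_order_release); }
};

// Each of 'threads' workers increments a plain (non-atomic) counter
// 'iters' times while holding the lock; returns the final count.
long locked_count(int threads, int iters) {
  assert(threads > 0 && iters >= 0);
  spinlock lock;
  long counter = 0; // protected by 'lock'
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) {
    pool.emplace_back([&] {
      for (int i = 0; i < iters; ++i) {
        lock.lock();
        ++counter;
        lock.unlock();
      }
    });
  }
  for (auto & th : pool) th.join();
  return counter;
}
```

With the lock in place, locked_count(4, 10000) returns 40000; a memory
model that cannot describe the RMW inside lock() cannot say why.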

regards,
alexander.

P.S. I don't like C++11 MM atomics. I think that atomic loads and 
stores ought to support the following 'modes':

  Whether the load/store is competing (default) or not. A competing 
  load means that there might be a concurrent store (to the same 
  object). A competing store means that there might be a concurrent 
  load or store. A non-competing load/store can be performed 
  non-atomically.

  Whether a competing load/store needs remote write atomicity (default 
  is no remote write atomicity). A remote-write-atomicity-yes load
  triggers undefined behavior in the case of a concurrent 
  remote-write-atomicity-no store.

  Whether the load/store has a specified reordering constraint 
  (default is no constraint specified), in terms of the following 
  reordering modes:

    Whether preceding loads (in program order) can be reordered 
    across it (can by default).

    Whether preceding stores (in program order) can be reordered 
    across it (can by default).

    Whether subsequent loads (in program order) can be reordered 
    across it (can by default). For a load, the set of constrained 
    subsequent loads can be limited to only dependent loads (aka 
    'consume' mode).

    Whether subsequent stores (in program order) can be reordered 
    across it (can by default). For a load, there is an implicit 
    reordering constraint regarding dependent stores (no need to 
    specify it).

    A fence/barrier operation can be used to specify a reordering 
    constraint using basically the same modes.
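
For comparison, here is a minimal sketch of how reordering constraints are
expressed with fences in standard C++11. This is not the scheme proposed
above: C++11 only offers coarser bundles. A release fence keeps preceding
loads and stores from being reordered past subsequent stores, and an
acquire fence keeps subsequent loads and stores from being reordered
before preceding loads; the four modes listed above cannot be selected
individually. The function name message_passing_demo is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical illustration: message passing with relaxed atomic accesses
// plus explicit fences carrying the reordering constraints.
int message_passing_demo() {
  int payload = 0;                // plain, non-atomic data
  std::atomic<bool> ready(false); // flag accessed with relaxed operations only
  int seen = -1;

  std::thread writer([&] {
    payload = 42;
    // Preceding loads and stores may not move below the flag store.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
  });

  std::thread reader([&] {
    while (!ready.load(std::memory_order_relaxed)) std::this_thread::yield();
    // Subsequent loads and stores may not move above the flag load.
    std::atomic_thread_fence(std::memory_order_acquire);
    seen = payload; // guaranteed to observe 42
  });

  writer.join();
  reader.join();
  assert(ready.load());
  return seen;
}
```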

Re C++11 MM, I'm still missing more fine-grained memory order 
labels, such as those in the pseudo-C++ example below.

(I mean mo::noncompeting, mo::ssb/ssb_t (sink store barrier, a 
release not affecting preceding loads), mo::slb/slb_t (a release not 
affecting preceding stores) below, and somesuch for relaxed acquire.)

// Introspection (for bool argument below) aside for a moment 
template<typename T, bool copy_ctor_or_dtor_can_mutate_object> 
class mutex_and_condvar_free_single_producer_single_consumer { 

  typedef isolated< aligned_storage< T > > ELEM; 

  size_t           m_size; // > 1 
  ELEM *           m_elem; // array of elements, init'ed by ctor
  atomic< ELEM * > m_head; // initially == m_elem
  atomic< ELEM * > m_tail; // initially == m_elem

  ELEM * advance(ELEM * elem) const { 
    return (++elem < m_elem + m_size) ? elem : m_elem; 
  } 

public: 

  mutex_and_condvar_free_single_producer_single_consumer(); // ctor
 ~mutex_and_condvar_free_single_producer_single_consumer(); // dtor

  void producer(const T & value) { 
    ELEM * tail = m_tail.load(mo::noncompeting); // may be nonatomic
    ELEM * next = advance(tail); 
    while (next == m_head.load(mo::relaxed)) usleep(1000); 
    new(tail) T(value); // placement copy ctor (make queued copy)
    m_tail.store(next, mo::ssb); // cheaper than mo::release
  } 

  T consumer() { 
    ELEM * head = m_head.load(mo::noncompeting); // may be nonatomic
    while (head == m_tail.load(mo::consume)) usleep(1000); 
    T value(*head); // T's copy ctor (make a copy to return)
    head->~T(); // T's dtor (cleanup for queued copy)
    m_head.store(advance(head), type_list< mo::slb_t, mo::rel_t >:: 
      element<copy_ctor_or_dtor_can_mutate_object>::type()); 
    return value; // return copied T
  } 

}; 

Note also that, given that the example above presumes that no more 
than one thread can read from the relevant atomic locations while 
they are written concurrently, there is definitely no need to pay the 
price of remote write atomicity even if it is run on a 3+ way 
multiprocessor... IOW, hwsync is unneeded even if all mo::* above 
are changed to SC... but the upcoming C++11 MM does not allow one to 
express no-need-for-remote-write-atomicity for SC atomics.
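
For reference, here is a minimal sketch of the same single-producer/
single-consumer ring using only the labels standard C++11 provides. It is
an approximation under stated assumptions, not a translation: the
hypothetical spsc_ring holds plain ints indexed by position instead of
isolated aligned storage, mo::noncompeting is approximated by relaxed,
and mo::ssb, mo::slb and mo::consume are replaced by the stronger release
and acquire. The producer's load of m_head also uses acquire where the
example above uses mo::relaxed, because C++11 does not derive any ordering
from the dependency between that load and the later store into the slot.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical illustration: fixed-size ring of ints, one producer thread,
// one consumer thread. One slot is always left empty to tell full from empty.
class spsc_ring {
  std::vector<int>         m_elem;
  std::atomic<std::size_t> m_head; // written by the consumer only
  std::atomic<std::size_t> m_tail; // written by the producer only

  std::size_t advance(std::size_t i) const {
    return (i + 1 < m_elem.size()) ? i + 1 : 0;
  }

public:
  explicit spsc_ring(std::size_t size) : m_elem(size), m_head(0), m_tail(0) {
    assert(size > 1);
  }

  void produce(int value) {
    // Only the producer writes m_tail; relaxed stands in for mo::noncompeting.
    std::size_t tail = m_tail.load(std::memory_order_relaxed);
    std::size_t next = advance(tail);
    while (next == m_head.load(std::memory_order_acquire))
      std::this_thread::yield(); // ring is full
    m_elem[tail] = value;
    m_tail.store(next, std::memory_order_release); // stands in for mo::ssb
  }

  int consume() {
    // Only the consumer writes m_head; relaxed stands in for mo::noncompeting.
    std::size_t head = m_head.load(std::memory_order_relaxed);
    while (head == m_tail.load(std::memory_order_acquire)) // for mo::consume
      std::this_thread::yield(); // ring is empty
    int value = m_elem[head];
    m_head.store(advance(head), std::memory_order_release); // for mo::slb
    return value;
  }
};

// Pushes 1..count through the ring and returns the sum seen by the consumer.
long transfer_sum(int count) {
  spsc_ring ring(8);
  long sum = 0;
  std::thread producer([&] { for (int i = 1; i <= count; ++i) ring.produce(i); });
  std::thread consumer([&] { for (int i = 0; i < count; ++i) sum += ring.consume(); });
  producer.join();
  consumer.join();
  return sum;
}
```

Every substitution here is at least as strong as the label it replaces,
which is exactly the extra cost the finer-grained labels are meant to avoid.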

Alexander Terekhov