
"Helge Bahmann" <hcb@chaoticmind.net> wrote in message news:201001181953.33044.hcb@chaoticmind.net...
On Saturday 16 January 2010 03:15:44, Chris M. Thomasson wrote:
FWIW Helge, here is a new proxy garbage collector algorithm of mine that I am currently experimenting with:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/a53f24de178b419f
interesting concept, but AFAICT it shares two undesirable properties with my implementation: Reclamation can be delayed indefinitely if
- one thread is "stuck" within the critical section (e.g. scheduled away)
Well, if a thread actually deadlocks or "permanently livelocks" within the critical section, then no reclamation can occur.
- threads interlock such that at least one thread is always within the critical section
Right?
No. My proxy algorithm is N = 2, while yours is N = 1, where N is the number of collector objects. This means that you only have a _single_ reference count. During periods of load, this single reference count might never reach zero and the garbage will keep piling up. In my case, I can atomically switch collector objects and reset the reference count, thus guaranteeing that the old one will drop to zero exactly when the last thread accessing it releases its reference. This is because all new threads will take references to the new collector. Here is the key area in my algorithm; take note of the following procedure in the code: http://cpt.pastebin.com/f537f2c7c
__________________________________________________________________
void prv_quiesce_begin(collector& c)
{
    // Try to begin the quiescence process.
    if (! m_quiesce.exchange(true, std::memory_order_acquire))
    {
        // advance the current collector and grab the old one.
128:    size_t i = ((&c - m_collectors) + 1) & 1;

130:    unsigned int old =
            m_current.exchange(i, std::memory_order_acq_rel);

        // decode reference count.
        unsigned int refs = old & 0xFFFFFFF0U;

        // verify previous collector index.
        RL_ASSERT((old & 0xFU) == (&c - m_collectors));

        // increment and generate an odd reference count.
        c.m_count.fetch_add(refs + 0x10U, std::memory_order_release);
    }
}
__________________________________________________________________
Line `128' simply determines the index of the next collector object. Line `130' atomically resets the main reference count to zero _and_ swaps in the next collector index. Now, all new threads will only be able to gain references to the collector that was just swapped in, which means that it will be impossible for them to gain access to the old collector. Therefore, the old collector will drop to zero. I guess this is similar to using split counters to implement RCU. However, AFAICT my algorithm is unique and completely different from Paul McKenney's split-counter RCU solution.
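For clarity, here is a simplified, self-contained sketch of the acquire/release side that the swap above relies on. The names and exact encoding are illustrative only (the real code is in the pastebin link); the point is that a single atomic RMW on `m_current' both selects the current collector and bumps the packed count, while releases land on the per-collector count that the transfer in `prv_quiesce_begin' balances:
__________________________________________________________________
#include <atomic>

// Illustrative sketch, not the exact pastebin code. Assumed
// encoding: low 4 bits of m_current hold the current collector
// index; each outstanding reference adds 0x10 to the rest.
struct collector
{
    std::atomic<unsigned int> m_count{0}; // per-collector split count
    // ...deferred garbage nodes would hang off of here...
};

struct proxy
{
    collector m_collectors[2];
    std::atomic<unsigned int> m_current{0};

    collector& acquire()
    {
        // one atomic RMW picks the current collector _and_
        // bumps the packed reference count.
        unsigned int cur =
            m_current.fetch_add(0x10U, std::memory_order_acquire);
        return m_collectors[cur & 0xFU];
    }

    void release(collector& c)
    {
        // releases decrement the collector's own count; it can only
        // balance to zero after the quiesce step has transferred the
        // packed acquire count (plus a 0x10 bias held by the
        // quiescing thread) onto m_count.
        if (c.m_count.fetch_sub(0x10U, std::memory_order_release)
                == 0x10U)
        {
            // last reference is gone: reclaim c's deferred garbage.
        }
    }
};
__________________________________________________________________
Note that c.m_count may wrap below zero (unsigned) while releases race ahead of the transfer; the arithmetic still balances to exactly zero once, and the 0x10 bias guarantees zero cannot be reached before the transfer has happened, which is what makes the handoff safe.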
It's not that I consider this problematic (there is no reason I would ever want to share a data structure between threads if I know that the above can pose a problem).
It can be problematic for various usage patterns. For instance, suppose I wanted to use an N = 1 collector to solve the reader/writer problem (e.g., readers iterating over a data-structure while writers are concurrently pushing/popping items). I fully expect reads to be very frequent in such scenarios. This would not work well for an N = 1 collector because, if reader threads were constantly accessing the data-structure, the reference count would probably never reach zero. However, with an N = 2 collector I can guarantee that new reader threads will not affect the reference count of the collector swapped out in line `130'. This is a case in which N = 2 is the key difference between our implementations.
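To make that concrete, a hypothetical reader (building on the sketch above) might look like this; `g_head', `node', and `reader_iterate' are assumed names for illustration, not code from the pastebin:
__________________________________________________________________
struct node { int value; node* next; };

std::atomic<node*> g_head{nullptr}; // head of a lock-free list
proxy g_proxy;                      // the two-collector proxy above

void reader_iterate()
{
    // pin the current collector for the whole traversal.
    collector& c = g_proxy.acquire();
    for (node* n = g_head.load(std::memory_order_acquire);
         n != nullptr; n = n->next)
    {
        // safe: writers that pop nodes defer them to the collector
        // instead of freeing, so `n' stays valid while pinned.
    }
    g_proxy.release(c);
}
__________________________________________________________________
Readers never block writers, and with N = 2 a steady stream of readers only ever holds references to the _current_ collector; the swapped-out one still drains to zero and gets reclaimed.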
The algorithm is wait-free in practice on systems with native fetch-and-add and swap instructions, like x86-32/64. If the architecture forces you to implement fetch-and-add with a damn LL/SC or CAS loop, well, then it's not really wait-free now, is it? No, I am afraid it's lock-free, or whatever the semantics of the LL/SC loop turn out to be, which could even be obstruction-free. Sad...
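To illustrate the point, here is what emulated fetch-and-add looks like when all you have is CAS; this is generic C++, not any particular architecture's code. A thread can fail the compare-exchange indefinitely while other threads keep succeeding, so the loop is only lock-free, not wait-free:
__________________________________________________________________
#include <atomic>

unsigned int emulated_fetch_add(std::atomic<unsigned int>& a,
                                unsigned int delta)
{
    unsigned int expected = a.load(std::memory_order_relaxed);
    // retry until no other thread modified `a' between the load
    // and the compare-exchange; unbounded under contention.
    while (! a.compare_exchange_weak(expected, expected + delta,
                                     std::memory_order_acq_rel,
                                     std::memory_order_relaxed))
    {
        // `expected' was reloaded with the current value; retry.
    }
    return expected;
}
__________________________________________________________________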
Depends how it is implemented -- LL/SC is wait-free iff other reads to the location do not invalidate the reservation (PowerPC is in that category).
Here are some fairly interesting posts from Andy Glew (former Computer Architect at Intel) wrt LL/SC:
http://groups.google.com/group/comp.arch/msg/18f0d50aa3c9a148
http://groups.google.com/group/comp.arch/msg/5d1ff313519005f8
I strongly suspect that CAS on x86 is actually just a micro-coded LL/SC anyway...
I think they take locks to get the job done. Andy Glew would be the guy to ask. So, I should probably post a query over in `comp.arch' where all the hardware gurus hang out. ;^)