
"Helge Bahmann" <hcb@chaoticmind.net> wrote in message news:alpine.DEB.1.10.1001191553030.25332@m65s28.vlinux.de...
On Mon, 18 Jan 2010, Chris M. Thomasson wrote:
"Helge Bahmann" <hcb@chaoticmind.net> wrote in message news:201001181953.33044.hcb@chaoticmind.net...
On Saturday 16 January 2010 03:15:44, Chris M. Thomasson wrote:
FWIW Helge, here is a new proxy garbage collector algorithm of mine that I am currently experimenting with:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/a53f24de178b419f
interesting concept, but AFAICT it shares two undesirable properties with my implementation: Reclamation can be delayed indefinitely if
- one thread is "stuck" within the critical section (e.g. scheduled away)
Well, if a thread actually deadlocks or "permanently livelocks" within the critical section, then no reclamation can occur.
this means that reclamation (and by extension writing with bounded amount of memory) is not wait-free (this is not a problem I think, just pointing out)
Yes; very good point Helge. I simply _failed_ to clarify that the wait-free properties come from the fact that I am not using loops (e.g., a CAS loop) to acquire/release a reference to a collector object. This has been a point of interest in the past over on `comp.programming.threads'. Previous collectors used a CAS loop to acquire and/or release a reference. This does not work out too well for a reader-writer pattern because only one reader thread can gain access while the others fail their CAS and have to try again. This lock-free property can result in fairly poor performance for readers. Therefore, we have been coming up with wait-free techniques that allow readers to gain access without looping around in software.
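Here is a rough sketch of what I mean by the difference (this is NOT the actual collector code; I am just using a single global count and C++0x-style atomics purely for illustration):
__________________________________________________________________
#include <atomic>


static std::atomic<unsigned> g_refs(0);


// lock-free acquire: a CAS loop; under contention a reader can fail
// its CAS an unbounded number of times and has to retry.
void acquire_cas_loop()
{
    unsigned cmp = g_refs.load(std::memory_order_relaxed);

    while (! g_refs.compare_exchange_weak(cmp, cmp + 1,
                                          std::memory_order_acquire))
    {
        // spin; another reader won the race, try again...
    }
}


// wait-free acquire: a single fetch-and-add; every reader completes
// in a bounded number of steps regardless of contention.
void acquire_wait_free()
{
    g_refs.fetch_add(1, std::memory_order_acquire);
}
__________________________________________________________________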
- threads interlock such that at least one thread is always within the critical section
Right?
No. My proxy algorithm is N = 2, while yours is N = 1 where N is the number of collector objects. This means that you only have a _single_ reference count. During periods of load, this single reference count might never reach zero and the garbage will keep piling up.
[snip explanation]
Okay thanks for your explanation, I did not understand your code correctly the first time, but now things are much more clear.
That's great! :^)
The concept is very nice indeed, and I agree that N=1 and N=2 makes quite a difference. I *think* that your approach is much more valuable as a building block for "object-local RCU" for data structures with long-lived read access than ABA prevention for producer/consumer data structures (where the read-side critical section is very short).
I think I agree with that statement.
Thanks for sharing the code,
You are welcome! If you can think of any optimizations I would LOVE to hear all about them. One thing I can think of would be to remove the need for threads to actually destroy objects. This can be moved to a separate thread or the objects can simply be stuck in a cache to be reused. For instance, if the `proxy::release(...)' procedure was turned into a function which returned a pointer to a stack of nodes that are in a quiescent state, then the thread could decide what to do with them. I think that this would make the algorithm more flexible. Please note that this is only an initial implementation and it's more of a proof-of-concept than anything. There are probably many improvements that can be made... ;^)
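For instance, the altered interface might get used along these lines (just high-level pseudo-code of the idea; `node', `node_cache_push()' and the member names are hypothetical):
__________________________________________________________________
// hypothetical node type for deferred objects
struct node
{
    node* m_next;
    /* ... deferred object ... */
};


// sketch: release() now returns the head of a singly-linked stack of
// nodes that have passed through a full collection cycle.
node* quiescent = proxy.release(collector);

while (quiescent)
{
    node* next = quiescent->m_next;

    // the calling thread decides: destroy the node, hand it off to a
    // dedicated reclaimer thread, or push it onto a cache for reuse.
    node_cache_push(quiescent);

    quiescent = next;
}
__________________________________________________________________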
I can already think of an application for this in my own code :)
Cool. IMVHO, the proxy collector is fairly generic in that you can use it to solve memory reclamation in many different types of non-blocking algorithms and *usage patterns. I think that if Boost does include a non-blocking library, then something along the lines of the collector algorithm should probably be included.

[*] - I am experimenting with the collector using the following usage pattern. Imagine a database server that gets very many sustained queries on a regular basis. Now, reader threads receive search requests and then execute the query on a dynamic shared data-structure. Here is high-level pseudo-code of how I am constructing the logic of the reader threads:
__________________________________________________________________
#define SYNC_INTERVAL 256
#define DEFER_THRESHOLD 128


typedef proxy<DEFER_THRESHOLD> proxy_type;

static proxy_type g_proxy;


void reader_thread()
{
    proxy_type::collector* c = &g_proxy.acquire();

    for (unsigned i = 1 ;; ++i)
    {
        query_type* q = try_to_get_a_query_request();

        if (! q)
        {
            g_proxy.release(*c);
            q = wait_for_a_query_request();
            c = &g_proxy.acquire();
        }

        execute_query(q);

        if (! (i % SYNC_INTERVAL))
        {
            g_proxy.release(*c);
            c = &g_proxy.acquire();
        }
    }

    g_proxy.release(*c);
}
__________________________________________________________________


Instead of acquiring/releasing a collector object for every single execution of a query, this technique attempts to amortize the number of times a reader thread needs to acquire/release a collector.

Now, I can think of a possible optimization/enhancement to the proxy algorithm itself. Take note that an odd reference count denotes that a collection cycle is in progress. Therefore, if a reader thread can simply test whether the reference count of its acquired collector object is even, then it simply does not actually "need" to release a reference to it. If it's odd, then a release is necessary in order to move the current collection cycle along. This would allow the reader threads' acquire/release pattern to be amortized across the actual garbage deferment threshold. I think I am going to alter the algorithm in order to support the following usage pattern:
__________________________________________________________________
#define DEFER_THRESHOLD 256


typedef proxy<DEFER_THRESHOLD> proxy_type;

static proxy_type g_proxy;


void reader_thread()
{
    proxy_type::collector* c = &g_proxy.acquire();

    for (;;)
    {
        query_type* q = try_to_get_a_query_request();

        if (! q)
        {
            g_proxy.release(*c);
            q = wait_for_a_query_request();
            c = &g_proxy.acquire();
        }

        execute_query(q);

        if (c->is_collecting())
        {
            g_proxy.release(*c);
            c = &g_proxy.acquire();
        }
    }

    g_proxy.release(*c);
}
__________________________________________________________________


This would allow reader threads to churn along during periods of load with fairly low overhead. I say this because the only time the readers would have to release their collector object and re-acquire it is when they must explicitly wait for query requests, or when the garbage threshold was reached and a collection cycle was initiated by a writer thread.

There are some fairly promising enhancements I think I can make to the collector, and that is one of them.

Humm...
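BTW, to make the even/odd test concrete, `is_collecting()' could boil down to something as simple as this (a sketch only; the member name `m_count' and the use of C++0x-style atomics are my own shorthand here, not the posted implementation):
__________________________________________________________________
#include <atomic>


struct collector
{
    // an odd value means a collection cycle is in progress
    std::atomic<unsigned> m_count;

    bool is_collecting() const
    {
        // readers only need to release/re-acquire when the count is odd
        return (m_count.load(std::memory_order_relaxed) & 1) != 0;
    }
};
__________________________________________________________________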