
hi julien,
First, I have re-read the docs and started looking at the implementation, and I found what appears to be a contradiction:
- In the doc you say "Spinlocks do not directly interact with the operating system either. However it is possible that the owning thread is preempted by the operating system, which violates the lock-free property"
- In the code for fifo<T>, however, the implementation seems to be "spinning" all the time, using "for (;;)" loops.
Could you explain why this is OK?
spinning is not the problem, locking is. the spinning in fifo (aka queue) and stack basically has the semantics `try an operation; if it fails because another thread interfered, try again'. if the os preempts a thread while it is spinning, it does not block the other threads.
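to illustrate the point, here is a minimal sketch (not boost.lockfree's actual code) of what such a `for (;;)` loop looks like -- a CAS retry loop on a trivial treiber-style stack push, not a spinlock:

```cpp
#include <atomic>

// minimal sketch of a CAS retry loop (not boost.lockfree's actual code).
// each iteration *attempts* the operation; if another thread changed the
// state in between, the compare-and-swap fails and we simply retry with
// the new state.  no thread ever holds a lock, so a thread preempted
// mid-loop cannot block the others.
struct node { int value; node* next; };

std::atomic<node*> head{nullptr};

void push(node* n) {
    node* old = head.load(std::memory_order_relaxed);
    for (;;) {
        n->next = old;
        // if head is still `old`, link n in; on failure, `old` is
        // reloaded with the current head and we try again
        if (head.compare_exchange_weak(old, n,
                std::memory_order_release,
                std::memory_order_relaxed))
            return;
    }
}
```

the key difference from a spinlock: the loop spins on *progress of the operation*, not on another thread releasing a flag.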
Then, I started to write a test program to compare performance with Intel's Threading Building Blocks library (3.0 update 7, the latest stable release). I am far from finished, but my first results are that boost::lockfree is slower on my machine by about 40%.
I use the TBB concurrent_queue<T> class, which according to the docs is "non-blocking". From a quick look at the implementation, it seems to use a technique similar to fifo<T>'s (that is, spinning, no system locks). Do you know whether this is actually a lock-free structure or not?
i checked tbb's concurrent_queue several years ago, but at that time, it seemed to be using spinlocks. maybe this has changed. non-blocking does not necessarily mean lock-free, but it could be an obstruction-free queue ... one point where i really question the non-blocking claim is the signature of the push() function: void push( const T& source ); it either needs to allocate memory dynamically from the OS, where the memory allocator may block, or it throws an exception (and afaict that will also cause some memory allocations).
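a hypothetical illustration of the point about the signature (this is not TBB's code): an unbounded `void push(const T&)` has no way to report failure, so every call must obtain a fresh node somehow.

```cpp
// hypothetical sketch -- NOT TBB's implementation.  with a void return
// type, push() cannot report "no node available", so it must call the
// allocator on every push.  operator new may take a lock internally, and
// on failure it throws std::bad_alloc -- either way the call can block,
// which undermines a non-blocking guarantee even if the linking step
// itself is lock-free.
template <class T>
struct node { T value; node* next; };

template <class T>
void push(node<T>*& head, const T& source) {
    node<T>* n = new node<T>{source, head};  // may lock or throw
    head = n;                                // (lock-free linking elided)
}
```

a bool-returning push drawing from a preallocated pool avoids this, at the cost of being able to fail.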
My question is not demanding that you match Intel's perf ;) but rather whether you have any ideas on what could be done to improve the performance further (Intel uses assembly, for instance, though I have no idea whether that has a big impact; another suspect may be a better atomic implementation).
some people suggest that implementing a backoff could improve the performance. otoh, herlihy and shavit suggest that the performance of a backoff depends heavily on the backoff parameters, and these parameters differ among cpu types and even clock speeds. because of this, and because implementing a backoff would require some assembly code, i decided not to include a backoff in boost.lockfree.
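for reference, a sketch of the kind of exponential backoff herlihy and shavit describe -- this is not part of boost.lockfree, and the window sizes are made-up tuning values, which is exactly the parameter-sensitivity problem mentioned above:

```cpp
#include <atomic>
#include <thread>

// sketch of an exponential backoff (NOT part of boost.lockfree).  after a
// failed CAS, wait an exponentially growing amount before retrying, to
// take pressure off the contended cache line.  the initial and maximum
// windows are illustrative guesses; good values depend on the cpu model
// and clock speed.
struct backoff {
    int limit = 1;                       // current backoff window
    static const int max_limit = 1024;   // illustrative cap

    void operator()() {
        for (int i = 0; i < limit; ++i)
            std::this_thread::yield();   // real code would use pause/nop
        if (limit < max_limit)
            limit *= 2;                  // grow the window exponentially
    }
};

std::atomic<int> counter{0};

// CAS loop that backs off after each failed attempt
void increment() {
    backoff bo;
    int old = counter.load();
    while (!counter.compare_exchange_weak(old, old + 1))
        bo();
}
```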
I have attached the code FYI (please keep in mind that this is a quick test). The interesting part is the "SimpleTest" functions. I compiled on Ubuntu 11.04 / gcc 4.5 in C++0x mode, measured the time using the "time" command, and ran the sample on an AMD cpu whose cpuinfo is attached.
one point: the enqueue/push functions of boost.lockfree do not necessarily succeed. in your code, it should be fine, because the freelist that is used will allocate new internal nodes if it is empty. however, this means that the push operation may block, if the allocator blocks. cheers, tim
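an illustrative sketch of the freelist idea (not boost.lockfree's actual freelist, and single-threaded for brevity -- the real one uses atomics): a push that draws nodes from a fixed preallocated pool and reports failure instead of calling the allocator. a push that never allocates can stay non-blocking, but it can fail:

```cpp
// illustrative sketch, NOT boost.lockfree's freelist.  nodes come from a
// fixed, preallocated pool; when the pool is exhausted, push() returns
// false instead of touching the (possibly blocking) system allocator.
// the atomics a real lock-free freelist needs are omitted for brevity.
struct pool_stack {
    struct node { int value; node* next; };

    node storage[4];          // fixed, preallocated pool
    node* free_list;          // singly linked list of unused nodes
    node* head = nullptr;     // top of the stack

    pool_stack() {
        for (int i = 0; i < 3; ++i)
            storage[i].next = &storage[i + 1];
        storage[3].next = nullptr;
        free_list = &storage[0];
    }

    bool push(int v) {
        node* n = free_list;
        if (!n)
            return false;     // pool exhausted: fail, do not block
        free_list = n->next;
        n->value = v;
        n->next = head;
        head = n;
        return true;
    }
};
```

allowing the freelist to fall back to the allocator trades this failure mode for the possible blocking described above.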