[Review] Lockfree review: two more days left

Hi all,

We have two more days left for the review of Tim Blechmann's Lockfree library. If you're interested in this library, please consider contributing a review! Please see the original (slightly corrected) announcement below:

-----------------------------------------------------

About the library:

Boost.Lockfree provides implementations of lock-free data structures. Lock-free data structures can be accessed by multiple threads without the need for blocking synchronization primitives such as guards. Lock-free data structures can be used in real-time systems, where blocking algorithms may lead to high worst-case execution times, to avoid priority inversion, or to increase scalability on multi-processor machines.

The following data structures are provided:

- boost::lockfree::fifo, a lock-free fifo queue
- boost::lockfree::stack, a lock-free stack
- boost::lockfree::ringbuffer, a wait-free single-producer/single-consumer ringbuffer

The library is accessible from here: http://tim.klingt.org/boost_lockfree.tar.gz

Boost.Lockfree depends on C++0x atomics, which are not yet well supported by current compilers. For this reason Boost.Lockfree depends on the *unreviewed* Boost.Atomic library, which emulates C++0x atomics for C++98. This review is about Boost.Lockfree and not about Boost.Atomic, although the latter is included in the tarball and the git repository. If Boost.Lockfree is accepted, it won't be merged into trunk before Boost.Atomic is accepted.

---------------------------------------------------

Please always state in your review whether you think the library should be accepted as a Boost library! Additionally, please consider giving feedback on the following general topics:

- What is your evaluation of the design?
- What is your evaluation of the implementation?
- What is your evaluation of the documentation?
- What is your evaluation of the potential usefulness of the library?
- Did you try to use the library? With what compiler? Did you have any problems?
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
- Are you knowledgeable about the problem domain?

Regards,
Hartmut
Review Manager
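PS: For reviewers who have not yet looked at the library, a minimal usage sketch of the fifo. The constructor and member names follow the review tarball and may differ from whatever interface is eventually accepted:

    #include <boost/lockfree/fifo.hpp>
    #include <iostream>

    int main()
    {
        // the fifo draws its internal nodes from a freelist; preallocating a
        // number of nodes up front (assumed constructor argument) keeps the
        // allocator out of the hot path
        boost::lockfree::fifo<int> f(128);

        // enqueue/dequeue return bool instead of blocking
        f.enqueue(42);

        int value;
        if (f.dequeue(value))
            std::cout << value << std::endl;
        return 0;
    }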

Hi,

First, I intend to contribute a review, even though not a full one (and hopefully it will be ready on time ;). My knowledge of the theoretical aspects is too limited to judge the actual implementation, but there are a few simple points that I would like to check about performance. The rest of the e-mail is mainly addressed to Tim.

First, I have re-read the doc and started to look at the implementation, and I found something that is apparently a contradiction:

- In the doc you say "Spinlocks do not directly interact with the operating system either. However it is possible that the owning thread is preempted by the operating system, which violates the lock-free property"
- In the code however (for fifo<T>), the code seems to be "spinning" all the time using "for (;;)" loops.

Could you give me some precisions on why this is ok?

Then, I started to write a test program to compare performance with Intel's Threading Building Blocks library (3.0 update 7, the latest stable). I am far from finished, but my first results are that boost::lockfree is slower on my machine by about 40%. I use the TBB concurrent_queue<T> class, which according to the doc is "non-blocking". I had a quick look at the implementation and it seems to be using a similar technique as fifo<T> (that is, spinning, no system locks). Do you know whether this is actually a lock-free structure or not?

My question is not demanding that you match Intel's perf ;) but rather whether you have any ideas on what could be done to improve the performance further (Intel is using assembly for instance, no idea if it has a big impact; another suspect may be a better atomic implementation).

I have attached the code FYI (please keep in mind that this is a quick test). The interesting part is the "SimpleTest" functions. I compiled on Ubuntu 11.04 / gcc 4.5 in C++0x mode. I measured the time using the "time" command. I ran the sample on an AMD cpu whose cpuinfo is attached.

I'll perform some other tests and keep you updated.

Thanks again for your efforts,
Julien
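PS: The attached code is not reproduced here, but the TBB side of the test is structurally similar to the following sketch (a hypothetical reconstruction, not the actual SimpleTest code):

    #include <tbb/concurrent_queue.h>
    #include <functional>
    #include <thread>

    static const long iterations = 10000000;

    void producer(tbb::concurrent_queue<int>& q)
    {
        for (long i = 0; i != iterations; ++i)
            q.push(static_cast<int>(i));   // TBB's push returns void
    }

    void consumer(tbb::concurrent_queue<int>& q)
    {
        int value;
        long received = 0;
        while (received != iterations)
            if (q.try_pop(value))          // non-blocking pop
                ++received;
    }

    int main()
    {
        tbb::concurrent_queue<int> q;
        std::thread p(producer, std::ref(q));
        std::thread c(consumer, std::ref(q));
        p.join();
        c.join();
        return 0;
    }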

hi julien,
> First, I have re-read the doc and started to look at the implementation, and I found something that is apparently a contradiction:
> - In the doc you say "Spinlocks do not directly interact with the operating system either. However it is possible that the owning thread is preempted by the operating system, which violates the lock-free property"
> - In the code however (for fifo<T>), the code seems to be "spinning" all the time using "for (;;)" loops.
> Could you give me some precisions on why this is ok?
spinning is not the problem, locking is. the spinning in fifo (aka queue) and stack basically has the semantics `try an operation; if it fails because another thread interfered with it, try again'. if the os preempts a thread while it is spinning, it does not block the other threads.
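the same pattern in isolation, as a rough sketch on a plain std::atomic (not code from the library):

    #include <atomic>

    // keep retrying a CAS until no other thread interfered. a thread that is
    // preempted inside this loop holds no lock, so the remaining threads can
    // still complete their own operations.
    void increment(std::atomic<int>& counter)
    {
        int expected = counter.load();
        for (;;)
        {
            if (counter.compare_exchange_weak(expected, expected + 1))
                return;     // our update went through
            // on failure 'expected' has been reloaded with the value another
            // thread wrote; simply try again
        }
    }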
> Then, I started to write a test program to compare performance with Intel's Threading Building Blocks library (3.0 update 7, the latest stable). I am far from finished, but my first results are that boost::lockfree is slower on my machine by about 40%.
> I use the TBB concurrent_queue<T> class, which according to the doc is "non-blocking". I had a quick look at the implementation and it seems to be using a similar technique as fifo<T> (that is, spinning, no system locks). Do you know whether this is actually a lock-free structure or not?
i checked tbb's concurrent_queue several years ago, but at that time it seemed to be using spinlocks. maybe this has changed. non-blocking does not necessarily mean lock-free; it could be an obstruction-free queue ... one point where i really question the non-blocking claim is the signature of the push() function: void push( const T& source ); it either needs to allocate memory dynamically from the OS, where the memory allocator may block, or it throws an exception (and afaict this will also cause some memory allocations).
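to illustrate why the signature matters, a rough sketch (hypothetical types, aba handling omitted): a pop from a preallocated freelist can simply report failure, while an always-succeeding operation has to fall back to the allocator, and operator new may take a lock internally:

    #include <atomic>

    struct node { int data; node* next; };

    // lock-free allocation attempt from a preallocated freelist: no call into
    // the os allocator, so it cannot block, but it may return 0
    node* try_allocate(std::atomic<node*>& freelist)
    {
        node* head = freelist.load();
        while (head && !freelist.compare_exchange_weak(head, head->next))
            ;               // another thread raced us, retry with the new head
        return head;
    }

    // the always-succeeding variant has no way to report failure, so it must
    // call the allocator when the freelist is empty (potentially blocking)
    node* allocate(std::atomic<node*>& freelist)
    {
        if (node* n = try_allocate(freelist))
            return n;
        return new node();
    }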
> My question is not demanding that you match Intel's perf ;) but rather whether you have any ideas on what could be done to improve the performance further (Intel is using assembly for instance, no idea if it has a big impact; another suspect may be a better atomic implementation).
some people suggest that implementing a backoff could improve the performance. otoh, herlihy and shavit suggest that the performance of a backoff heavily depends on the backoff parameters, and these parameters differ among cpu types and even clock speeds. because of this, and because implementing a backoff will require some assembly code, i decided not to include a backoff in boost.lockfree.
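for reference, the backoff described by herlihy and shavit looks roughly like the sketch below; min_delay/max_delay are exactly the cpu-dependent parameters in question, and a real implementation would spin on a pause instruction instead of sleeping, which is where the assembly comes in:

    #include <chrono>
    #include <random>
    #include <thread>

    // exponential backoff: wait for a random duration whose upper bound
    // doubles after every failed attempt, capped at max_delay
    class backoff
    {
    public:
        backoff(int min_delay_ns, int max_delay_ns)
          : limit(min_delay_ns), max_delay(max_delay_ns) {}

        void operator()()
        {
            std::uniform_int_distribution<int> dist(0, limit);
            std::this_thread::sleep_for(std::chrono::nanoseconds(dist(rng)));
            if (limit < max_delay)
                limit *= 2;
        }

    private:
        int limit;
        int max_delay;
        std::minstd_rand rng;
    };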
> I have attached the code FYI (please keep in mind that this is a quick test). The interesting part is the "SimpleTest" functions. I compiled on Ubuntu 11.04 / gcc 4.5 in C++0x mode. I measured the time using the "time" command. I ran the sample on an AMD cpu whose cpuinfo is attached.
one point: the enqueue/push functions of boost.lockfree do not necessarily succeed. in your code it should be fine, because the freelist that is used will allocate new internal nodes if it is empty. however, this means that the push operation may block if the allocator blocks.

cheers, tim
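ps: for a strict comparison it would be worth keeping the allocator out of the timing loop, roughly like this (constructor/member names follow the review tarball and may differ):

    #include <boost/lockfree/fifo.hpp>

    int main()
    {
        static const long iterations = 10000000;

        // preallocate enough internal nodes so the loop never has to grow the
        // freelist (and therefore never calls the allocator)
        boost::lockfree::fifo<long> f(iterations);

        for (long i = 0; i != iterations; ++i)
            while (!f.enqueue(i))
                ;               // enqueue may fail; retry instead of giving up

        long value;
        while (f.dequeue(value))
            ;                   // drain the queue
        return 0;
    }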
participants (3)
- Hartmut Kaiser
- Julien Nitard
- Tim Blechmann