[Review] Lockfree review: today (July 28th) is the last day

All, Today is the last day of the review of Tim Blechmann's Lockfree library. If you're interested in this library, please consider contributing a review! If you need more time, please contact me privately. Please see the original (slightly corrected) announcement below: ----------------------------------------------------- About the library: Boost.Lockfree provides implementations of lock-free data structures. Lock-free data structures can be accessed by multiple threads without the necessity of blocking synchronization primitives such as guards. Lock-free data structures can be used in real-time systems, where blocking algorithms may lead to high worst-case execution times, to avoid priority inversion, or to increase the scalability for multi-processor machines. The following data structures are provided: - boost::lockfree::fifo, a lock-free fifo queue - boost::lockfree::stack, a lock-free stack - boost::lockfree::ringbuffer, a wait-free single-producer/single-consumer ringbuffer. The library is accessible from here: http://tim.klingt.org/boost_lockfree.tar.gz. Boost.Lockfree depends on C++0x atomics. They are not well supported with current compilers yet. For this reason Boost.Lockfree depends on the *unreviewed* Boost.Atomic library, that emulates C++0x atomics for C++98. This review is about Boost.Lockfree and not about Boost.Atomic, although it is included in the tarball and the git repository. If Boost.Lockfree will be accepted, it won't be merged into trunk before Boost.Atomic is accepted. --------------------------------------------------- Please always state in your review, whether you think the library should be accepted as a Boost library! Additionally please consider giving feedback on the following general topics: - What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? - Did you try to use the library? With what compiler? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain? Regards Hartmut Review Manager

I have used boost::lockfree in an early version of my https://github.com/bytemaster/Boost.CMT library because I needed a lock-free producer-consumer queue for sending tasks to my scheduler. So this review comes with some "real world" use. *What is your evaluation of the design?* The API is relatively straight forward and simple to use. It gives the user flexibility in how internals operate. *What is your evaluation of the implementation?* In my testing I ultimately removed boost::lockfree::fifo's from my code because they were the performance bottleneck. Don't use lock free fifo's for short lived 'wait queues' such as a shared future<> object or any application that may have 'surges' of demand that allocate large amounts of memory that will never be freed until the fifo is destroyed. Several use-cases were ignored which have much more efficient lock free implementations than the general case. - multiple producer, single consumer. - single producer, single consumer - multiple producer, multiple pop-all - unordered producer-consumer queues Ultimately I used an intrusive fifo where the elements (tasks) contained pointers to the next task (if any). I then atomically updated the head pointer to insert new nodes. To pop nodes I atomically swap the head node to NULL then process all the nodes in the list from tail to head to maintain fifo order. In other words, if your queue contains polymorphic elements then you are already calling new once and calling new a second time to create a node containing a pointer a lot of unnecessary overhead and blocking. Furthermore, accessing the 'free list' is not free. *What is your evaluation of the documentation?* The documentation is sufficient and easy to understand. *What is your evaluation of the potential usefulness of the library?* This library obviously has potential usefulness as I needed lock free data structures for my scheduler; however, the library, as implemented, was not useful in my application. I suspect that there are places where it could be useful, as implemented, as an alternative to using locks around stl containers, but fear people will expect more from being 'lock free' than this implementation can provide considering the heap is the biggest source of lock contention in most multi-threaded applications. This library would be much more useful if it provided a lock free linked list and priority queue. *Did you try to use the library? With what compiler? Did you have any problems? * I tried to use the library on Mac OS X using gcc 4.2 and 4.5. My problems were mostly related to excessive overhead related to construction/destruction times and cached free-list implementation. *How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?* Attempted to use in real life, profiled code using lock free fifos, determined causes of performance problems, and ultimately spun my own solution. All in all probably 8 to 16 hours of development using lock free. *Are you knowledgeable about the problem domain?* Knowledgeable enough to roll my own lock-free solutions, but not knowledgeable enough to be aware of issues like 'false sharing' and other concerns that were obviously considered in the design and implementation of boost::lockfree. *Do you think the library should be accepted as a Boost library?* * * Despite the problems I had with it, I see no reason not to accept it as a good start toward what I hope will be come a larger collection of lock-free containers. * * * * On Thu, Jul 28, 2011 at 8:41 AM, Hartmut Kaiser <hartmut.kaiser@gmail.com>wrote:
All,
Today is the last day of the review of Tim Blechmann's Lockfree library. If you're interested in this library, please consider contributing a review! If you need more time, please contact me privately.
Please see the original (slightly corrected) announcement below:
-----------------------------------------------------
About the library:
Boost.Lockfree provides implementations of lock-free data structures. Lock-free data structures can be accessed by multiple threads without the necessity of blocking synchronization primitives such as guards. Lock-free data structures can be used in real-time systems, where blocking algorithms may lead to high worst-case execution times, to avoid priority inversion, or to increase the scalability for multi-processor machines.
The following data structures are provided:
- boost::lockfree::fifo, a lock-free fifo queue - boost::lockfree::stack, a lock-free stack - boost::lockfree::ringbuffer, a wait-free single-producer/single-consumer ringbuffer.
The library is accessible from here: http://tim.klingt.org/boost_lockfree.tar.gz.
Boost.Lockfree depends on C++0x atomics. They are not well supported with current compilers yet. For this reason Boost.Lockfree depends on the *unreviewed* Boost.Atomic library, that emulates C++0x atomics for C++98. This review is about Boost.Lockfree and not about Boost.Atomic, although it is included in the tarball and the git repository.
If Boost.Lockfree will be accepted, it won't be merged into trunk before Boost.Atomic is accepted.
---------------------------------------------------
Please always state in your review, whether you think the library should be accepted as a Boost library!
Additionally please consider giving feedback on the following general topics:
- What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? - Did you try to use the library? With what compiler? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain?
Regards Hartmut Review Manager
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Hi *What is your evaluation of the implementation?*
In my testing I ultimately removed boost::lockfree::fifo's from my code because they were the performance bottleneck. Don't use lock free fifo's for short lived 'wait queues' such as a shared future<> object or any application that may have 'surges' of demand that allocate large amounts of memory that will never be freed until the fifo is destroyed.
I am interested in knowing which profiling tool you used to that effect, if any. Regards, Julien

hi daniel, thanks for your review!
*What is your evaluation of the implementation?* In my testing I ultimately removed boost::lockfree::fifo's from my code because they were the performance bottleneck. Don't use lock free fifo's for short lived 'wait queues' such as a shared future<> object or any application that may have 'surges' of demand that allocate large amounts of memory that will never be freed until the fifo is destroyed.
i've never argued that boost.lockfree has a high performance (as in throughput), but my main concern is lock-freedom. in many cases, lock-based data structures can outperform lock-free ones (because atomic operations are usually pretty expensive). but in some use cases (e.g. in real-time systems) one wants to use data structures that are slow but are guaranteed to be lock-free. as for the problem of memory reclamation: this is not a trivial issue and unfortunately the published algorithms (hazard pointers and pass-the-buck) are patented.
Several use-cases were ignored which have much more efficient lock free implementations than the general case. - multiple producer, single consumer. - single producer, single consumer - multiple producer, multiple pop-all - unordered producer-consumer queues
it actually contains a spsc queue (originally named `ringbuffer', but i am planing to rename it to `spsc_queue').
Ultimately I used an intrusive fifo where the elements (tasks) contained pointers to the next task (if any). I then atomically updated the head pointer to insert new nodes. To pop nodes I atomically swap the head node to NULL then process all the nodes in the list from tail to head to maintain fifo order.
i'd be curious to see the code for this, is it available?
Despite the problems I had with it, I see no reason not to accept it as a good start toward what I hope will be come a larger collection of lock-free containers.
i'd be happy to see contributions to the library ;) cheers, tim

Hi All, Please find my review of boost::lockfree below. I think that boost.lockfree should be accepted in boost, at least its "ringbuffer" data structure offers really interesting performance (even though I understand that pure raw performance as in throughput / latency is not the main goal). *- What is your evaluation of the design?* Usage of boost.lockfree is simple and elegant. I'd like to see some name changed, if I may express my vote I'd say circular_buffer, queue and stack in the boost::lockfree namespace. I was thinking about adding an unsafe "view" of the structures that would allow normal container operations while the main interface would only have "safe" functions. I support the proposition to make boost::lockfree work with boost::interprocess. *- What is your evaluation of the implementation?* My evaluation of the perfomance is not based only on the code review, but mainly on actual tests for the use cases I am interested in professionally (sadly not all of them yet). The coding style as the quality you would expect from a boost library. The nature of the algorithms make it hard to read though. My performance measurements can be found in the file attached (code and CSV output). I focused on throughput tests, I didn't have time for latency yet. Basically "ringbuffer" is _very_ fast. It crushes Intel's TBB by a factor 10 on single producer single consumer scenario. "fifo" will be more useful in real-time software because it's performance is not as scalable as I was expecting: on a 4 core CPU it took more time to deal with the same amount of data with 4 threads than with 2 (half producers half consumers). In contrast, TBB took the same amount of time (and I guess that ideally it should take less time with more threads as long as there are enough cores for each thread to run). This may be attenuated by the fact that I did not simulate a workload between each synchronized operation as I did for the SCSP scenario. Some have emitted fears that the implementation may have bugs because it's hard to prove concurrent algorithms. I am pretty sure that ringbuffer and fifo are bug free as far as "int" is used as the data type. *- What is your evaluation of the documentation?* * * It's not verbose enough. There's a great deal of work in the code, but as it has been stated many times, this is a delicate topic and more help is always welcome. * * * - What is your evaluation of the potential usefulness of the library?* Not developing real time solutions myself, so I can't fully judge. On the other hand, a wait free buffer is quite likely to be a very welcomed solution for many non real time high performance applications (starting by IPC and boost::interprocess ;). *- Did you try to use the library? With what compiler? Did you have any **problems?* * * I used the library to write a simple series of test. I used GCC 4.5.2 / Ubuntu in C++0x mode. I didn't have any problem. * - How much effort did you put into your evaluation? A glance? A quick **reading? In-depth study?* * * I spent about 6/7 hours reading the docs and writing my tests. * * * - Are you knowledgeable about the problem domain?* I am not knowledgeable about lock free data structures and algorithm but I write concurrent applications for breakfast. About the test code : - It IS test code - You'll need to install and compile TBB and edit the Makefile for your local paths. Regards, Julien

hi julien, thanks for your review. few comments!
*- What is your evaluation of the design?*
Usage of boost.lockfree is simple and elegant. I'd like to see some name changed, if I may express my vote I'd say circular_buffer, queue and stack in the boost::lockfree namespace. I was thinking about adding an unsafe "view" of the structures that would allow normal container operations while the main interface would only have "safe" functions. I support the proposition to make boost::lockfree work with boost::interprocess.
after some earlier discussion, i think the best names are queue, stack and spsc_queue (to make the single-produce/single-consumer safety as explicit as possible) also enqeueue/dequeue will be renamed to push/pop so that all data structures share the same interface. (std::queue also uses the names push/pop, so it is probably the best way to go).
"fifo" will be more useful in real-time software because it's performance is not as scalable as I was expecting: on a 4 core CPU it took more time to deal with the same amount of data with 4 threads than with 2 (half producers half consumers).
the reason for this is that different threads interfere with each other. i know of this problem, but also i am not sure if this is a real problem for cases other than benchmarks. a benchmark will stress the data structures in a rather unnatural way, but in a real application one probably won't pipe millions of integers through a queue, but probably the produce threads will spend a lot of time generating them and the consumer a lot of time processing them ;)
It's not verbose enough. There's a great deal of work in the code, but as it has been stated many times, this is a delicate topic and more help is always welcome. * * * - What is your evaluation of the potential usefulness of the library?*
Not developing real time solutions myself, so I can't fully judge. On the other hand, a wait free buffer is quite likely to be a very welcomed solution for many non real time high performance applications (starting by IPC and boost::interprocess ;).
i've spent some time to analyse if using it via boost.interprocess will be reasonable/feasible. the answer is: it depends on the CPU and the implementation of atomic<>. if the CPU is supported by boost.atomic, then it is no problem. if the standard library provides atomic<>, it shouldn't be a problem, either. however if boost.atomic has to emulate the atomic operations, it will use a pool of spinlocks (the same spinlock pool that is used for the smart_ptr library). afaict there spinlocks won't be shared among the different proceses ... unfortunately there is no way to catch this at compile-time :/ cheers, tim

UNCLASSIFIED
- What is your evaluation of the potential usefulness of the library?*
Not developing real time solutions myself, so I can't fully judge. On the other hand, a wait free buffer is quite likely to be a very welcomed solution for many non real time high performance applications (starting by IPC and boost::interprocess ;).
i've spent some time to analyse if using it via boost.interprocess will be reasonable/feasible. the answer is: it depends on the CPU and the implementation of atomic<>. if the CPU is supported by boost.atomic, then it is no problem. if the standard library provides atomic<>, it shouldn't be a problem, either. however if boost.atomic has to emulate the atomic operations, it will use a pool of spinlocks (the same spinlock pool that is used for the smart_ptr library). afaict there spinlocks won't be shared among the different proceses ...
That's really good news, Tim, that principally it would not be a problem as long as the atomic<> can work with boost::interprocess. That point about atomic<> is quite critical and I would strongly appeal to the author of the "boost::atomic" (Helge Bahmann) to also revisit the atomic library to make, for instance, the type of spinlocks pulled out as a template parameter with maybe defaulting to the ordinary spinlocks. That way if you need to use atomic in boost::interprocess compatible fashion (or anyone else in other applications for that matter) you can plug in a different/custom spinlock. Haven't looked into the atomic's implementation and maybe it's already done that way but if not, Tim, you might bring this need up with Helge. Is that feasible? It's worth noting that Ion Gaztañaga (the author of boost::interprocess) has submitted a new generic Containers library for review which provides among other kinds also alternative implementations (AFAIK) for STL containers which have been suffering from the issue of not pulling out the pointer type typedefs from the supplied allocators, hence they've assumed that "pointer" means kinda "raw pointer". So what I'm trying to say here is that there is a good progress made in the land of boost's other parts' compatibility with interprocess library, so it would be real missed opportunity to rush with this lib release and omit the interprocess support. (sorry for pushing the point repeatedly :-\) I hope it make sense to youse all.
unfortunately there is no way to catch this at compile-time :/
Not easily, but things like that could be perhaps addressed in later updates. I'm really looking forward to this lib and the atomic<> - the long missing component in C++ libs. Thanks very much gabe Gabe Levy Senior Software Engineer, HiQ Systems Pty Ltd IMPORTANT: This email remains the property of the Department of Defence and is subject to the jurisdiction of section 70 of the Crimes Act 1914. If you have received this email in error, you are requested to contact the sender and delete the email.

I'm travelling this week, but still hope to take another look at the implementation - I promised I would. Probably some night when I'm stuck in the hotel room. Quick overall - I think it should be accepted. What I've seen so far (during boostcon) looked good. I suspect at most all I'm going to do is find some small bug - that doesn't mean it shouldn't be accepted, it would just mean that the bug should be fixed. Tony On Thu, Jul 28, 2011 at 8:41 AM, Hartmut Kaiser <hartmut.kaiser@gmail.com> wrote:
All,
Today is the last day of the review of Tim Blechmann's Lockfree library. If you're interested in this library, please consider contributing a review! If you need more time, please contact me privately.
Please see the original (slightly corrected) announcement below:
-----------------------------------------------------
About the library:
Boost.Lockfree provides implementations of lock-free data structures. Lock-free data structures can be accessed by multiple threads without the necessity of blocking synchronization primitives such as guards. Lock-free data structures can be used in real-time systems, where blocking algorithms may lead to high worst-case execution times, to avoid priority inversion, or to increase the scalability for multi-processor machines.
The following data structures are provided:
- boost::lockfree::fifo, a lock-free fifo queue - boost::lockfree::stack, a lock-free stack - boost::lockfree::ringbuffer, a wait-free single-producer/single-consumer ringbuffer.
The library is accessible from here: http://tim.klingt.org/boost_lockfree.tar.gz.
Boost.Lockfree depends on C++0x atomics. They are not well supported with current compilers yet. For this reason Boost.Lockfree depends on the *unreviewed* Boost.Atomic library, that emulates C++0x atomics for C++98. This review is about Boost.Lockfree and not about Boost.Atomic, although it is included in the tarball and the git repository.
If Boost.Lockfree will be accepted, it won't be merged into trunk before Boost.Atomic is accepted.
---------------------------------------------------
Please always state in your review, whether you think the library should be accepted as a Boost library!
Additionally please consider giving feedback on the following general topics:
- What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? - Did you try to use the library? With what compiler? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain?
Regards Hartmut Review Manager
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
participants (6)
-
Daniel Larimer
-
Gottlob Frege
-
Hartmut Kaiser
-
Julien Nitard
-
Levy, Gabriel (Contractor)
-
Tim Blechmann