
Dear All,

Here is a quick review of the proposed Boost.Lockfree library.

Utility
-------

For some proposed libraries, their utility is beyond doubt because they are already being used in some significant application. In contrast, most GSoC projects are developed more in the "hope that they will be useful", without a motivating application. So we have to ask ourselves: does this library meet a need?

To me, lock-free algorithms and data structures have always seemed appealing at first sight but, apart from some trivial cases like counters, on close inspection they turn out to be too complicated to bother with. (They also lack required features, e.g. waiting for non-empty; I don't think I've ever written a producer-consumer application that didn't need the consumer to block on empty.)

The author acknowledges that these structures will typically be slower than conventional implementations using mutexes, but claims that their benefit comes in terms of better worst-case latency. Specifically, if a thread is pre-empted while it holds a lock, another thread will have to wait until the first thread is re-scheduled and can unlock before it can itself lock. But this is exactly the case that priority inheritance is designed to address: when the second thread tries to lock, the kernel boosts the first thread's priority and immediately re-schedules it. Yes, because of the kernel involvement this will probably still be slower than the lock-free approach, but perhaps not by much. It seems to me that the author should present benchmarks to demonstrate the actual worst-case latency difference between this code and a lock-based implementation.

Design
------

I am happy with the design of the interface, as far as it goes; there are a few things that could be added, though:

- Debug modes that check the thread requirements.
- Other combinations of single/multiple producer/consumer.

Implementation
--------------

I have looked only at the implementation of the ringbuffer class template. It lacks comments.
There are some oddities like this:

    size_t ret = read_index - write_index - 1;
    if (write_index >= read_index)
        ret += max_size;
    return ret;

Here, a negative value may be assigned to the unsigned size_t. Since unsigned arithmetic is modular, I think this is actually safe, but it looks odd.

I have not found any bugs.

It looks like every call to enqueue() and dequeue() will access both the read and write indexes. Won't this cause poor performance, because both index cache lines ping-pong between the producer and consumer threads? I would have been tempted to avoid this by keeping thread-local 'available' counts (pseudo-code):

    private:
        atomic<size_t> read_index_;
        atomic<size_t> write_index_;
        size_t read_available_;   // used only by the consumer thread
        size_t write_available_;  // used only by the producer thread

    public:
        bool enqueue(const T& data) {
            size_t w = write_index_.load(memory_order_relaxed);
            if (write_available_ == 0) {
                // In this case we access read_index_, whose cache line is
                // probably owned by the consumer thread, which is expensive.
                // But this is rare.
                write_available_ =
                    (read_index_.load(memory_order_acquire) + max_size - w - 1)
                    % max_size;
            }
            if (write_available_ == 0)
                return false;  // full
            buffer[w] = data;
            write_index_.store((w + 1) % max_size, memory_order_release);
            --write_available_;
            return true;
        }

        // dequeue similarly, using read_available_...

Documentation
-------------

I'm not impressed by the documentation. It lacks rationale and information about the implementation, and it has some formatting issues and typos (e.g. lose/loose). The platform requirements (i.e. the Boost.Atomic and/or std::atomic requirements and the supported architectures) should be explained. The reference documentation involves more clicks than I would expect to reach the actual content.

Some functions are marked as "for debugging only", e.g. the empty() and reset() [which should probably be called clear()] methods. It's not clear to me why these methods must be unsafe; for example, for the ringbuffer, I would expect the consumer thread to be able to call empty() safely, and the producer to be able to call full() (which is not provided) safely. Rationale for these restrictions (or for relaxing them) would be useful.

- Did you try to use the library? With what compiler?
Did you have any problems?

I considered exercising the library on my ARM systems, but I realised that it requires 64-bit atomics; currently, Boost.Atomic provides only 32-bit atomics on ARM. (This raises the whole issue of the Boost.Atomic dependency. I'm presuming that Tim intends not to bundle a Boost.Atomic snapshot with the library if it is approved; if that changes, we really need another review.)

- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

A few hours reading the docs and code.

- Are you knowledgeable about the problem domain?

I know a certain amount about atomics and lock-based concurrent programming, but not much about lock-free data structures.

- Please always state in your review whether you think the library should be accepted as a Boost library!

I think it should be accepted, provided that:

* The author demonstrates some measurable benefit over lock-based data structures.
* The documentation is improved to describe the implementation and to provide rationale for design decisions.
* The library is not added to Boost until Boost.Atomic has been reviewed and approved (or available std::atomic implementations are shown to work with it).

Regards, Phil.