
hi holger, thanks for the review!
The library seems well-designed. As I believe has been discussed in a separate thread, naming is somewhat inconsistent. The failure modes are not clear to me (e.g. does enqueue/push return false or throw an exception when there is no more memory), but overall I think Tim did an excellent job.
have already improved the documentation of these functions, but i will check their wording again.
As mentioned, it would appear that bundled atomic implementation has a lot of shortcomings, but one is particularly bad: atomic<tagged_ptr> is not actually lock-free for Visual C++. The DCAS implementation should be straightforward, but seems to be absent . in case you wonder I did an initial implementation of <(cstd)atomic> and added all required intrinsics for a simple implementation to the compiler. These made it all into VC++2010 RTM (IIRC, there was only one 64-bit, most where 8 & 16-bit variants)
i don't have access to any msvc compiler, do you think you could contribute the msvc-specific code for boost.atomic? besides, i have a version that only requires 32bit cas.
There are a few points that make me a bit nervous. Specifically, the alignment story is a bit weak. It seems there is only one use of BOOST_LOCKFREE_CACHELINE_ALIGNMENT and that's for fifo<>::node. Older compilers (GCCs and VC at least) don't dynamically align the stack with __declspec(align)/__attribute__((aligned). Other classes do not use alignment at all.
i don't know of any standard-compliant c++ way to enforce a specific alignment. it doesn't affect the correct, but is merely a hint for the compiler
It seems somewhat inconsequent to add internal padding to move members to different cachelines, but at the same time not ensuring alignment (which means potential cacheline overlaps at back and front of the object). Also the code seems to assume that compilers will use an empty base class optimization for noncopyable -- I guess, that's fine for most major compilers.
is there any current compile, that doesn't optimize empty base classes? i read in a document from the early 1990s, that most recent compilers do an empty base class optimization.
Also it is not clear how it is ensured that nodes allocated are actually aligned. Allocators can and usually do ignore alignment requirements greater than intmax_t types. Some may just fail to instantiate.
the cache line alignment is again an optimization, that should work reasonably well. it shouldn't affect the correctness, though.
likely/unlikely are often defined as macros as in the Linux kernel. People probably shouldn't do that for C++, but I have seen it quite a bit. It probably doesn't help that many compilers' inlining oracles don't do very well for modern code.
why? often the hints are generated by profile-guided optimization, but very few build systems actually make use of them.
The name freelist_t is somewhat confusing for a template parameter and while I'm not sure what harm it could do it should probably follow the general convention of CamelCase.
already replaced with boost.parameter-style policies
The definition of wait-free and lock-free is a bit weird. Specifically, the term concurrent operations is a bit vague. I think it's easier to grasp if you say something along the lines of at least one thread makes forward progress at any given point in time. With the current definition in the docs (and most others I know of), it may not be immediately clear why a mutex-based implementation wouldn't be lock-free.
the wait-free and lock-free definitions are almost exactly taken from herlihy&shavit. i can try to explain them a bit more..
There's a paragraph:
The synchronization is done completely in user-space with no interaction without any direct interaction with the operating system [1] . This implies that they are not prone to issues like priority inversion (a low-priority thread needs to wait for a high-priority thread).
I think there's some duplication in the first sentence. I'm not sure what lock-free has to do with priority inversion and I would think that priority inversion means priorities are -- well, inverted. I.e. a high-priority thread is not scheduled, because a lower priority thread owns a resource needed by the high-priority thread.
well, but with lock-free programming, no thread owns a resource.
This also makes it sound a bit like lock-free is superior to OS mechanisms, but that really depends. A standard lock-free list is not fair for writers, but an implementation making use of OS facilities can be. On the other hand ticket lockets can provide some fairness.
it is definitely not the aim to provide a general-purpose concurrent data structure, which is fair and efficient, but to provide data structures that are, as the name states, lock-free. ;) the section about the performance of lock-free data structures already discusses some performance-related aspects, but apparently this is not clear enough.
fifo<> and ringbuffer::empty says it is not thread-safe. I would assume that means the value returned is potentially stale, but it can't break class invariants, or can it?
already improved this part. cheers, tim