Re: [boost] [lockfree] Review

30 Jul 2011

      hi holger,

thanks for the review!
...
The library seems well-designed. As I believe has been discussed in a
separate thread, naming is somewhat inconsistent. The failure modes are
not clear to me (e.g. does enqueue/push return false or throw an exception
when there is no more memory), but overall I think Tim did an excellent
job.
have already improved the documentation of these functions, but i will check 
their wording again.
...
As mentioned, it would appear that bundled atomic implementation has a lot
of shortcomings, but one is particularly bad: atomic<tagged_ptr> is not
actually lock-free for Visual C++. The DCAS implementation should be
straightforward, but seems to be absent . in case you wonder I did an
initial implementation of <(cstd)atomic> and added all required intrinsics
for a simple implementation to the compiler. These made it all into
VC++2010 RTM (IIRC, there was only one 64-bit, most where 8 & 16-bit
variants)
i don't have access to any msvc compiler, do you think you could contribute the 
msvc-specific code for boost.atomic?
besides, i have a version that only requires 32bit cas.
...
There are a few points that make me a bit nervous. Specifically, the
alignment story is a bit weak. It seems there is only one use of
BOOST_LOCKFREE_CACHELINE_ALIGNMENT and that's for fifo<>::node. Older
compilers (GCCs and VC at least) don't dynamically align the stack with
__declspec(align)/__attribute__((aligned). Other classes do not use
alignment at all.
i don't know of any standard-compliant c++ way to enforce a specific alignment. 
it doesn't affect the correct, but is merely a hint for the compiler
...
It seems somewhat inconsequent to add internal padding
to move members to different cachelines, but at the same time not ensuring
alignment (which means potential cacheline overlaps at back and front of
the object). Also the code seems to assume that compilers will use an
empty base class optimization for noncopyable -- I guess, that's fine for
most major compilers.
is there any current compile, that doesn't optimize empty base classes? i read 
in a document from the early 1990s, that most recent compilers do an empty base 
class optimization.
...
Also it is not clear how it is ensured that nodes allocated are actually
aligned. Allocators can and usually do ignore alignment requirements
greater than intmax_t types. Some may just fail to instantiate.
the cache line alignment is again an optimization, that should work reasonably 
well. it shouldn't affect the correctness, though.
...
likely/unlikely are often defined as macros as in the Linux kernel. People
probably shouldn't do that for C++, but I have seen it quite a bit. It
probably doesn't help that many compilers' inlining oracles don't do very
well for modern code.
why? often the hints are generated by profile-guided optimization, but very few 
build systems actually make use of them.
...
The name freelist_t is somewhat confusing for a template parameter and
while I'm not sure what harm it could do it should probably follow the
general convention of CamelCase.
already replaced with boost.parameter-style policies
...
The definition of wait-free and lock-free is a bit weird. Specifically, the
term concurrent operations is a bit vague. I think it's easier to grasp if
you say something along the lines of at least one thread makes forward
progress at any given point in time. With the current definition in the
docs (and most others I know of), it may not be immediately clear why a
mutex-based implementation wouldn't be lock-free.
the wait-free and lock-free definitions are almost exactly taken from 
herlihy&shavit. i can try to explain them a bit more..
...
There's a paragraph:
...
The synchronization is done completely in user-space with no interaction
without any direct interaction with the operating system [1] . This
implies that they are not prone to issues like priority inversion (a
low-priority thread needs to wait for a high-priority thread).
I think there's some duplication in the first sentence. I'm not sure what
lock-free has to do with priority inversion and I would think that
priority inversion means priorities are -- well, inverted. I.e. a
high-priority thread is not scheduled, because a lower priority thread
owns a resource needed by the high-priority thread.
well, but with lock-free programming, no thread owns a resource.
...
This also makes it sound a bit like lock-free is superior to OS mechanisms,
but that really depends. A standard lock-free list is not fair for
writers, but an implementation making use of OS facilities can be. On the
other hand ticket lockets can provide some fairness.
it is definitely not the aim to provide a general-purpose concurrent data 
structure, which is fair and efficient, but to provide data structures that are, 
as the name states, lock-free. ;)

the section about the performance of lock-free data structures already discusses 
some performance-related aspects, but apparently this is not clear enough.
...
fifo<> and ringbuffer::empty says it is not thread-safe. I would assume
that means the value returned is potentially stale, but it can't break
class invariants, or can it?
already improved this part.

cheers, tim

Re: [boost] [lockfree] Review

Tim Blechmann