
hi all,

i've done some updates to boost.lockfree. the biggest difference from earlier snapshots is that it includes a fixed-size single-producer/single-consumer ringbuffer. besides, the documentation is based on doxygen now. snapshots are available from [1], docs from [2], and my git repository can be found at [3] and cloned from [4]...

cheers, tim

[1] http://tim.klingt.org/code/projects/boost-lockfree/files
[2] http://tim.klingt.org/boost_lockfree/
[3] http://tim.klingt.org/git?p=boost_lockfree.git
[4] git://tim.klingt.org/boost_lockfree.git

--
tim@klingt.org
http://tim.klingt.org

The first question I ask myself when something doesn't seem to be beautiful is why do I think it's not beautiful. And very shortly you discover that there is no reason.
  John Cage

On 23 July 2010 06:24, Tim Blechmann <tim@klingt.org> wrote:
hi all,
i've done some updates to boost.lockfree. the biggest difference from earlier snapshots is that it includes a fixed-size single-producer/single-consumer ringbuffer.
besides, the documentation is based on doxygen now. snapshots are available from [1], docs from [2], and my git repository can be found at [3] and cloned from [4]...
Hi,

This looks like a really useful addition to boost, hope you'll push it for review at some point. Some quick feedback on the fifo class:

Fifo: "The fifo class provides a multi-writer/multi-reader fifo, enqueueing and dequeueing is lockfree, construction/destruction has to be synchronized"

What does it mean that construction/destruction has to be synchronized? Both these functions can only be called once.

    fifo(void): pool(128) { initialize(); }

I would prefer the default constructor not to allocate any memory at all. Why not expose a reserve() member function instead?

    bool enqueue(T const & t);

Shouldn't this function throw an exception if t cannot be enqueued? Can the name be changed to push_back() instead, to mimic other std::containers? One may then, for instance, use your class with std::back_inserter.

Suggestion: void push_back(const T&) // Throws...

    bool dequeue(T * ret)

I think T should be taken by reference instead of pointer. Suggestion: bool pop_front(T&) or try_pop_front(T&)

    bool empty(void) // warning: Not thread-safe

I think this function should be removed from the public interface if it cannot be implemented in a thread-safe manner.

Cheers,
Christian
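A minimal sketch of what the suggested push_back()/std::back_inserter combination would enable; the fifo below is a hypothetical stub to illustrate the proposed naming, not the current boost.lockfree interface:

    #include <algorithm>
    #include <iterator>
    #include <vector>

    // hypothetical interface stub, only to show what back_inserter needs
    template <typename T>
    class fifo
    {
    public:
        typedef T value_type;            // required by std::back_insert_iterator
        void push_back(T const & t)      // would throw on failure, like std::vector
        {
            (void)t;                     // stub body, for illustration only
        }
    };

    int main()
    {
        std::vector<int> src(100, 42);
        fifo<int> f;
        // back_inserter only needs value_type and push_back(const value_type&):
        std::copy(src.begin(), src.end(), std::back_inserter(f));
    }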

hi christian,
This looks like a really useful addition to boost, hope you'll push it for review at some point.
it has been on the review queue for a while, but with the current review speed, it may take some time :(
Some quick feedback on the fifo class
Fifo: "The fifo class provides a multi-writer/multi-reader fifo, enqueueing and dequeueing is lockfree, construction/destruction has to be synchronized"
What does it mean that construction/destruction has to be synchronized? Both these functions can only be called once.
one shouldn't try to en/dequeue from one thread while another thread is running the constructor/destructor. maybe it should be formulated in a better way ...
fifo(void): pool(128) { initialize(); }
I would prefer having the default constructor to not allocate any memory at all. Why not expose a reserve() member function instead?
good point ...
bool enqueue(T const & t); Shouldn't this function throw an exception if t cannot be enqueued?
i would prefer not to throw an exception! if a constant-size freelist is used, the enqueue operation may fail, although no `error' occurred. i prefer this interface
Can the name be changed to push_back() instead, to mimic other std::containers? One may then, for instance, use your class with std::back_inserter
Suggestion void push_back(const T&) // Throws...
bool dequeue (T * ret) I think T should be taken by reference instead of pointer: Suggestion: bool pop_front(T&) or try_pop_front(T&)
hm ... enqueue/dequeue doesn't define a front or back ... not sure whether one enqueues at the back and dequeues from the front ... maybe simple push/pop is also fine ... i am open for suggestions about this ... taking a reference as argument is a good point!
bool empty(void) //warning Not thread-safe I think this function should be removed from public interface if it cannot be implemented in a thread safe manner.
i would prefer to keep it, but warn about the thread safety. i am also thinking about adding thread-unsafe versions of enqueue/dequeue/push/pop, since they can be implemented in a more efficient way than the thread-safe versions. one can think of a use case where one thread fills a fifo/queue and then notifies consumer threads ...

tim

--
tim@klingt.org
http://tim.klingt.org

Besides, it is wiser to wrap yourself up warmly than to get drunk.
  Werner Schwab
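A small sketch of the fill-then-consume use case mentioned above, using only the documented bool-returning enqueue/dequeue interface; the header name and the thread setup are assumptions, and a thread-unsafe enqueue (not shown, since it does not exist yet) would only replace the fill loop:

    #include <boost/lockfree/fifo.hpp>   // assumed header name of the prototype
    #include <boost/thread.hpp>

    boost::lockfree::fifo<int> work;

    void consume(void)
    {
        int value;
        while (work.dequeue(&value))
            ;                            // ... process value ...
    }

    int main()
    {
        // fill phase: only this thread touches the fifo, so a cheaper,
        // thread-unsafe enqueue would already be sufficient here
        for (int i = 0; i != 1000; ++i)
            while (!work.enqueue(i))
                ;                        // retry if the freelist is exhausted

        // consume phase: hand the populated fifo to several threads
        boost::thread_group consumers;
        for (int i = 0; i != 4; ++i)
            consumers.create_thread(&consume);
        consumers.join_all();
    }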

On 23 July 2010 14:08, Tim Blechmann <tim@klingt.org> wrote:
hi christian,
Hi Tim,
Fifo: "The fifo class provides a multi-writer/multi-reader fifo, enqueueing and dequeueing is lockfree, construction/destruction has to be synchronized"
What does it mean that construction/destruction has to be synchronized? Both these functions can only be called once.
one shouldn't try to en/dequeue from one thread while another thread is running the constructor/destructor. maybe it should be formulated in a better way ...
Operating on a deleted object is always undefined behaviour. I think it's not related to lockfree datastructures.
bool enqueue(T const & t); Shouldn't this function throw an exception if t cannot be enqueued?
i would prefer not to throw an exception! if a constant-size freelist is used, the enqueue operation may fail, although no `error' occurred. i prefer this interface
If memory for the operation cannot be obtained, I believe common practice is to throw an exception. IMO, it does not matter if a fixed size pool is exhausted or the global heap, since the effect is the same -> a no op. With current interface I'd need to always check return value for every enqueue for out of memory conditions, something I don't need to do with any other container.
bool empty(void) //warning Not thread-safe I think this function should be removed from public interface if it cannot be implemented in a thread safe manner.
i would prefer to keep it, but warn about the thread safety. i am also thinking about adding thread-unsafe version of enqueue/dequeue/push/pop, since they can be implemented in a more efficient way than the thread-safe versions. one can think of a use case, that one thread fills a fifo/queue and then notifies consumer threads ...
Would it suffice to add another constructor for that?

    template<class Iterator> fifo(Iterator first, Iterator last)
Then you can have an optimized way of populating the lockfree container, without adding unsafe member functions. Something like this comes to mind:

    vector<int> values;
    shared_ptr<fifo<int> > work(new fifo<int>(values.begin(), values.end()));
    for(int i = 0; i < N_THREADS; ++i)
        start_job(work);

/ Christian

bool enqueue(T const & t); Shouldn't this function throw an exception if t cannot be enqueued?
i would prefer not to throw an exception! if a constant-size freelist is used, the enqueue operation may fail, although no `error' occurred. i prefer this interface
If memory for the operation cannot be obtained, I believe common practice is to throw an exception. IMO, it does not matter if a fixed size pool is exhausted or the global heap, since the effect is the same -> a no op. With current interface I'd need to always check return value for every enqueue for out of memory conditions, something I don't need to do with any other container.
The following doesn't look correct to me: If pool.allocate() is not supposed to throw an exception, you're constructing a node on a null pointer.

    node * alloc_node(void)
    {
        node * chunk = pool.allocate();
        new(chunk) node();        // <---------- chunk may be 0?
        return chunk;
    }

Also, the code requires T to be default constructible. I think something like this would work:

    node * alloc_node(T const& v)
    {
        node * chunk = pool.allocate();
        if(chunk)
            new(chunk) node(v);
        return chunk;
    }

/ christian

AMDG Christian Holmquist wrote:
The following doesn't look correct to me: If pool.allocate() is not supposed to throw an exception, you're constructing a node on a null pointer.
    node * alloc_node(void)
    {
        node * chunk = pool.allocate();
        new(chunk) node();        // <---------- chunk may be 0?
        return chunk;
    }
It's okay if chunk is zero. Placement new is declared with throw(), so according to (5.3.4/13) it indicates failure by returning a null pointer, and the constructor will not be called. In Christ, Steven Watanabe

one shouldn't try to en/dequeue from one thread while another thread is running the constructor/destructor. maybe it should be formulated in a better way ...
Operating on a deleted object is always undefined behaviour. I think it's not related to lockfree datastructures.
true
bool enqueue(T const & t); Shouldn't this function throw an exception if t cannot be enqueued?
i would prefer not to throw an exception! if a constant-size freelist is used, the enqueue operation may fail, although no `error' occurred. i prefer this interface
If memory for the operation cannot be obtained, I believe common practice is to throw an exception. IMO, it does not matter if a fixed size pool is exhausted or the global heap, since the effect is the same -> a no op. With current interface I'd need to always check return value for every enqueue for out of memory conditions, something I don't need to do with any other container.
true ... but if you use the fixed-size memory pool, you would have to add a try/catch block ...
bool empty(void) //warning Not thread-safe I think this function should be removed from public interface if it cannot be implemented in a thread safe manner.
i would prefer to keep it, but warn about the thread safety. i am also thinking about adding thread-unsafe version of enqueue/dequeue/push/pop, since they can be implemented in a more efficient way than the thread-safe versions. one can think of a use case, that one thread fills a fifo/queue and then notifies consumer threads ...
Would if suffice to add another constructor for that? template<class Iterator> fifo(Iterator first, Iterator last)
Then you can have an optimized way of populating the lockfree container,without adding unsafe member functions.
for my most important use case, this would not be sufficient. at some times i am accessing my fifos from a single thread only, at other times from multiple threads ...

cheers, tim

--
tim@klingt.org
http://tim.klingt.org

Silence is only frightening to people who are compulsively verbalizing.
  William S. Burroughs

bool enqueue(T const & t); Shouldn't this function throw an exception if t cannot be enqueued?
i would prefer not to throw an exception! if a constant-size freelist is used, the enqueue operation may fail, although no `error' occurred. i prefer this interface
If memory for the operation cannot be obtained, I believe common practice is to throw an exception. IMO, it does not matter if a fixed size pool is exhausted or the global heap, since the effect is the same -> a no op. With current interface I'd need to always check return value for every enqueue for out of memory conditions, something I don't need to do with any other container.
true ... but if you use the fixed-size memory pool, you would have to add a try/catch block ...
If I can do something about the error, then yes, I would -always- need to add a try/catch block, since the global heap may be exhausted just as the fixed-size memory pool can. What I am trying to say is that I'd prefer the same semantics for enqueue(T const&) as for any std::container::insert(T const&).

    std::vector<int> vec;
    vec.push_back(1);   // may throw

    fifo<int> fifo;
    fifo.enqueue(1);    // may throw

With the current interface, I'd need to do something like

    if(!fifo.enqueue(1))
    {
        throw std::bad_alloc(); // Or something
    }

I think the semantics should be equal to those of std::containers, when possible. How about a compromise:

    void enqueue(const T&) throw(...)
    void try_enqueue(const T&) // no-throw guarantee if T's copy constructor has a no-throw guarantee

For reference, it might be a good idea to have a look at Intel's TBB concurrent_queue/vector and see what design decisions they've made. I've no practical use of TBB unfortunately, but I've studied it a bit in order to make a recommendation for my employer. Reference is here: TBB Reference <http://www.threadingbuildingblocks.org/uploads/81/91/Latest%20Open%20Source%20Documentation/Reference.pdf>
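A minimal sketch of the proposed enqueue/try_enqueue compromise, where the throwing call simply wraps the non-throwing one; this is illustrative only, not the actual library code:

    #include <new>      // std::bad_alloc

    // illustrative sketch of the proposed compromise, not the real implementation
    template <typename T>
    class fifo
    {
    public:
        // non-throwing flavour: false means no node could be obtained
        bool try_enqueue(T const & t)
        {
            (void)t;
            return true;                 // stub body, for illustration only
        }

        // throwing flavour: reports exhaustion the way std::containers do
        void enqueue(T const & t)
        {
            if (!try_enqueue(t))
                throw std::bad_alloc();
        }
    };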
Would it suffice to add another constructor for that?
template<class Iterator> fifo(Iterator first, Iterator last)
Then you can have an optimized way of populating the lockfree container,without adding unsafe member functions.
for my most important use case, this would not be sufficient. at some times i am accessing my fifos from a single thread only, at other times from multiple threads ...
It would be interesting if you could add some performance benchmarks between lockfree vs non-lockfree.
Slightly unrelated: Have you considered proposing your lockfree algorithms and datastructures as part of boost::intrusive? To me, using intrusive datastructures would be preferable in performance-critical scenarios, and it's an excellent way of guaranteeing a non-throwing/non-failing enqueue operation (since the user can avoid the copy constructor altogether). A higher-level container such as the current lockfree::fifo could then be built upon such a foundation.

I think the semantic's should be equal to that of std::containers, when possible. How about a compromise:
void enqueue(const T&) throw(...) void try_enqueue(const T&) // no throw guarantee if T's copy constructor is no throw guarantee
this sounds reasonable ... btw, the fifo is limited to PODs, so try_enqueue would never throw.
Have you considered proposing your lockfree algorithms and datastructures as part of boost::intrusive? To me, using intrusive datastructures would be preferable in performance-critical scenarios, and it's an excellent way of guaranteeing a non-throwing/non-failing enqueue operation (since the user can avoid the copy constructor altogether). A higher-level container such as the current lockfree::fifo could then be built upon such a foundation.
i have the fear that this would be rather dangerous. objects wouldn't be allowed to be destroyed while any fifo operation takes place. if thread a dequeues and destroys an object, thread b may still have a local pointer to it and may try to dereference this pointer ... with intrusive data structures, this would be the responsibility of the user ... a rather subtle source of error. i am a big fan of intrusive data structures in general, but not for lock-free data structures.

tim

--
tim@klingt.org
http://tim.klingt.org

A paranoid is a man who knows a little of what's going on.
  William S. Burroughs

On 7/23/2010 12:24 PM, Tim Blechmann wrote:
Hi Tim,

I'd be very interested in using this library when it's "done". For now, I've only skimmed the documentation, but here are some initial comments.

I think it would be nice to propagate the links to the data structure references up to the contents page. Right now they appear to be 'hidden' under introduction.

Specifically speaking about http://tim.klingt.org/boost_lockfree/boost/lockfree/fifo.html, now:

0. "It uses a freelist for memory management, freed nodes are pushed to the freelist, but not returned to the os. This may result in leaking memory." Presumably this means memory may not be reclaimed until the fifo is destroyed, rather than an indefinite leak?

1. "Limitation: The fifo class is limited to PODs". I really would like to be able to use this with arbitrary objects. I'm sure PODs are required for good reason, but a rationale somewhere would be greatly appreciated. Or, would it be possible to provide an additional class that would work with any kind of object, possibly with a performance penalty (and modified interface for exception safety)? If so, I'd vote to make lockfree::fifo the general data structure and perhaps lockfree::pod_fifo the one with POD-specific optimizations.

2. For the is_lock_free() method, it says "Warning: It only checks, if the fifo head node is lockfree. on most platforms, this should be sufficient, though". Sufficient for what? If the implementation can't guarantee lock-free behaviour throughout, I'd simply return false. Lock-free (typically) means something very specific, after all.

3. For the empty() method, it says "Not thread-safe, use for debugging purposes only". Does this mean calling it might destroy the data structure's invariants? Or is it always safe in that regard? In which scenarios can it be used, specifically? If I'm using a fifo as a work queue, I can imagine that an empty() method that is safe from an invariant-maintenance point of view could be useful in a heuristic to decide whether I should go do something else for a while. In either case, I'd be more specific about what's meant by not thread-safe. But presumably it would be possible to write a thread-safe version given that dequeue can return false? If so, I'd remove/rename/hide this function.

4. "bool dequeue(T * ret);". Using a reference rather than a pointer has been mentioned else-thread. That would also be my preference.

Analogous comments apply to some areas of the documentation for the other data structures too.

Cheers, Edd

hi edd,
I'd be very interested in using this library when it's "done". For now, I've only skimmed the documentation, but here are some initial comments.
it's pretty much `done'
I think it would be nice to propagate the links to the data structure references up to the contents page. Right now they appear to be 'hidden' under introduction.
not sure how this can easily be achieved with quickbook ... any pointers?
Specifically speaking about http://tim.klingt.org/boost_lockfree/boost/lockfree/fifo.html, now:
0. "It uses a freelist for memory management, freed nodes are pushed to the freelist, but not returned to the os. This may result in leaking memory." Presumably this means memory may not be reclaimed until the fifo is destroyed, rather than an indefinite leak?
yes.
1. "Limitation: The fifo class is limited to PODs". I really would like to be able to use this with arbitrary objects. I'm sure PODs are required for good reason, but a rationale somewhere would be greatly appreciated.
it is a limitation of the michael/scott algorithm. if you want to pass non-pods, you have to use heap-allocated pointers.
2. For the is_lock_free() method, it says "Warning: It only checks, if the fifo head node is lockfree. on most platforms, this should be sufficient, though". Sufficient for what? If the implementation can't guarantee lock-free behaviour throughout, I'd simply return false. Lock-free (typically) means something very specific, after all.
the c++0x semantics of atomics provide a per-object member function. all implementations that i am familiar with will always either provide lockfree cas operations for all atomic<> instances or for none. the current implementation will give a hint. otherwise i can simply remove this function from the interface ...
3. For the empty() method, it says "Not thread-safe, use for debugging purposes only". Does this mean calling it might destroy the data structure's invariants? Or is it always safe in that regard? In which scenarios can it be used, specifically?
calling it may return a wrong result. the data structure is not corrupted.
In either case, I'd be more specific about what's meant by not thread-safe. But presumably it would be possible to write a thread-safe version given that dequeue can return false? If so, I'd remove/rename/hide this function.
for the fifo, it will never be possible: you would have to load two atomic objects atomically, which is impossible ...

tim

--
tim@klingt.org
http://tim.klingt.org

I must say I find television very educational. The minute somebody turns it on, I go to the library and read a good book.
  Groucho Marx

On 07/24/2010 07:49 AM, Tim Blechmann wrote:
3. For the empty() method, it says "Not thread-safe, use for debugging purposes only". Does this mean calling it might destroy the data structure's invariants? Or is it always safe in that regard? In which scenarios can it be used, specifically?
calling it may return a wrong result. the data structure is not corrupted.
Am I right in assuming that the result it produces is correct when it is produced, and the only problem is that something could get added to the queue before the caller gets a chance to examine the result? Sebastian

3. For the empty() method, it says "Not thread-safe, use for debugging purposes only". Does this mean calling it might destroy the data structure's invariants? Or is it always safe in that regard? In which scenarios can it be used, specifically?
calling it may return a wrong result. the data structure is not corrupted.
Am I right in assuming that the result it produces is correct when it is produced, and the only problem is that something could get added to the queue before the caller gets a chance to examine the result?
for empty(), the following steps need to be done:

1. atomically load the head pointer
2. atomically load the tail pointer
3. compare the pointers

if the state of the fifo is changed between steps 1 and 2, the result doesn't reflect the state of the fifo anymore ...

tim

--
tim@klingt.org
http://tim.klingt.org

After one look at this planet any visitor from outer space would say "I want to see the manager."
  William S. Burroughs
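A rough sketch of such an empty(), assuming head_/tail_ members of a boost.atomic type; the real fifo uses tagged pointers, so this is only schematic:

    #include <boost/atomic.hpp>          // the boost.atomic prototype used by boost.lockfree

    struct node;

    class fifo_sketch
    {
        boost::atomic<node *> head_;     // the real fifo stores tagged pointers,
        boost::atomic<node *> tail_;     // plain pointers keep the sketch short

    public:
        fifo_sketch(void) : head_(0), tail_(0) {}

        bool empty(void)
        {
            node * head = head_.load();  // step 1: atomically load the head pointer
            node * tail = tail_.load();  // step 2: atomically load the tail pointer
            // a push or pop on another thread between steps 1 and 2 means the
            // two loads need not describe the same state of the fifo ...
            return head == tail;         // step 3: ... and the result may already
                                         //          be stale when it is returned
        }
    };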

At Sat, 24 Jul 2010 22:07:44 +0200, Tim Blechmann wrote:
3. For the empty() method, it says "Not thread-safe, use for debugging purposes only". Does this mean calling it might destroy the data structure's invariants? Or is it always safe in that regard? In which scenarios can it be used, specifically?
calling it may return a wrong result. the data structure is not corrupted.
Am I right in assuming that the result it produces is correct when it is produced, and the only problem is that something could get added to the queue before the caller gets a chance to examine the result?
for empty(), the following steps need to be done: 1. atomically load the head pointer 2. atomically load the tail pointer 3. compare the pointers.
if the state of the fifo is changed between steps 1 and 2, the result doesn't reflect the state of the fifo anymore ...
Not that that matters. The state of the FIFO could be changed after step 3 and before you use the result of the empty() call. In general, predicates about a data structure that might be modified from another thread are useless unless you can afford for them to give you the wrong answer. Sometimes, the nature of the data structure and how it's used can make such predicates a little less-useless. For example, if thread A is the only one that removes elements from a FIFO, a true result from empty() tells thread A nothing useful, but a false result actually means the FIFO is non-empty in a useful way, because it will remain so no matter what happens on the other end of the FIFO, until A removes at least one element.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
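A small illustration of the single-consumer case described above; process() is a hypothetical placeholder and the header name is an assumption:

    #include <boost/lockfree/fifo.hpp>   // assumed header name of the prototype

    void process(int /*value*/) {}       // hypothetical placeholder for real work

    // thread A is the only consumer, so a false result from empty() stays
    // meaningful: the fifo remains non-empty until A itself removes an element.
    void drain(boost::lockfree::fifo<int> & work)
    {
        int value;
        while (!work.empty())
        {
            if (work.dequeue(&value))    // expected to succeed, since only this
                process(value);          // thread ever removes elements
        }
    }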

On Sat, Jul 24, 2010 at 7:49 AM, Tim Blechmann <tim@klingt.org> wrote:
1. "Limitation: The fifo class is limited to PODs". I really would like to be able to use this with arbitrary objects. I'm sure PODs are required for good reason, but a rationale somewhere would be greatly appreciated.
it is a limitation of the michael/scott algorithm. if you want to pass non-pods, you have to use heap-allocated pointers.
It should be possible to automatically allocate non-pods from a freelist. Might I suggest turning this library into a general concurrent data structures library? Lock-free is great, but sometimes it's either not available or is simply too slow outside of niche circumstances. I think a more general library that includes lock-free along with other less-than-lock-free algorithms would be more useful to the majority of developers. -- Cory Nelson http://int64.org

1. "Limitation: The fifo class is limited to PODs". I really would like to be able to use this with arbitrary objects. I'm sure PODs are required for good reason, but a rationale somewhere would be greatly appreciated.
it is a limitation of the michael/scott algorithm. if you want to pass non-pods, you have to use heap-allocated pointers.
It should be possible to automatically allocate non-pods from a freelist.
... one could introduce another indirection ...
Might I suggest turning this library into a general concurrent data structures library? Lock-free is great, but sometimes it's either not available or is simply too slow outside of niche circumstances. I think a more general library that includes lock-free along with other less-than-lock-free algorithms would be more useful to the majority of developers.
the idea is not too bad, but frankly, i don't have the resources to develop a library of general-purpose concurrent data structures. there are libraries like tbb out there that already provide many concurrent data structures ... but not so many libraries for lock-free data structures that can actually be used in real-time systems.

tim

--
tim@klingt.org
http://tim.klingt.org

Happiness is a byproduct of function, purpose, and conflict; those who seek happiness for itself seek victory without war.
  William S. Burroughs

On 24/07/10 15:49, Tim Blechmann wrote:
1. "Limitation: The fifo class is limited to PODs". I really would like to be able to use this with arbitrary objects. I'm sure PODs are required for good reason, but a rationale somewhere would be greatly appreciated.
it is a limitation of the michael/scott algorithm. if you want to pass non-pods, you have to use heap-allocated pointers.
What exactly about PODs is it taking advantage of? In C++0x will it work for all standard layout types? John Bytheway

1. "Limitation: The fifo class is limited to PODs". I really would like to be able to use this with arbitrary objects. I'm sure PODs are required for good reason, but a rationale somewhere would be greatly appreciated.
it is a limitation of the michael/scott algorithm. if you want to pass non-pods, you have to use heap-allocated pointers.
What exactly about PODs is it taking advantage of? In C++0x will it work for all standard layout types?
one needs to read the payload from the fifo node before it is actually dequeued (otherwise a later dequeue may free the node), and this read may happen more than once. it is nothing c++0x can change, but a property of the michael/scott queue [1]. the type essentially needs a trivial copy constructor.

tim

[1] http://www.cs.rochester.edu/research/synchronization/pseudocode/queues.html

--
tim@klingt.org
http://tim.klingt.org

The composer makes plans, music laughs.
  Morton Feldman

On 25/07/10 09:50, Tim Blechmann wrote:
What exactly about PODs is it taking advantage of? In C++0x will it work for all standard layout types?
one needs to read the payload from the fifo node before it is actually dequeued (otherwise a later dequeue may free the node), and this read may happen more than once. it is nothing c++0x can change, but a property of the michael/scott queue [1]. the type essentially needs a trivial copy constructor.
If that's all it needs then I suggest you phrase the limitation in those terms; it's a lot less restrictive than requiring a POD type (which cannot, for example, have any user-defined constructors).
[1] http://www.cs.rochester.edu/research/synchronization/pseudocode/queues.html
If you implemented it exactly as it appears in this link then I guess it really needs a trivial copy assignment operator, not a trivial copy constructor, but of course the two tend to go hand-in-hand. You should probably require both. John Bytheway

On 7/24/2010 3:49 PM, Tim Blechmann wrote:
I think it would be nice to propagate the links to the data structure references up to the contents page. Right now they appear to be 'hidden' under introduction.
not sure how this can easily be achieved with quickbook ... any pointers?
Sorry, I have no experience with quickbook :(
Specifically speaking about http://tim.klingt.org/boost_lockfree/boost/lockfree/fifo.html, now:
0. "It uses a freelist for memory management, freed nodes are pushed to the freelist, but not returned to the os. This may result in leaking memory." Presumably this means memory may not be reclaimed until the fifo is destroyed, rather than an indefinite leak?
yes.
In that case, I wonder if it's worth rephrasing those notes?
1. "Limitation: The fifo class is limited to PODs". I really would like to be able to use this with arbitrary objects. I'm sure PODs are required for good reason, but a rationale somewhere would be greatly appreciated.
it is a limitation of the michael/scott algorithm. if you want to pass non-pods, you have to use heap-allocated pointers.
Ok. But then would it be worth having a version of a fifo that does the heap-allocation indirection, to save everybody from re-writing it every time they need a non-POD fifo?
2. For the is_lock_free() method, it says "Warning: It only checks, if the fifo head node is lockfree. on most platforms, this should be sufficient, though". Sufficient for what? If the implementation can't guarantee lock-free behaviour throughout, I'd simply return false. Lock-free (typically) means something very specific, after all.
the c++0x semantics of atomics provide a per-object member function. all implementations that i am familiar with will always either provide lockfree cas operations for all atomic<> instances or for none. the current implementation will give a hint. otherwise i can simply remove this function from the interface ...
I see. FWIW, I can't imagine that an implementation would restrict the number of atomic<>s that provide a lock-free CAS. That's probably your reasoning too I guess. Is this something that can be clarified by the authors of Boost.Atomic?
3. For the empty() method, it says "Not thread-safe, use for debugging purposes only". Does this mean calling it might destroy the data structure's invariants? Or is it always safe in that regard? In which scenarios can it be used, specifically?
calling it may return a wrong result. the data structure is not corrupted.
Then I would vote to add such a note to the documentation, clarifying the way in which it's not thread-safe. It could still be useful for some cases outside of debugging. Kind regards, Edd

0. "It uses a freelist for memory management, freed nodes are pushed to the freelist, but not returned to the os. This may result in leaking memory." Presumably this means memory may not be reclaimed until the fifo is destroyed, rather than an indefinite leak?
yes.
In that case, I wonder if it's worth rephrasing those notes?
it probably is ... i am currently working on it ...
1. "Limitation: The fifo class is limited to PODs". I really would like to be able to use this with arbitrary objects. I'm sure PODs are required for good reason, but a rationale somewhere would be greatly appreciated.
it is a limitation of the michael/scott algorithm. if you want to pass non-pods, you have to use heap-allocated pointers.
Ok. But then would it be worth having a version of a fifo that does the heap-allocation indirection, to save everybody from re-writing it every time they need a non-POD fifo?
mpl could be used for selecting the specific implementation ... otoh, it will hide the additional complexity behind the interface. in some cases, it may be feasible to adapt the used classes instead ...
2. For the is_lock_free() method, it says "Warning: It only checks, if the fifo head node is lockfree. on most platforms, this should be sufficient, though". Sufficient for what? If the implementation can't guarantee lock-free behaviour throughout, I'd simply return false. Lock-free (typically) means something very specific, after all.
the c++0x semantics of atomics provide a per-object member function. all implementations that i am familiar with will always either provide lockfree cas operations for all atomic<> instances or for none. the current implementation will give a hint. otherwise i can simply remove this function from the interface ...
I see. FWIW, I can't imagine that an implementation would restrict the number of atomic<>s that provide a lock-free CAS. That's probably your reasoning too I guess. Is this something that can be clarified by the authors of Boost.Atomic?
well, one thing that i didn't think of is alignment ... on some architectures, cas-able memory regions need to be aligned ...
3. For the empty() method, it says "Not thread-safe, use for debugging purposes only". Does this mean calling it might destroy the data structure's invariants? Or is it always safe in that regard? In which scenarios can it be used, specifically?
calling it may return a wrong result. the data structure is not corrupted.
Then I would vote to add such a note to the documentation, clarifying the way in which it's not thread-safe. It could still be useful for some cases outside of debugging.
done

tim

--
tim@klingt.org
http://tim.klingt.org

When you open windows, bugs get in.

I am no expert in concurrent programming, but I did try to use this implementation of lockfree fifo in a real-world app to see how it compares with a few other alternatives: a serialized deque, intel's concurrent_queue, a home-grown implementation, and a couple of others. Here's my 2c worth:

1. It seems to me that supporting only POD types makes this whole exercise largely academic, since the things most people would want to put in the queue will _not_ be POD. And storing pointers may not be an acceptable option either, since, presumably, if someone's looking for a lock free queue his main concern is performance. I had to disable the is_pod check just to be able to benchmark it. My understanding is that the fifo stores pointers to nodes which come off the heap. Is it impossible on all platforms to have a node with a non-pod type and preserve the lock free semantics?

2. With VC++ on Windows both 32 and 64 bit, fifo reverts to using spinlocks even though there's a 64 bit equivalent of interlocked intrinsics in 64bit builds. Intel's implementation, I believe, supports lockfree atomic<long long> even on 32 bit systems so this should be possible.

3. I also think it's important to provide a size() method, if only for monitoring purposes (unsafe_size() maybe). Most people would want to take some action if the size of the queue exceeds a reasonable (in their domain) limit.

4. is_lock_free() should be a compile-time check. I may want to choose a different queue if the lock free version isn't available.

All this said, I have to say that the fifo performance in our environment (64 bit linux with 8 xeon processors, gcc 4.5) was quite good. Comparable with tbb::concurrent_queue and way better than all the other options we tried.

Thanks!

Oleg Grunin wrote:
All this said, I have to say that the fifo performance in our environment (64 bit linux with 8 xeon processors, gcc 4.5) was quite good. Comparable with tbb::concurrent_queue and way better than all the other options we tried.
Did your application use a queue of PODS, structures, or pointers?
From your post, it sounds like the second isn't supported.
Robert Ramey

Robert Ramey wrote:
Oleg Grunin wrote:
All this said, I have to say that the fifo performance in our environment (64 bit linux with 8 xeon processors, gcc 4.5) was quite good. Comparable with tbb::concurrent_queue and way better than all the other options we tried.
Did your application use a queue of PODS, structures, or pointers?
From your post, it sounds like the second isn't supported.
Robert Ramey
I used a non-pod struct. As I mentioned, I had to disable the is_pod check to be able to test it. I didn't notice any adverse effects.

Another nice thing would be to provide move semantics. My structs are moved, not copied, into the queue.

All this said, I have to say that the fifo performance in our environment (64 bit linux with 8 xeon processors, gcc 4.5) was quite good. Comparable with tbb::concurrent_queue and way better than all the other options we tried.
Did your application use a queue of PODS, structures, or pointers?
From your post, it sounds like the second isn't supported.
[snip]
I used a non-pod struct. As I mentioned I had to disable the is_pod check to be able to test it. I didn't notice any adverse effects.
Another nice thing would be to provide move semantics. My structs are moved, not copied, into the queue.
the fifo is an implementation of the michael-scott queue [1]. the dequeue operation copies the payload (D12) without knowing whether the operation can be completed. if two threads are trying to dequeue at the same time, the assignment may happen twice; one thread will succeed, the other one will have to retry. with a trivial assignment this is safe, with a non-trivial assignment there are cases (e.g. auto_ptr) where you will experience some problems ...

tim

[1] http://www.cs.rochester.edu/research/synchronization/pseudocode/queues.html

--
tim@klingt.org
http://tim.klingt.org

The composer makes plans, music laughs.
  Morton Feldman
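For reference, a pseudocode-style sketch of the Michael/Scott dequeue, paraphrased from [1] with the ABA-prevention counters omitted; names like queue, CAS and free_node are placeholders, and the line marked D12 is the payload copy discussed above:

    // pseudocode paraphrased from the michael/scott queue in [1]; the ABA
    // counters ("tags") of the original are omitted, so this is a sketch,
    // not compilable or usable code
    bool dequeue(T * pvalue)
    {
        for (;;)
        {
            node * head = queue.head;            // read head, tail and the
            node * tail = queue.tail;            // node following head
            node * next = head->next;

            if (head != queue.head)
                continue;                        // inconsistent snapshot, retry

            if (head == tail)
            {
                if (next == 0)
                    return false;                // queue is empty
                CAS(&queue.tail, tail, next);    // help a half-finished enqueue
            }
            else
            {
                *pvalue = next->value;           // D12: copy the payload *before*
                                                 // the CAS below; a thread that
                                                 // loses the CAS retries and
                                                 // repeats this assignment, which
                                                 // is why it must be trivial
                if (CAS(&queue.head, head, next))
                {
                    free_node(head);             // old dummy node goes back to the freelist
                    return true;
                }
            }
        }
    }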

sry for the delay, the mail got lost in my inbox ...
1. It seems to me that supporting only POD types makes this whole exercise largely academic, since the things most people would want to put in the queue will _not_ be POD. And storing pointers may not be an acceptable option either, since, presumably, if someone's looking for a lock free queue his main concern is performance. I had to disable the is_pod check just to be able to benchmark it. My understanding is that the fifo stores pointers to nodes which come off the heap. Is it impossible on all platforms to have a node with a non-pod type and preserve the lock free semantics?
i've relaxed the requirement a bit. the type is actually just required to have a trivial assignment, it doesn't need to be a full POD. this requirement is specific for the algorithm. btw, some people use this for real-time applications, where worst-case latency is more important than throughput.
2. With VC++ on Windows both 32 and 64 bit, fifo reverts to using spinlocks even though there's a 64 bit equivalent of interlocked intrinsics in 64bit builds. Intel's implementation, I believe, supports lockfree atomic<long long> even on 32 bit systems so this should be possible.
true ... there have been some people mentioning this, it shouldn't be hard to implement dcas for boost.atomic. unfortunately neither helge nor i seem to use msvc, and nobody with msvc wants to get his hands dirty to implement the specific part ...
3. I also think it's important to provide a size() method, if only for monitoring purposes (unsafe_size() maybe). Most people would want to take some action if the size of the queue exceeds a reasonable (in their domain) limit.
this would have two issues:
- it won't be accurate
- it would either be expensive to call (O(n)) or introduce an overhead of an atomic operation per push/pop
4. is_lock_free() should be a compile-time check. I may want to choose a different queue if the lock free version isn't available.
i'd prefer a compile-time check as well, but std::atomic::is_lock_free is a run-time check ...
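For illustration, the run-time query offered by the atomics interface, here via the boost.atomic prototype (header choice assumed; with a C++0x compiler <atomic> and std::atomic would be used instead):

    #include <boost/atomic.hpp>
    #include <iostream>

    int main()
    {
        boost::atomic<long long> x(0);
        // per-object, run-time query; the answer may depend on the platform,
        // the compiler, and even the object's alignment
        std::cout << "lock-free: " << x.is_lock_free() << std::endl;
    }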
All this said, I have to say that the fifo performance in our environment (64 bit linux with 8 xeon processors, gcc 4.5) was quite good. Comparable with tbb::concurrent_queue and way better than all the other options we tried.
good to hear. tbb::concurrent_queue is a blocking algorithm (iirc they are using spinlocks), while boost::lockfree::fifo is lockfree, so they have somewhat different performance characteristics. the concurrent_queue may have acquired the spinlock when the os decides to suspend the thread. this may be rare, but it is not impossible. the lockfree::fifo doesn't have this issue ...

cheers, tim

--
tim@klingt.org
http://tim.klingt.org

Trane was the Father... Pharoah was the son... I am the Holy Ghost.
  Albert Ayler

Thanks for the reply.
sry for the delay, the mail got lost in my inbox ...
2. With VC++ on Windows both 32 and 64 bit, fifo reverts to using spinlocks even though there's a 64 bit equivalent of interlocked intrinsics in 64bit builds. Intel's implementation, I believe, supports lockfree atomic<long long> even on 32 bit systems so this should be possible.
true ... there have been some people mentioning this, it shouldn't be hard to implement dcas for boost.atomic. unfortunately neither helge nor i seem to use msvc, and nobody with msvc wants to get his hands dirty to implement the specific part ...
I may give it a crack. At least on the 64-bit platform. It should be pretty straightforward.
3. I also think it's important to provide a size() method, if only for monitoring purposes (unsafe_size() maybe). Most people would want to take some action if the size of the queue exceeds a reasonable (in their domain) limit.
this would have two issues:
- it won't be accurate
- it would either be expensive to call (O(n)) or introduce an overhead of an atomic operation per push/pop
I think it's ok on both points. You can't really rely on size() even if it were accurate. We take size samples every 10000 or so pushes to see if the queue is being backed up. As long as the cost of size() is well known and deterministic, I think it's worth having it.
4. is_lock_free() should be a compile-time check. I may want to choose a different queue if the lock free version isn't available.
i'd prefer a compile-time check as well, but std::atomic::is_lock_free is a run-time check ...
Hopefully, the atomic guys will take notice.
All this said, I have to say that the fifo performance in our environment (64 bit linux with 8 xeon processors, gcc 4.5) was quite good. Comparable with tbb::concurrent_queue and way better than all the other options we tried.
good to hear. tbb::concurrent_queue is a blocking algorithm (iirc they are using spinlocks), while boost::lockfree::fifo is lockfree, so they have somewhat different performance characteristics. the concurrent_queue may have acquired the spinlock when the os decides to suspend the thread. this may be rare, but it is not impossible. the lockfree::fifo doesn't have this issue ...
I may be confused, but doesn't fifo have to spin (for(;;)) also? Or do you mean that the tbb spin will call yield() after a certain number of tries?

Oleg Grunin wrote:
I may be confused, but doesn't fifo have to spin (for(;;)) also? Or do you mean that the tbb spin will call yield() after a certain number of tries?
There's a loop in fifo, true. But this loop is non-blocking in the sense that it has the property that if several threads enter the loop, at least one of them is guaranteed to make progress. With a blocking spin-lock, it's possible for none of the threads to make any progress. Andy.

true ... there have been some people mentioning this, it shouldn't be hard to implement dcas for boost.atomic. unfortunately neither helge nor i seem to use msvc, and nobody with msvc wants to get his hands dirty to implement the specific part ...
I may give it a crack. At least on the 64-bit platform. It should be pretty straightforward.
that would be great!
3. I also think it's important to provide a size() method, if only for monitoring purposes (unsafe_size() maybe). Most people would want to take some action if the size of the queue exceeds a reasonable (in their domain) limit.
this would have two issues:
- it won't be accurate
- it would either be expensive to call (O(n)) or introduce an overhead of an atomic operation per push/pop
I think it's ok on both points. You can't really rely on size() even if it were accurate. We take size samples every 10000 or so pushes to see if the queue is being backed up. As long as the cost of size() is well known and deterministic, I think it's worth having it.
hm, i can try to introduce an O(n) version ...
4. is_lock_free() should be a compile-time check. I may want to choose a different queue if the lock free version isn't available.
i'd prefer a compile-time check as well, but std::atomic::is_lock_free is a run-time check ...
Hopefully, the atomic guys will take notice.
well, the standard allows implementations where not all instances of std::atomic are lock-free. e.g. the lock-free property may depend on the object alignment.

tim

--
tim@klingt.org
http://tim.klingt.org

There's no such entity as "most people". These are generalities. All generalities are meaningless. You've got to pin it down to a specific person doing a specific thing at a specific time and space.
  William S. Burroughs
participants (11)

- Andy Venikov
- Christian Holmquist
- Cory Nelson
- David Abrahams
- Edd Dawson
- John Bytheway
- Oleg Grunin
- Robert Ramey
- Sebastian Redl
- Steven Watanabe
- Tim Blechmann