
I've been looking through the docs and I was wondering if boost has any lock-free threadsafe containers. I know it's been brought up on the list before, but I was curious if anybody had a working implementation. If somebody is working on them, I'd like to lend a hand if possible.

Patrick Twohig wrote:
I've been looking through the docs and I was wondering if boost has any lock-free threadsafe containers. I know it's been brought up on the list before, but I was curious if anybody had a working implementation. If somebody is working on them, I'd like to lend a hand if possible.
Intel TBB includes some.

On Thu, 17 Apr 2008 19:54:47 +0200, Mathias Gaunard wrote:
Patrick Twohig wrote:
I've been looking through the docs and I was wondering if boost has any lock-free threadsafe containers. I know it's been brought up on the list before, but I was curious if anybody had a working implementation. If somebody is working on them, I'd like to lend a hand if possible.
Intel TBB includes some.
aehm, are you sure? when i checked the tbb a few monthes ago, the containers used spinlocks, so they are not lock-free ... cheers, tim -- tim@klingt.org http://tim.klingt.org The composer makes plans, music laughs. Morton Feldman

On Wed, Apr 16, 2008 at 9:58 PM, Patrick Twohig <p-twohig@ieee.org> wrote:
I've been looking through the docs and I was wondering if boost has any lock-free threadsafe containers. I know it's been brought up on the list before, but I was curious if anybody had a working implementation. If somebody is working on them, I'd like to lend a hand if possible.
I have some lock-free stack/queue code for x86/x64/ia64 in c++. It only works with vc++ 2008 intrinsics right now, and the x64 version is dependant on the windows memory manager if used on older CPUs. If people are interested in boostifying and porting it, I suppose we could do that.

I wrote a C++ lock free queue implementation recently based on the Michael & Scott algorithm, that uses APR's atomics. It's available at: http://svn.atomiclabs.com/pion-common/trunk/common/include/pion/PionLockFree... You're welcome to use it but you should be aware that it's not quite debugged yet. It has issues when I tried using more than 1 producer and 1 consumer thread. I wasn't sure if this was a mistake I made in the code, or a limitation of the algorithm, but I assume it's the former. Anyway, in the end I decided not to use the code for what I was working on... But if you can find and fix the bug, it's all yours! =) Of course, it depends on APR's atomics.. Boost does not have any compare-and-swap abstraction as I far as I know, which seems to be a basic requirement for any lock-free programming. Or maybe it does and I just don't know? Take care, -Mike On Apr 17, 2008, at 12:57 PM, Cory Nelson wrote:
On Wed, Apr 16, 2008 at 9:58 PM, Patrick Twohig <p-twohig@ieee.org> wrote:
I've been looking through the docs and I was wondering if boost has any lock-free threadsafe containers. I know it's been brought up on the list before, but I was curious if anybody had a working implementation. If somebody is working on them, I'd like to lend a hand if possible.
I have some lock-free stack/queue code for x86/x64/ia64 in c++. It only works with vc++ 2008 intrinsics right now, and the x64 version is dependant on the windows memory manager if used on older CPUs. If people are interested in boostifying and porting it, I suppose we could do that. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Are you running it on a multicore machine or single core? That radically changes things. I was having the same problem. To test I ran 500 threads or so a third of them were producers. My suspicion was that I wasn't enforcing the proper memory barriers because for x86 I had to write my own CAS in assembly. On Thu, Apr 17, 2008 at 3:23 PM, Michael Dickey <mike@mikedickey.com> wrote:
I wrote a C++ lock free queue implementation recently based on the Michael & Scott algorithm, that uses APR's atomics. It's available at:
http://svn.atomiclabs.com/pion-common/trunk/common/include/pion/PionLockFree...
You're welcome to use it but you should be aware that it's not quite debugged yet. It has issues when I tried using more than 1 producer and 1 consumer thread. I wasn't sure if this was a mistake I made in the code, or a limitation of the algorithm, but I assume it's the former. Anyway, in the end I decided not to use the code for what I was working on... But if you can find and fix the bug, it's all yours! =)
Of course, it depends on APR's atomics.. Boost does not have any compare-and-swap abstraction as I far as I know, which seems to be a basic requirement for any lock-free programming. Or maybe it does and I just don't know?
Take care, -Mike
On Apr 17, 2008, at 12:57 PM, Cory Nelson wrote:
On Wed, Apr 16, 2008 at 9:58 PM, Patrick Twohig <p-twohig@ieee.org> wrote:
I've been looking through the docs and I was wondering if boost has any lock-free threadsafe containers. I know it's been brought up on the list before, but I was curious if anybody had a working implementation. If somebody is working on them, I'd like to lend a hand if possible.
I have some lock-free stack/queue code for x86/x64/ia64 in c++. It only works with vc++ 2008 intrinsics right now, and the x64 version is dependant on the windows memory manager if used on older CPUs. If people are interested in boostifying and porting it, I suppose we could do that. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Yes, I'm using a multi-core machine. Where in my code would I need to add barriers, or is this perhaps a limitation in APR's CAS? Thanks, -Mike On Apr 17, 2008, at 3:45 PM, Patrick Twohig wrote:
Are you running it on a multicore machine or single core? That radically changes things. I was having the same problem. To test I ran 500 threads or so a third of them were producers. My suspicion was that I wasn't enforcing the proper memory barriers because for x86 I had to write my own CAS in assembly.
On Thu, Apr 17, 2008 at 3:23 PM, Michael Dickey <mike@mikedickey.com> wrote:
I wrote a C++ lock free queue implementation recently based on the Michael & Scott algorithm, that uses APR's atomics. It's available at:
http://svn.atomiclabs.com/pion-common/trunk/common/include/pion/PionLockFree...
You're welcome to use it but you should be aware that it's not quite debugged yet. It has issues when I tried using more than 1 producer and 1 consumer thread. I wasn't sure if this was a mistake I made in the code, or a limitation of the algorithm, but I assume it's the former. Anyway, in the end I decided not to use the code for what I was working on... But if you can find and fix the bug, it's all yours! =)
Of course, it depends on APR's atomics.. Boost does not have any compare-and-swap abstraction as I far as I know, which seems to be a basic requirement for any lock-free programming. Or maybe it does and I just don't know?
Take care, -Mike
On Apr 17, 2008, at 12:57 PM, Cory Nelson wrote:
On Wed, Apr 16, 2008 at 9:58 PM, Patrick Twohig <p-twohig@ieee.org> wrote:
I've been looking through the docs and I was wondering if boost has any lock-free threadsafe containers. I know it's been brought up on the list before, but I was curious if anybody had a working implementation. If somebody is working on them, I'd like to lend a hand if possible.
I have some lock-free stack/queue code for x86/x64/ia64 in c++. It only works with vc++ 2008 intrinsics right now, and the x64 version is dependant on the windows memory manager if used on older CPUs. If people are interested in boostifying and porting it, I suppose we could do that. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Theoretically, the CAS should atomically compare and swap the value in one clock cycle. However, with multiple cores/processors/hyper threading where multiple instructions are being executed simultaneously over arbitrary numbers of clock cycles. There can be writes pending while you want to read from memory. As a result, when you go to read something another process will have written to but you read stale data. To combat this, you enforce a memory barrier, which guarantees that all pending memory transactions before the barrier have completed before moving on with the program. Additionally, some architectures (like x86) allow for unaligned access of memory. When an unaligned value is accessed, it sets an exception then it replaces the single read/write operation with multiple bus operations which wreaks havoc on any compare/swap operations. On Thu, Apr 17, 2008 at 4:47 PM, Michael Dickey <mike@mikedickey.com> wrote:
Yes, I'm using a multi-core machine. Where in my code would I need to add barriers, or is this perhaps a limitation in APR's CAS?
Thanks, -Mike
On Apr 17, 2008, at 3:45 PM, Patrick Twohig wrote:
Are you running it on a multicore machine or single core? That radically changes things. I was having the same problem. To test I ran 500 threads or so a third of them were producers. My suspicion was that I wasn't enforcing the proper memory barriers because for x86 I had to write my own CAS in assembly.
On Thu, Apr 17, 2008 at 3:23 PM, Michael Dickey <mike@mikedickey.com> wrote:
I wrote a C++ lock free queue implementation recently based on the Michael & Scott algorithm, that uses APR's atomics. It's available at:
http://svn.atomiclabs.com/pion-common/trunk/common/include/pion/PionLockFree...
You're welcome to use it but you should be aware that it's not quite debugged yet. It has issues when I tried using more than 1 producer and 1 consumer thread. I wasn't sure if this was a mistake I made in the code, or a limitation of the algorithm, but I assume it's the former. Anyway, in the end I decided not to use the code for what I was working on... But if you can find and fix the bug, it's all yours! =)
Of course, it depends on APR's atomics.. Boost does not have any compare-and-swap abstraction as I far as I know, which seems to be a basic requirement for any lock-free programming. Or maybe it does and I just don't know?
Take care, -Mike
On Apr 17, 2008, at 12:57 PM, Cory Nelson wrote:
On Wed, Apr 16, 2008 at 9:58 PM, Patrick Twohig <p-twohig@ieee.org> wrote:
I've been looking through the docs and I was wondering if boost has any lock-free threadsafe containers. I know it's been brought up on the list before, but I was curious if anybody had a working implementation. If somebody is working on them, I'd like to lend a hand if possible.
I have some lock-free stack/queue code for x86/x64/ia64 in c++. It only works with vc++ 2008 intrinsics right now, and the x64 version is dependant on the windows memory manager if used on older CPUs. If people are interested in boostifying and porting it, I suppose we could do that. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Thu, Apr 17, 2008 at 5:24 PM, Patrick Twohig <p-twohig@ieee.org> wrote:
Theoretically, the CAS should atomically compare and swap the value in one clock cycle. However, with multiple cores/processors/hyper threading where multiple instructions are being executed simultaneously over arbitrary numbers of clock cycles. There can be writes pending while you want to read from memory. As a result, when you go to read something another process will have written to but you read stale data. To combat this, you enforce a memory barrier, which guarantees that all pending memory transactions before the barrier have completed before moving on with the program. Additionally, some architectures (like x86) allow for unaligned access of memory. When an unaligned value is accessed, it sets an exception then it replaces the single read/write operation with multiple bus operations which wreaks havoc on any compare/swap operations.
They don't happen in a single cycle, I don't think there is anything specifying that they should. Barriers aren't needed on x86 or x64, other than compile-time only ones to make sure the compiler doesn't reorder something. Digging through my code, will try to get it up tonight.

D'oh. Single instruction I meant to say. On Thu, Apr 17, 2008 at 6:09 PM, Cory Nelson <phrosty@gmail.com> wrote:
Theoretically, the CAS should atomically compare and swap the value in one clock cycle. However, with multiple cores/processors/hyper threading where multiple instructions are being executed simultaneously over arbitrary numbers of clock cycles. There can be writes pending while you want to read from memory. As a result, when you go to read something another
On Thu, Apr 17, 2008 at 5:24 PM, Patrick Twohig <p-twohig@ieee.org> wrote: process
will have written to but you read stale data. To combat this, you enforce a memory barrier, which guarantees that all pending memory transactions before the barrier have completed before moving on with the program. Additionally, some architectures (like x86) allow for unaligned access of memory. When an unaligned value is accessed, it sets an exception then it replaces the single read/write operation with multiple bus operations which wreaks havoc on any compare/swap operations.
They don't happen in a single cycle, I don't think there is anything specifying that they should. Barriers aren't needed on x86 or x64, other than compile-time only ones to make sure the compiler doesn't reorder something.
Digging through my code, will try to get it up tonight. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Fri, Apr 18, 2008 at 3:09 AM, Cory Nelson <phrosty@gmail.com> wrote:
On Thu, Apr 17, 2008 at 5:24 PM, Patrick Twohig <p-twohig@ieee.org> wrote:
Theoretically, the CAS should atomically compare and swap the value in one clock cycle. However, with multiple cores/processors/hyper threading where multiple instructions are being executed simultaneously over arbitrary numbers of clock cycles. There can be writes pending while you want to read from memory. As a result, when you go to read something another process will have written to but you read stale data. To combat this, you enforce a memory barrier, which guarantees that all pending memory transactions before the barrier have completed before moving on with the program. Additionally, some architectures (like x86) allow for unaligned access of memory. When an unaligned value is accessed, it sets an exception then it replaces the single read/write operation with multiple bus operations which wreaks havoc on any compare/swap operations.
They don't happen in a single cycle, I don't think there is anything specifying that they should. Barriers aren't needed on x86 or x64, other than compile-time only ones to make sure the compiler doesn't reorder something.
Well, you actually need StoreLoad memory barriers on x86. All other barriers are always implicit (unless you use non temporal SSE store/loads). StoreLoad is also implicit if you use locked operations, otherwise you need an explicit mfence. See, for example, http://g.oswego.edu/dl/jmm/cookbook.html -- gpd

Another article talks about writing on the Xbox360, which is what I needed a queue for: http://msdn2.microsoft.com/en-us/library/bb310595(VS.85).aspx On Fri, Apr 18, 2008 at 3:51 AM, Giovanni Piero Deretta <gpderetta@gmail.com> wrote:
On Thu, Apr 17, 2008 at 5:24 PM, Patrick Twohig <p-twohig@ieee.org> wrote:
Theoretically, the CAS should atomically compare and swap the value in one clock cycle. However, with multiple cores/processors/hyper
multiple instructions are being executed simultaneously over arbitrary numbers of clock cycles. There can be writes pending while you want to read from memory. As a result, when you go to read something another
will have written to but you read stale data. To combat this, you enforce a memory barrier, which guarantees that all pending memory
the barrier have completed before moving on with the program. Additionally, some architectures (like x86) allow for unaligned access of memory. When an unaligned value is accessed, it sets an exception then it replaces
On Fri, Apr 18, 2008 at 3:09 AM, Cory Nelson <phrosty@gmail.com> wrote: threading where process transactions before the
single read/write operation with multiple bus operations which wreaks havoc on any compare/swap operations.
They don't happen in a single cycle, I don't think there is anything specifying that they should. Barriers aren't needed on x86 or x64, other than compile-time only ones to make sure the compiler doesn't reorder something.
Well, you actually need StoreLoad memory barriers on x86. All other barriers are always implicit (unless you use non temporal SSE store/loads).
StoreLoad is also implicit if you use locked operations, otherwise you need an explicit mfence.
See, for example, http://g.oswego.edu/dl/jmm/cookbook.html
-- gpd _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

If the CAS is prefixed with 'lock' (intel) then this emits a two way barrier. Virtually all compiler built ins of CAS do this. In my implementation of queue I did not need barriers other than those already used by the CAS. I am interested in developing some lock-free libs for boost, if other are willing to help. I have a working unbounded MS Queue (relies on DCAS at the moment, but I have ideas on how to just support Single CAS (e.g. the node allocator could always align on 64 byte boundaries, leaving space for a counter with a reasonable range). I have a linearized smart_ptr (which makes STM much much easier in C++). I have a std::allocator --compatible lockfree allocator (reduced contention inside of container quite a bit) and of course a stack. Lance -----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Patrick Twohig Sent: Friday, April 18, 2008 11:32 AM To: boost@lists.boost.org Subject: Re: [boost] Nonlocking data structures. Another article talks about writing on the Xbox360, which is what I needed a queue for: http://msdn2.microsoft.com/en-us/library/bb310595(VS.85).aspx On Fri, Apr 18, 2008 at 3:51 AM, Giovanni Piero Deretta <gpderetta@gmail.com> wrote:
On Thu, Apr 17, 2008 at 5:24 PM, Patrick Twohig <p-twohig@ieee.org> wrote:
Theoretically, the CAS should atomically compare and swap the value in one clock cycle. However, with multiple cores/processors/hyper
multiple instructions are being executed simultaneously over arbitrary numbers of clock cycles. There can be writes pending while you want to read from memory. As a result, when you go to read something another
will have written to but you read stale data. To combat this, you enforce a memory barrier, which guarantees that all pending memory
the barrier have completed before moving on with the program. Additionally, some architectures (like x86) allow for unaligned access of memory. When an unaligned value is accessed, it sets an exception then it replaces
On Fri, Apr 18, 2008 at 3:09 AM, Cory Nelson <phrosty@gmail.com> wrote: threading where process transactions before the
single read/write operation with multiple bus operations which wreaks havoc on any compare/swap operations.
They don't happen in a single cycle, I don't think there is anything specifying that they should. Barriers aren't needed on x86 or x64, other than compile-time only ones to make sure the compiler doesn't reorder something.
Well, you actually need StoreLoad memory barriers on x86. All other barriers are always implicit (unless you use non temporal SSE store/loads).
StoreLoad is also implicit if you use locked operations, otherwise you
need an explicit mfence.
See, for example, http://g.oswego.edu/dl/jmm/cookbook.html
-- gpd _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost Visit our website at http://www.ubs.com This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mails are not encrypted and cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. This message is provided for informational purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments.

I for one would LOVE to see support in Boost for lock-free data structures. I imagine that if these were more readily available, a lot of C++ programmers would want to use them. I took a (brief) look at the TSTL code (from /. on Freshmeat), and it does not appear to be lock-free. Rather, it looks like a collection of primitives for thread-safe programming (which overlaps the Boost.Thread functionality) combined with some thread-safe data structures that use these primitives. Tim seems to have a great lock-free FIFO implementation, although I see that it uses a GPL license rather than Boost. Any chance we could convince you to change the license, Tim? (nudge, nudge, wink, wink) =) I'd like to help make such a library part of Boost, but the more I learn about lock-free coding, the more I discover that I have yet to learn =) I think that for now the most I can offer is enthusiastic encouragement! =) Take care, -Mike On Apr 18, 2008, at 8:50 AM, <Lance.Diduck@ubs.com> wrote:
If the CAS is prefixed with 'lock' (intel) then this emits a two way barrier. Virtually all compiler built ins of CAS do this. In my implementation of queue I did not need barriers other than those already used by the CAS.
I am interested in developing some lock-free libs for boost, if other are willing to help. I have a working unbounded MS Queue (relies on DCAS at the moment, but I have ideas on how to just support Single CAS (e.g. the node allocator could always align on 64 byte boundaries, leaving space for a counter with a reasonable range). I have a linearized smart_ptr (which makes STM much much easier in C++). I have a std::allocator --compatible lockfree allocator (reduced contention inside of container quite a bit) and of course a stack.
Lance
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Patrick Twohig Sent: Friday, April 18, 2008 11:32 AM To: boost@lists.boost.org Subject: Re: [boost] Nonlocking data structures.
Another article talks about writing on the Xbox360, which is what I needed a queue for: http://msdn2.microsoft.com/en-us/library/bb310595(VS.85).aspx
On Fri, Apr 18, 2008 at 3:51 AM, Giovanni Piero Deretta <gpderetta@gmail.com> wrote:
On Thu, Apr 17, 2008 at 5:24 PM, Patrick Twohig <p-twohig@ieee.org> wrote:
Theoretically, the CAS should atomically compare and swap the value in one clock cycle. However, with multiple cores/processors/hyper
multiple instructions are being executed simultaneously over arbitrary numbers of clock cycles. There can be writes pending while you want to read from memory. As a result, when you go to read something another
will have written to but you read stale data. To combat this, you enforce a memory barrier, which guarantees that all pending memory
the barrier have completed before moving on with the program. Additionally, some architectures (like x86) allow for unaligned access of memory. When an unaligned value is accessed, it sets an exception then it replaces
On Fri, Apr 18, 2008 at 3:09 AM, Cory Nelson <phrosty@gmail.com> wrote: threading where process transactions before the
single read/write operation with multiple bus operations which wreaks havoc on any compare/swap operations.
They don't happen in a single cycle, I don't think there is anything specifying that they should. Barriers aren't needed on x86 or x64, other than compile-time only ones to make sure the compiler doesn't reorder something.
Well, you actually need StoreLoad memory barriers on x86. All other barriers are always implicit (unless you use non temporal SSE store/loads).
StoreLoad is also implicit if you use locked operations, otherwise you
need an explicit mfence.
See, for example, http://g.oswego.edu/dl/jmm/cookbook.html
-- gpd _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost Visit our website at http://www.ubs.com
This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system.
E-mails are not encrypted and cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. This message is provided for informational purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

There is lots of ways to help, even if you've never touched lockfree programming. In the case of queues, stacks and such, these are well understood, and really need to library-tized. e.g. 1. source code conforms to boost style and repository guidelines 2. test harnesses written and integrated with boost build (e.g. my system uses regular make and CPPUNIT) 3. docs and example code written All of the above is very important, and does not require any knowledge of lock free programming, just an interest. There is even much design work to be done, for instance, this is my queue interface template<class _Val> //_Val must be bitwise comparable class michael_scott_queue { typedef _Val value_type; //other typedefs michael_scott_queue() ; ~michael_scott_queue(); void push(value_type const &_val); bool pop(value_type&return_val);//return false on empty private: volatile node_type m_head;//node_type implements CAS instruction volatile node_type m_tail; allocators::MTSlab<> m_allocator;//lock free node allocator (fixed size blocks) }; Now, without knowing anything about how ctor, dtor, push and pop are implemented, what can be done? Adding allocator as a parameter just like a std::container; changing node_type to conform to std::atomic; perhaps making a version that includes a monitor (so that pop blocks rather than returns false) and so forth. I havent worked out yet the exception guarantees. There is lots of stuff, and you can learn lock free programming on the way. If you have done distributed programming before, then lock -free SMP programming is simply using loss-free communication channels that are only a word or two wide, and there is no such thing as a "intermediate invalid" state -- i.e. all operations (messages) preserve all invariants. If you have never done distributed programming, well, then there is a lot to learn indeed!! Lance -----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Michael Dickey Sent: Friday, April 18, 2008 12:40 PM To: boost@lists.boost.org Subject: Re: [boost] Nonlocking data structures. I for one would LOVE to see support in Boost for lock-free data structures. I imagine that if these were more readily available, a lot of C++ programmers would want to use them. I took a (brief) look at the TSTL code (from /. on Freshmeat), and it does not appear to be lock-free. Rather, it looks like a collection of primitives for thread-safe programming (which overlaps the Boost.Thread functionality) combined with some thread-safe data structures that use these primitives. Tim seems to have a great lock-free FIFO implementation, although I see that it uses a GPL license rather than Boost. Any chance we could convince you to change the license, Tim? (nudge, nudge, wink, wink) =) I'd like to help make such a library part of Boost, but the more I learn about lock-free coding, the more I discover that I have yet to learn =) I think that for now the most I can offer is enthusiastic encouragement! =) Take care, -Mike On Apr 18, 2008, at 8:50 AM, <Lance.Diduck@ubs.com> wrote:
If the CAS is prefixed with 'lock' (intel) then this emits a two way barrier. Virtually all compiler built ins of CAS do this. In my implementation of queue I did not need barriers other than those already used by the CAS.
I am interested in developing some lock-free libs for boost, if other are willing to help. I have a working unbounded MS Queue (relies on DCAS at the moment, but I have ideas on how to just support Single CAS
(e.g. the node allocator could always align on 64 byte boundaries, leaving space for a counter with a reasonable range). I have a linearized smart_ptr (which makes STM much much easier in C++). I have a std::allocator --compatible lockfree allocator (reduced contention inside of container quite a bit) and of course a stack.
Lance
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Patrick Twohig Sent: Friday, April 18, 2008 11:32 AM To: boost@lists.boost.org Subject: Re: [boost] Nonlocking data structures.
Another article talks about writing on the Xbox360, which is what I needed a queue for: http://msdn2.microsoft.com/en-us/library/bb310595(VS.85).aspx
On Fri, Apr 18, 2008 at 3:51 AM, Giovanni Piero Deretta <gpderetta@gmail.com> wrote:
On Thu, Apr 17, 2008 at 5:24 PM, Patrick Twohig <p-twohig@ieee.org> wrote:
Theoretically, the CAS should atomically compare and swap the value in one clock cycle. However, with multiple cores/processors/hyper
multiple instructions are being executed simultaneously over arbitrary numbers of clock cycles. There can be writes pending while you want to read from memory. As a result, when you go to read something another
will have written to but you read stale data. To combat this, you enforce a memory barrier, which guarantees that all pending memory
the barrier have completed before moving on with the program. Additionally, some architectures (like x86) allow for unaligned access of memory. When an unaligned value is accessed, it sets an exception then it replaces
On Fri, Apr 18, 2008 at 3:09 AM, Cory Nelson <phrosty@gmail.com> wrote: threading where process transactions before the
single read/write operation with multiple bus operations which wreaks havoc on any compare/swap operations.
They don't happen in a single cycle, I don't think there is anything
specifying that they should. Barriers aren't needed on x86 or x64, other than compile-time only ones to make sure the compiler doesn't reorder something.
Well, you actually need StoreLoad memory barriers on x86. All other barriers are always implicit (unless you use non temporal SSE store/loads).
StoreLoad is also implicit if you use locked operations, otherwise you
need an explicit mfence.
See, for example, http://g.oswego.edu/dl/jmm/cookbook.html
-- gpd _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost Visit our website at http://www.ubs.com
This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify
the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system.
E-mails are not encrypted and cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the
contents of this message which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. This message is provided for informational purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost Visit our website at http://www.ubs.com This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mails are not encrypted and cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. This message is provided for informational purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments.

It does sound like an interesting project to me, but unfortunately I feel my "free (non-work) time" is already over-tasked with working on http support for a boost network library. Perhaps once that project is done I can lend a hand to help out with this. Take care, -Mike On Apr 18, 2008, at 10:23 AM, <Lance.Diduck@ubs.com> <Lance.Diduck@ubs.com
wrote: There is lots of ways to help, even if you've never touched lockfree programming. In the case of queues, stacks and such, these are well understood, and really need to library-tized. e.g. 1. source code conforms to boost style and repository guidelines 2. test harnesses written and integrated with boost build (e.g. my system uses regular make and CPPUNIT) 3. docs and example code written
All of the above is very important, and does not require any knowledge of lock free programming, just an interest. There is even much design work to be done, for instance, this is my queue interface template<class _Val> //_Val must be bitwise comparable class michael_scott_queue { typedef _Val value_type; //other typedefs michael_scott_queue() ; ~michael_scott_queue(); void push(value_type const &_val); bool pop(value_type&return_val);//return false on empty private: volatile node_type m_head;//node_type implements CAS instruction volatile node_type m_tail; allocators::MTSlab<> m_allocator;//lock free node allocator (fixed size blocks) };
Now, without knowing anything about how ctor, dtor, push and pop are implemented, what can be done? Adding allocator as a parameter just like a std::container; changing node_type to conform to std::atomic; perhaps making a version that includes a monitor (so that pop blocks rather than returns false) and so forth. I havent worked out yet the exception guarantees. There is lots of stuff, and you can learn lock free programming on the way. If you have done distributed programming before, then lock -free SMP programming is simply using loss-free communication channels that are only a word or two wide, and there is no such thing as a "intermediate invalid" state -- i.e. all operations (messages) preserve all invariants. If you have never done distributed programming, well, then there is a lot to learn indeed!!
Lance
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Michael Dickey Sent: Friday, April 18, 2008 12:40 PM To: boost@lists.boost.org Subject: Re: [boost] Nonlocking data structures.
I for one would LOVE to see support in Boost for lock-free data structures. I imagine that if these were more readily available, a lot of C++ programmers would want to use them.
I took a (brief) look at the TSTL code (from /. on Freshmeat), and it does not appear to be lock-free. Rather, it looks like a collection of primitives for thread-safe programming (which overlaps the Boost.Thread functionality) combined with some thread-safe data structures that use these primitives.
Tim seems to have a great lock-free FIFO implementation, although I see that it uses a GPL license rather than Boost. Any chance we could convince you to change the license, Tim? (nudge, nudge, wink, wink) =)
I'd like to help make such a library part of Boost, but the more I learn about lock-free coding, the more I discover that I have yet to learn =)
I think that for now the most I can offer is enthusiastic encouragement! =)
Take care, -Mike
On Apr 18, 2008, at 8:50 AM, <Lance.Diduck@ubs.com> wrote:
If the CAS is prefixed with 'lock' (intel) then this emits a two way barrier. Virtually all compiler built ins of CAS do this. In my implementation of queue I did not need barriers other than those already used by the CAS.
I am interested in developing some lock-free libs for boost, if other are willing to help. I have a working unbounded MS Queue (relies on DCAS at the moment, but I have ideas on how to just support Single CAS
(e.g. the node allocator could always align on 64 byte boundaries, leaving space for a counter with a reasonable range). I have a linearized smart_ptr (which makes STM much much easier in C++). I have a std::allocator --compatible lockfree allocator (reduced contention inside of container quite a bit) and of course a stack.
Lance
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Patrick Twohig Sent: Friday, April 18, 2008 11:32 AM To: boost@lists.boost.org Subject: Re: [boost] Nonlocking data structures.
Another article talks about writing on the Xbox360, which is what I needed a queue for: http://msdn2.microsoft.com/en-us/library/bb310595(VS.85).aspx
On Fri, Apr 18, 2008 at 3:51 AM, Giovanni Piero Deretta <gpderetta@gmail.com> wrote:
On Thu, Apr 17, 2008 at 5:24 PM, Patrick Twohig <p-twohig@ieee.org> wrote:
Theoretically, the CAS should atomically compare and swap the value in one clock cycle. However, with multiple cores/processors/hyper
multiple instructions are being executed simultaneously over arbitrary numbers of clock cycles. There can be writes pending while you want to read from memory. As a result, when you go to read something another
will have written to but you read stale data. To combat this, you enforce a memory barrier, which guarantees that all pending memory
the barrier have completed before moving on with the program. Additionally, some architectures (like x86) allow for unaligned access of memory. When an unaligned value is accessed, it sets an exception then it replaces
On Fri, Apr 18, 2008 at 3:09 AM, Cory Nelson <phrosty@gmail.com> wrote: threading where process transactions before the
single read/write operation with multiple bus operations which wreaks havoc on any compare/swap operations.
They don't happen in a single cycle, I don't think there is anything
specifying that they should. Barriers aren't needed on x86 or x64, other than compile-time only ones to make sure the compiler doesn't reorder something.
Well, you actually need StoreLoad memory barriers on x86. All other barriers are always implicit (unless you use non temporal SSE store/loads).
StoreLoad is also implicit if you use locked operations, otherwise you
need an explicit mfence.
See, for example, http://g.oswego.edu/dl/jmm/cookbook.html
-- gpd _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost Visit our website at http://www.ubs.com
This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify
the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system.
E-mails are not encrypted and cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the
contents of this message which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. This message is provided for informational purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost Visit our website at http://www.ubs.com
This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system.
E-mails are not encrypted and cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. This message is provided for informational purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Fri, 18 Apr 2008 13:55:35 -0700, Michael Dickey wrote:
Tim seems to have a great lock-free FIFO implementation, although I see that it uses a GPL license rather than Boost. Any chance we could convince you to change the license, Tim? (nudge, nudge, wink, wink) =)
i would consider to change the license for my fifo implementation to a boost-style license, if there would be a demand to use it for a boost.lockfree library ... however there might be some issues, though ... - i am not sure, whether the algorithm for memory reclamation is patented, i am not a lawyer, nor do i know of the legal status of american software patents in europe ... i somewhere read that maged michael's hazard pointers are patented, not sure about the pass-the-buck algorithm, that i used - if the algorithm is patented, i hope, the holder won't sue a developer of a gpl implementation - there are no boost/c++-style atomic operations, something like an implementation of n2427 ... ... as long as these issues are not resolved, i prefer a gpl license :) i would like to see a boost.lockfree library, with building blocks for lockfree containers (atomic primitives, aba-safe smart pointers, memory reclamation schemes, ...), several containers (queue, set, ...) and a lock-free reference counting smart pointer class (afaict, boost's smart pointer classes are not thread-safe) ... of course it would be wonderful to have both dynamic-sized and fized- sized containers, which can be used in hard real-time systems ... cheers, tim -- tim@klingt.org http://tim.klingt.org I must say I find television very educational. The minute somebody turns it on, I go to the library and read a good book. Groucho Marx

Tim Blechmann wrote:
if the algorithm is patented, i hope, the holder won't sue a developer of a gpl implementation Suing is expensive. More likely they would issue an injunction ESPECIALLY if it is gpl. In any case it is a very unpleasant experience ( I did some work as an intellectual property expert witness, and that was unpleasant enough!! Imagine being an actual party in litigation Ugh.)
i am not sure, whether the algorithm for memory reclamation is patented The one I used is based on a stack, and I am sure it is not patented, given that is it common practice. There is even an example in TC++PL. The stack implementation is textbook material. The drawback is that it requires, at the moment, DCAS (e.g. cmpxchg8b). This can be shortened to CAS with a few capacity assumptions, and using pointer compression.
there are no boost/c++-style atomic operations, something like an implementation of n2427 I think that boost only requires portability only between several compiler implementations, and not platforms. While we wait the several years for std::atomic to be ubiquitous, we could rely on either compiler built ins, assembler, etc. for the few platforms we would support at the outset. And just like std::atomic, not every platform will have lockfree implementation of everything. For example, DCAS on 64 bit Solaris is a locked based emulation. But portability is a moot issue if there is nothing to port...
with building blocks for lockfree containers (atomic primitives, aba-safe smart pointers, memory reclamation schemes, ...) These are the most important. Most researchers would prefer to use C++ instead of Java/C#, but they have to know assembler at the moment to get started. And they don't care if it is portable everywhere, just on machines they are likely to have available, which is of course Windows/Linux on Intel. (If they really need something that runs on platform X, then example code is available in the atomic_ops library (on Debian or something like that) which is ported everywhere)
afaict, boost's smart pointer classes are not thread-safe They are not. I have a version that is, as long as you don't care about weak pointers. It still needs a lot of work however. The interesting thing about a ref counted lock-free smart ptr is that it is possible to easily make any program lock free. Not that it will be more efficient than a lock based one, but for small objects it works great.
Lance -----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Tim Blechmann Sent: Saturday, April 19, 2008 6:00 AM To: boost@lists.boost.org Subject: Re: [boost] Nonlocking data structures. On Fri, 18 Apr 2008 13:55:35 -0700, Michael Dickey wrote:
Tim seems to have a great lock-free FIFO implementation, although I see that it uses a GPL license rather than Boost. Any chance we could convince you to change the license, Tim? (nudge, nudge, wink, wink) =)
i would consider to change the license for my fifo implementation to a boost-style license, if there would be a demand to use it for a boost.lockfree library ... however there might be some issues, though ... - i am not sure, whether the algorithm for memory reclamation is patented, i am not a lawyer, nor do i know of the legal status of american software patents in europe ... i somewhere read that maged michael's hazard pointers are patented, not sure about the pass-the-buck algorithm, that i used - if the algorithm is patented, i hope, the holder won't sue a developer of a gpl implementation - there are no boost/c++-style atomic operations, something like an implementation of n2427 ... ... as long as these issues are not resolved, i prefer a gpl license :) i would like to see a boost.lockfree library, with building blocks for lockfree containers (atomic primitives, aba-safe smart pointers, memory reclamation schemes, ...), several containers (queue, set, ...) and a lock-free reference counting smart pointer class (afaict, boost's smart pointer classes are not thread-safe) ... of course it would be wonderful to have both dynamic-sized and fized- sized containers, which can be used in hard real-time systems ... cheers, tim -- tim@klingt.org http://tim.klingt.org I must say I find television very educational. The minute somebody turns it on, I go to the library and read a good book. Groucho Marx _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost Visit our website at http://www.ubs.com This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mails are not encrypted and cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. This message is provided for informational purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments.

afaict, boost's smart pointer classes are not thread-safe They are not. I have a version that is, as long as you don't care about weak pointers. It still needs a lot of work however. The interesting thing about a ref counted lock-free smart ptr is that it is possible to easily make any program lock free. Not that it will be more efficient than a lock based one, but for small objects it works great.
is the code available somewhere? i had some trouble, implementing a lock- free boost-style shared_ptr class and started to write a smart pointer with a different api ... cheers, tim -- tim@klingt.org http://tim.klingt.org I had nothing to offer anybody except my own confusion Jack Kerouac

is the code available somewhere? i had some trouble, implementing a lock- free boost-style shared_ptr class and > started to write a smart
pointer with a different api It is not open sourced. I can make it available, but unfortunatley my laptop died this morning, so it may be a week or so until I am up and running again from home. I wrote up an entire description of how this works, but that is also on the dead computer. Basically the idea is that the pointer and the object both contain a version. If the versions do not match on the add_ref or release, nothing happens, and the pointer fails its operation, and tries again with fresh values. The pointee must use a special allocator, that only allocates objects that participate in this scheme (since it watnt to do bitwise compares on memory that may be reclaimed, and these fields must always be in the same place). This is not a big deal for a general shared_ptr, but my version so far just uses intrusive_ptr (since in my apps I would normally have used the intrusive_ptr anyway) So please give me a while to get back up and running. Laptop is at the shop now. Lance -----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Tim Blechmann Sent: Monday, April 21, 2008 11:07 AM To: boost@lists.boost.org Subject: Re: [boost] Nonlocking data structures.
afaict, boost's smart pointer classes are not thread-safe They are not. I have a version that is, as long as you don't care about weak pointers. It still needs a lot of work however. The interesting thing about a ref counted lock-free smart ptr is that it is possible to easily make any program lock free. Not that it will be more efficient than a lock based one, but for small objects it works great.
is the code available somewhere? i had some trouble, implementing a lock- free boost-style shared_ptr class and started to write a smart pointer with a different api ... cheers, tim -- tim@klingt.org http://tim.klingt.org I had nothing to offer anybody except my own confusion Jack Kerouac _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost Visit our website at http://www.ubs.com This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mails are not encrypted and cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. This message is provided for informational purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments.

I thought boost::shared_ptr was lock-free and thread safe on the major platforms supported by boost. I could be wrong though.

I thought boost::shared_ptr was lock-free and thread safe on the major
platforms supported by boost. I could > be wrong though. The reference counter is lock-free and thread safe. The shared_ptr itself is not shared_ptr<A> mutualA(new A); mutex mutualAlock; void * thread_func(void*){ shared_ptr<A> isolatedA; { //accessing mutualA requires lock lock(mutualAlock); isolatedA=mutualA; } //accessing isolatedA is fine //this only depends on the ref counter being thread safe vector<shared_ptr<A> > vA(100,isolatedA); } Also, it should be clear that this is unsafe: void * thread_func2(void*){ mutualA->doSomething(); } Since another thread is likely to do void * thread_func3(void*){ mutualA.reset(new A);//oops just deleted object thread_func2 was using } In the pointer I have, the above code would be linear_mutual_ptr<A> mutualA(new A); //for the shared instance void * thread_func_l(void*){ linear_ptr<A> isolatedA=mutualA;//for the isolated instance //isolatedA now has a copy, access like shared_ptr sans weak_ptr vector<linear_ptr<A> > vA(100,linearA); //and FWIW this works vector<linear_mutual_ptr<A> > vAm(100,mutualA); //but copies of linear_mutual_ptr are not as efficient //AND of course not every element may actually //points to the same object } void * thread_func2_l(void*){ //ensure that the object is going to hang around long enough //to complete the operation linear_ptr<A>(mutualA)->doSomething(); } linear_mutual_ptr also has a bool compare_and_swap(linear_ptr<T> const &oldval, linear_ptr<T> const &newval)volatile Operation, so that simple STM systems can be built using it, i.e. void * thread_func_l2(void*){ do{ linear_ptr<A> isolatedA=mutualA; linear_ptr<A> isolatedA2(new A(*isolatedA)); //not shown, but A's operator new has a lock-free allocator attached isolatedA2->modify(); if(mutualA.compare_and_swap(isolatedA,isolatedA2))return 0; }while(true); } Lance -----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Patrick Twohig Sent: Monday, April 21, 2008 12:37 PM To: boost@lists.boost.org Subject: Re: [boost] Nonlocking data structures. I thought boost::shared_ptr was lock-free and thread safe on the major platforms supported by boost. I could be wrong though. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost Visit our website at http://www.ubs.com This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mails are not encrypted and cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. This message is provided for informational purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments.

On Mon, 21 Apr 2008 09:37:23 -0700, Patrick Twohig wrote:
I thought boost::shared_ptr was lock-free and thread safe on the major platforms supported by boost. I could be wrong though.
there are the following issues with boost::shared_ptr: thread 1 accesses object, thread 2 frees it object ... one would need a thread-local copy of the shared_ptr to dereference it or use hazard pointers/pass-the-buck -- shared_ptr<foo_t> foo (new foo_t()); void thread_1(void) { foo->bar(); } void thread_2(void) { foo.reset(); } -- copying problem if thread 1 updates the pointer entry of foo, then thread 2 updates both pointer and reference count entry, then thread 1 updates the reference count foo would point to the object managed by bar, using the reference count of baz ... one would need a way to validate both object and reference count, lance suggested a tag, which would add another indirection, or an atomic double- word read instruction -- shared_ptr<foo_t> foo (new foo_t()); shared_ptr<foo_t> bar (new foo_t()); shared_ptr<foo_t> baz (new foo_t()); void thread_1(void) { foo = bar; } void thread_2(void) { foo = baz; } -- cheers, tim -- tim@klingt.org http://tim.klingt.org You can play a shoestring if you're sincere John Coltrane

The docs state the pointers are "as close to raw pointers as possible" and that makes sense, heh. On Mon, Apr 21, 2008 at 2:20 PM, Tim Blechmann <tim@klingt.org> wrote:
On Mon, 21 Apr 2008 09:37:23 -0700, Patrick Twohig wrote:
I thought boost::shared_ptr was lock-free and thread safe on the major platforms supported by boost. I could be wrong though.
there are the following issues with boost::shared_ptr:
thread 1 accesses object, thread 2 frees it object ... one would need a thread-local copy of the shared_ptr to dereference it or use hazard pointers/pass-the-buck -- shared_ptr<foo_t> foo (new foo_t());
void thread_1(void) { foo->bar(); }
void thread_2(void) { foo.reset(); } --
copying problem if thread 1 updates the pointer entry of foo, then thread 2 updates both pointer and reference count entry, then thread 1 updates the reference count foo would point to the object managed by bar, using the reference count of baz ... one would need a way to validate both object and reference count, lance suggested a tag, which would add another indirection, or an atomic double- word read instruction -- shared_ptr<foo_t> foo (new foo_t()); shared_ptr<foo_t> bar (new foo_t()); shared_ptr<foo_t> baz (new foo_t());
void thread_1(void) { foo = bar; }
void thread_2(void) { foo = baz; } --
cheers, tim
-- tim@klingt.org http://tim.klingt.org
You can play a shoestring if you're sincere John Coltrane
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Now, without knowing anything about how ctor, dtor, push and pop are implemented, what can be done? Adding allocator as a parameter just like a std::container;
beside using an stl-style allocator it might makes sense to use a caching free-list ...
perhaps making a version that includes a monitor (so that pop blocks rather than returns false) and so forth.
i would be careful with that ... one would need to make sure, that the monitor/semaphore implementation is lock-free itself ... cheers, tim -- tim@klingt.org http://tim.klingt.org There's no such entity as "most people". These are generalities. All generalities are meaningless. You've got to pin it down to a specific person doing a specific thing at a specific time and space. William S. Burroughs

i would be careful with that ... one would need to make sure, that the monitor/semaphore implementation is lock-free itself ... I assume that you are referring to the push operation, and not the pop. The pop of course has to block somehow. Waking up a thread is for all practical purposes platform/scheduler specific, but there are ways to do this without blocking the signaler. One way I solved this was to use a pipe, with O_NONBLOCK set. There is an additional shared variable, that indicates that the queue is empty, otherwise nothing has to get written. poll() with a timeout is used on the pop side. (in my case I had to integrate with another middleware package that used file descriptors, so there were few other choices). Windows has events, which I believe are nonblocking already (and Windows has one of the best queueing mechanisms around -- perhaps it is already lock-free making boost.lockfree::queue a wrapper?). I'm not sure how I would do this with pthreads.
In any case all of this is independent of the actual implementation of the queue itself. It would just be nice to have a mechanism similar to that already found in Java. Its lock-free queue has a monitor version. -----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Tim Blechmann Sent: Saturday, April 19, 2008 6:19 AM To: boost@lists.boost.org Subject: Re: [boost] Nonlocking data structures.
Now, without knowing anything about how ctor, dtor, push and pop are implemented, what can be done? Adding allocator as a parameter just like a std::container;
beside using an stl-style allocator it might makes sense to use a caching free-list ...
perhaps making a version that includes a monitor (so that pop blocks rather than returns false) and so forth.
i would be careful with that ... one would need to make sure, that the monitor/semaphore implementation is lock-free itself ... cheers, tim -- tim@klingt.org http://tim.klingt.org There's no such entity as "most people". These are generalities. All generalities are meaningless. You've got to pin it down to a specific person doing a specific thing at a specific time and space. William S. Burroughs _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost Visit our website at http://www.ubs.com This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mails are not encrypted and cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. This message is provided for informational purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments.

Patrick Twohig wrote:
Another article talks about writing on the Xbox360, which is what I needed a queue for: http://msdn2.microsoft.com/en-us/library/bb310595(VS.85).aspx
"On Xbox 360 MemoryBarrier is defined as lwsync (lightweight sync), also available through the __lwsync intrinsic, which is defined in ppcintrinsics.h. __lwsync also serves as a compiler memory barrier, preventing rearranging of reads and writes by the compiler. The lwsync instruction is a memory barrier on Xbox 360 that synchronizes one processor core with the L2 cache. It guarantees that all writes before lwsync make it to the L2 cache before any writes that follow. It also guarantees that any reads that follow lwsync don't get older data from L2 than previous reads." Ich verstehe nur Bahnhof. The thing is that architecturally (on PPC/Power) lwsync doesn't prevent hardware from hoisting subsequent loads above preceding stores (or sinking preceding stores below subsequent loads) unless it's constrained by UP consistency or data/control dependencies. That's what makes lwsync "light weight" sync. To make the story short, if you're a going to code banking or some such on Xbox 360, I suggest you better use ordered load: sync; lr-sc loop; isync; ordered store: lwsync; store; sync; acquire load: load; "branch never taken"; isync // See B.2.3 Safe // Fetch (Book II) release store: lwsync; store; and decorate each *sync above with _ReadWriteBarrier thing just to be sure that compiler won't screw up. regards, alexander.

On Thu, 17 Apr 2008 15:23:09 -0700, Michael Dickey wrote:
I wrote a C++ lock free queue implementation recently based on the Michael & Scott algorithm, that uses APR's atomics. It's available at:
http://svn.atomiclabs.com/pion-common/trunk/common/include/pion/ PionLockFreeQueue.hpp
interesting implementation ... using pre-allocated fixed memory seems to be an interesting way around the cas2 requirement and the memory reclamation problem ... however if i understand the algorithm correctly, you're missing some memory barriers ... this might be the problem that you were writing about, having multiple producer/consumer threads ... cheers, tim -- tim@klingt.org http://tim.klingt.org Cheat your landlord if you can and must, but do not try to shortchange the Muse. It cannot be done. You can't fake quality any more than you can fake a good meal. William S. Burroughs

Thanks.. Although, it seems as though I still don't totally understand the algorithm correctly myself... This was my first attempt at lock-free programming, so all of this is still quite new to me. I had thought that the use of CAS operations is all that was required. Do you have a suggestion for how I could add this missing piece, or perhaps some references for me to read? Thanks, -Mike On Apr 17, 2008, at 3:52 PM, Tim Blechmann wrote:
On Thu, 17 Apr 2008 15:23:09 -0700, Michael Dickey wrote:
I wrote a C++ lock free queue implementation recently based on the Michael & Scott algorithm, that uses APR's atomics. It's available at:
http://svn.atomiclabs.com/pion-common/trunk/common/include/pion/ PionLockFreeQueue.hpp
interesting implementation ... using pre-allocated fixed memory seems to be an interesting way around the cas2 requirement and the memory reclamation problem ...
however if i understand the algorithm correctly, you're missing some memory barriers ... this might be the problem that you were writing about, having multiple producer/consumer threads ...
cheers, tim
-- tim@klingt.org http://tim.klingt.org
Cheat your landlord if you can and must, but do not try to shortchange the Muse. It cannot be done. You can't fake quality any more than you can fake a good meal. William S. Burroughs
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Thanks.. Although, it seems as though I still don't totally understand the algorithm correctly myself... This was my first attempt at lock-free programming, so all of this is still quite new to me. I had thought that the use of CAS operations is all that was required. Do you have a suggestion for how I could add this missing piece, or perhaps some references for me to read?
well, most publication of algorithms assume a sequential memory model ... if the machine is using another memory model, memory barriers have to be placed ... on the x86/x86_64 architectures the cmpxchg instructions act as a memory barrier, but depending on the algorithm, some additional barriers need to be placed ... in my implementation of the ms-queue, i am using several barriers (possibly more than required by the algorithm), however it is working pretty well on multi-processor machines ... http://tim.klingt.org/git?p=nova.git;a=blob;f=libs/lockfree/fifo.hpp cheers, tim -- tim@klingt.org http://tim.klingt.org Just what the hell is the experimental tradition? Morton Feldman

Patrick Twohig <p-twohig <at> ieee.org> writes:
I've been looking through the docs and I was wondering if boost has any lock-free threadsafe containers. I know it's been brought up on the list before, but I was curious if anybody had a working implementation. If somebody is working on them, I'd like to lend a hand if possible. _______________________________________________
A well-timed query! I saw this on /. yesterday: http://freshmeat.net/projects/tstl/ License appears to be more restrictive than boost, mind you. Also, in no way am I advocating it (haven't yet unzipped the archive myself), just posting a relevant link. Best, Amit
participants (10)
-
Alexander Terekhov
-
Amit
-
Cory Nelson
-
Giovanni Piero Deretta
-
Lance.Diduck@ubs.com
-
Mathias Gaunard
-
Michael Dickey
-
Patrick Twohig
-
patrick.twohig
-
Tim Blechmann