
hi all, i adapted some of my code to make it available under the boost license, which might be the starting point for a boost.lockfree library ... the main container is a lockfree fifo, the code is not really structured, yet, nor documented ... also, it is only tested on linux/gcc ... the a tarball is available at: http://tim.klingt.org/boost_lockfree.tar.bz2 i would be curious, if some people might want to work together in order to implement a boost-style lockfree library ... cheers, tim -- tim@klingt.org http://tim.klingt.org Cheat your landlord if you can and must, but do not try to shortchange the Muse. It cannot be done. You can't fake quality any more than you can fake a good meal. William S. Burroughs

i would be curious, if some people might want to work together in order to implement a boost-style lockfree >library ... Yes I would be interested. I have some (unliscensed) code, but I can make that available on a case by case basis for now. If you want a look plz let me know.
Our versions have basically the same stuff, however I note these differences: 1. Yours seems to be more hardware-portable (in that it accounts for PIC, Apple, and such) 2. Yours seems to issue a few more explicit barriers. 3. You have better branch prediction logic 4. you have better methods of reducing false sharing Mine has these: 1. the FIFO free list allocator can also be used in STL containers 2. a LIFO 3. the "tagged_ptr" is broken out, such that you could make intrusive containers out of it (a la boost::interprocess) 4. There are hooks to eventually make fancy pointers work with "tagged_ptr" (I had offset_ptr in mind, from boost::interprocess) 5. I have a "linear ptr" essentially a lockfree shared_ptr. 6. Mine compiles on MSVC as well as GCC 7. I have some unit tests. Nothing like they should be, but it's a start. Lance -----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Tim Blechmann Sent: Monday, April 28, 2008 2:26 PM To: boost@lists.boost.org Subject: [boost] boost.lockfree proposal hi all, i adapted some of my code to make it available under the boost license, which might be the starting point for a boost.lockfree library ... the main container is a lockfree fifo, the code is not really structured, yet, nor documented ... also, it is only tested on linux/gcc ... the a tarball is available at: http://tim.klingt.org/boost_lockfree.tar.bz2 i would be curious, if some people might want to work together in order to implement a boost-style lockfree library ... cheers, tim -- tim@klingt.org http://tim.klingt.org Cheat your landlord if you can and must, but do not try to shortchange the Muse. It cannot be done. You can't fake quality any more than you can fake a good meal. William S. Burroughs _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost Visit our website at http://www.ubs.com This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mails are not encrypted and cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. This message is provided for informational purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments.

hi lance,
I have some (unliscensed) code, but I can make that available on a case by case basis for now. If you want a look plz let me know.
i would be quite interested in it!
1. the FIFO free list allocator can also be used in STL containers
interesting, i'd like to compare it with my caching_freelist implementation ...
3. the "tagged_ptr" is broken out, such that you could make intrusive containers out of it (a la boost::interprocess)
i'd be curious to compare it with my 'tagged_ptr' ... cheers, tim -- tim@klingt.org http://tim.klingt.org The aim of education is the knowledge, not of facts, but of values William S. Burroughs

On Mon, Apr 28, 2008 at 11:25 AM, Tim Blechmann <tim@klingt.org> wrote:
hi all,
i would be curious, if some people might want to work together in order to implement a boost-style lockfree library ...
I'd be interested in contributing. This seems well designed already, but I see some improvements: I want intrusive versions of stack/queue. The freelist could also be made to use the stack. The allocator should put blocks on cache-line boundaries. A simple freelist allocator that grabs blocks in pages would be fantastic in a lot of cases. The algo works inefficiently on IA64 because it copies the pointer when IA64 only needs to copy the ABA counter (it has cmp8xchg16b). This can be reworked to have a tagged_ptr<>::cmp_type instead of comparing with another tagged_ptr. CAS on x86 will give you the old value if it fails. CAS should be changed to modify the compare arg, so you can write a loop without constantly loading the destination: cmp_type cmp = dest; do { ... } while(!CAS(dest, exch, cmp)); -- Cory Nelson

I'd be interested in contributing. This seems well designed already, but I see some improvements:
great!
I want intrusive versions of stack/queue. The freelist could also be made to use the stack.
i once tried to adapt the queue algorithm to an intrusive version, it is not really straight forward, though ... when you dequeue a node from the queue, it may still be referred to by other threads, thus you are not able (in this version not allowed!) to free the node ...
The allocator should put blocks on cache-line boundaries.
yes, that probably makes sense ...
The algo works inefficiently on IA64 because it copies the pointer when IA64 only needs to copy the ABA counter (it has cmp8xchg16b). This can be reworked to have a tagged_ptr<>::cmp_type instead of comparing with another tagged_ptr.
oh, nice ...
CAS on x86 will give you the old value if it fails. CAS should be changed to modify the compare arg, so you can write a loop without constantly loading the destination:
cmp_type cmp = dest; do { ... } while(!CAS(dest, exch, cmp));
good point ... btw, i have set up a git repository, git://tim.klingt.org/boost_lockfree.git cheers, tim -- tim@klingt.org http://tim.klingt.org The aim of education is the knowledge, not of facts, but of values William S. Burroughs

I want intrusive versions of stack/queue. The freelist could also be made to use the stack. In my design I did make the free list allocator use the stack. The stack assumes that its nodes are intrusive style. However, I have never seen published an intrusive queue. The Micheal Scott queue relies on dummy nodes, where the last node dequed serves as the new dummy node, the previous one being freed. This makes it difficult (or impossible) to make the nodes intrusive.
A simple freelist allocator that grabs blocks in pages would be fantastic in a lot of cases. The one I have does this. The drawback is that it make pointer compression difficult. Compression is a good way to solve many ABA problems on architectures where CAS2 is not available. A compressing allocator/pointer pair would be required for any hope of portability. Of course, noncompressing version should be available for platofrm that could support it (such as Intel) The algo works inefficiently on IA64 because it copies the pointer when IA64 only needs to copy the ABA counter >(it has cmp8xchg16b). I have not seen an algorithm published that makes use of cmp8xchg16b. Most algs I have seen want to swap both the pointer and the count, and not just the pointer if the count is different.
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Cory Nelson Sent: Monday, April 28, 2008 5:06 PM To: boost@lists.boost.org Subject: Re: [boost] boost.lockfree proposal On Mon, Apr 28, 2008 at 11:25 AM, Tim Blechmann <tim@klingt.org> wrote:
hi all,
i would be curious, if some people might want to work together in order to implement a boost-style lockfree library ...
I'd be interested in contributing. This seems well designed already, but I see some improvements: I want intrusive versions of stack/queue. The freelist could also be made to use the stack. The allocator should put blocks on cache-line boundaries. A simple freelist allocator that grabs blocks in pages would be fantastic in a lot of cases. The algo works inefficiently on IA64 because it copies the pointer when IA64 only needs to copy the ABA counter (it has cmp8xchg16b). This can be reworked to have a tagged_ptr<>::cmp_type instead of comparing with another tagged_ptr. CAS on x86 will give you the old value if it fails. CAS should be changed to modify the compare arg, so you can write a loop without constantly loading the destination: cmp_type cmp = dest; do { ... } while(!CAS(dest, exch, cmp)); -- Cory Nelson _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost Visit our website at http://www.ubs.com This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mails are not encrypted and cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. This message is provided for informational purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments.

I'd also be interested in contributing. I had some stuff I had started to hack out b/c I wasn't sure if you were planning on relicensing your code but I guess I'll scrap that and hack away at this.

On Mon, Apr 28, 2008 at 3:26 PM, <Lance.Diduck@ubs.com> wrote:
I want intrusive versions of stack/queue. The freelist could also be made to use the stack. In my design I did make the free list allocator use the stack. The stack assumes that its nodes are intrusive style. However, I have never seen published an intrusive queue. The Micheal Scott queue relies on dummy nodes, where the last node dequed serves as the new dummy node, the previous one being freed. This makes it difficult (or impossible) to make the nodes intrusive.
Indeed. I have an intrusive version that is made mainly to organize the code, it would not make much sense to use elsewhere.
A simple freelist allocator that grabs blocks in pages would be fantastic in a lot of cases. The one I have does this. The drawback is that it make pointer compression difficult. Compression is a good way to solve many ABA problems on architectures where CAS2 is not available. A compressing allocator/pointer pair would be required for any hope of portability. Of course, noncompressing version should be available for platofrm that could support it (such as Intel)
Yes, I actually combine the pointer and an ABA counter into one 64-bit value for legacy AMD systems that don't support CAS2. I have only tested this on Windows - I have no idea how other OSes lay out their memory. This also introduces the idea of "preparing" a pointer for CAS, though the algorithm code can still be generic enough for all archs. http://svn.int64.org/svnroot/int64/snips/lockfree/tagged_ptr.x64.hpp http://svn.int64.org/svnroot/int64/snips/lockfree/basic_iqueue.hpp (caveat, the code in that dir is actually outdated, I never got around to finishing it. would need some work to get good.)
The algo works inefficiently on IA64 because it copies the pointer when IA64 only needs to copy the ABA counter >(it has cmp8xchg16b). I have not seen an algorithm published that makes use of cmp8xchg16b. Most algs I have seen want to swap both the pointer and the count, and not just the pointer if the count is different.
The idea is that you compare just the counter, then exchange the whole thing. As far as I know there is no cmpxchg16b on IA64, so you have to do it this way or not at all. -- Cory Nelson

Compression is a good way to solve many ABA problems on architectures where CAS2 is not available. A compressing allocator/pointer pair would be required for any hope of portability. Of course, noncompressing version should be available for platofrm that could support it (such as Intel)
Yes, I actually combine the pointer and an ABA counter into one 64-bit value for legacy AMD systems that don't support CAS2. I have only tested this on Windows - I have no idea how other OSes lay out their memory. This also introduces the idea of "preparing" a pointer for CAS, though the algorithm code can still be generic enough for all archs.
http://svn.int64.org/svnroot/int64/snips/lockfree/tagged_ptr.x64.hpp
i have just implemented a compressed pointer for the x86_64 architecture, tested on one linux machine ... http://tim.klingt.org/git?p=boost_lockfree.git;a=blob;f=boost/lockfree/ tagged_ptr_ptrcompression.hpp cheers, tim -- tim@klingt.org http://tim.klingt.org Silence is only frightening to people who are compulsively verbalizing. William S. Burroughs

Hi Tim, This is a lifesaver for me! I wanted to add some lock-free code to a project I'm working on to speed up some critical pieces. Unfortunately, this is all new to me, and I've found it very hard to get something working reliably due to the lack of good, open and portable primitives. Now, I'm throwing out my code and using yours instead! =) One class I put together that others might be interested in is a simple allocator that uses multiple boost::pool<> objects for memory blocks of various sizes, and combines these with caching free lists that makes it lock-free when there is enough memory already available. Not sure how this compares to the lock-free allocation algorithms out there, but it seemed like a simple approach that works perfectly for what I need it for.. http://websvn.atomiclabs.com/filedetails.php?repname=pion-common&path=%2Ftrunk%2Fcommon%2Finclude%2Fpion%2FPionPoolAllocator.hpp (you should be able to easily strip out the "Pion" stuff without any adverse affect) Glad to see lots of support for this library here! Take care, -Mike On Apr 28, 2008, at 11:25 AM, Tim Blechmann wrote:
hi all,
i adapted some of my code to make it available under the boost license, which might be the starting point for a boost.lockfree library ...
the main container is a lockfree fifo, the code is not really structured, yet, nor documented ... also, it is only tested on linux/gcc ...
the a tarball is available at: http://tim.klingt.org/boost_lockfree.tar.bz2
i would be curious, if some people might want to work together in order to implement a boost-style lockfree library ...
cheers, tim
-- tim@klingt.org http://tim.klingt.org
Cheat your landlord if you can and must, but do not try to shortchange the Muse. It cannot be done. You can't fake quality any more than you can fake a good meal. William S. Burroughs
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Tim Blechmann wrote:
i adapted some of my code to make it available under the boost license, which might be the starting point for a boost.lockfree library ...
Hi Tim, I would love to see some benchmark results comparing your implementation with a "traditional" fifo plus (efficient) mutex implementation. Do you have anything like this? Phil.

hi phil,
I would love to see some benchmark results comparing your implementation with a "traditional" fifo plus (efficient) mutex implementation. Do you have anything like this?
i haven't done any benchmarks concerning my fifo implementation ... however it would probably make sense to write a blocking equivalent (the two-lock concurrent queue described in the same paper of michael and scott) with both spin locks and traditional mutexes ... both average-case and worst-case execution times should be benchmarked, though ... cheers, tim -- tim@klingt.org http://tim.klingt.org The only people for me are the mad ones, the ones who are mad to live, mad to talk, mad to be saved, desirous of everything at the same time, the ones who never yawn or say a commonplace thing, but burn, burn, burn, like fabulous yellow roman candles exploding like spiders across the stars and in the middle you see the blue centerlight pop and everybody goes "Awww! Jack Kerouac
participants (6)
-
Cory Nelson
-
Lance.Diduck@ubs.com
-
Michael Dickey
-
Patrick Twohig
-
Phil Endecott
-
Tim Blechmann