Date: Fri, 5 May 2006 14:09:33 -0600 From: Ray Whitmer
Subject: Re: [Boost-users] Thread-safe smart pointers, hashtables without locking To: boost-users@lists.boost.org Message-ID: <0D5E3BD4-9A5D-4CE2-9040-35500773C190@personallegal.net> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed The algorithm makes sense, and if all these claims hold then it should work, I think. I'm still concerned about step (2), however, if only because of the assumption that what you write as a load/ store in C++ maps to a load/store in ML. Will it? Always? I ask as a non-rhetorical question: does the C++ standard *guarantee* it? Because if it doesn't then it seems like a potential problem, but maybe I'm being too careful.
I'd also say the idea of moving a pointer out of the way, and then using the flip-gate-count-thing to wait for any 'potentially-using' readers to finish, definitely has 'potential'. You would need to show detailed code to really be sure. And not to us - to the guys on comp.programming.threads. If I could answer in the affirmative, I could prove it works and not
have to talk to anyone. I can make a bunch of lesser claims, that of course fail to add up to the strong claim but make me feel better:
3. I have looked at the output of many C compilers and have never seen a pointer assignment take multiple instructions. The exception might be if someone used a poorly-written memcpy-like operations to transfer the contents of a structure byte by byte. A well-written one takes the respective alignments into account and transfers aligned 32-bit quantities one-by-one. We are talking about code in a controlled Hashtable class, it would have to be the "C" compiler choosing to transfer a class consisting of a single address field using something other than a single instruction. It doesn't sound logical to me, but it wouldn't surprise me completely to see it happen, but I would report it as a compiler bug and expect it to be addressed even though it is not in the standard.
As long as the address of the pointer is properly aligned, it will be atomic. The language doesn't guarantee that (the language doesn't talk about atomic or threads at all - at least not yet), but the processors + compilers tend to guarantee it. HOWEVER! *even if it is atomic*, on a multi-cpu machine each cpu might see memory differently - you need memory barriers to ensure the other cpus see your new pointer value correctly. Think of cpus and memory like harddrive controllers that reorder reads and writes to lessen seeks - the cpu does the same sort of thing with memory - and in doing so, it may reorder the write of your pointer to 'happen' before you expected (because it was near other memory it was writing) and/or read the pointer later than expected (because it optimized other reads into a 'bundle' before the read of your pointer). Anyhow, if you haven't already, take a look at memory barriers, and, as a canonical example, the broken 'double checked locking pattern'. You can't do lock free without understanding this stuff. 4. These issues have to be addressed with the multi-processor cores
becoming more prevalent because they are basic to lock-free processing, or can C++ accept significant inferiority to Java with respect to lock-free multiprocessing?
see above. Much of it doesn't come for free. For what it is worth, if anyone else thinks the solution is worth a
generic contribution, or is of the opposite opinion that it would not be generic enough, I would be interested in hearing. I suppose this sort of question should go to the developers list. I think it would be much more difficult to implement a table of shared_ptr than it is for intrusive_ptr, because assigning a shared_ptr is much less likely to be uninterrupted, leaving other threads with a broken visible state. There is clearly more detail to discuss as well. Ray Whitmer
Your wording of 'much less likely' is scary, and I think similar wording made James apprehensive as well. There is no grey - it is black or white, thread safe or not. And you are correct, you can't make { reading the location of a shared_ptr + incrementing its ref-count } atomic, it least not on most processors. I'm not sure of the exact implementation of intrusive_ptr, but assuming it is like most, you can make ip1 = ip2; atomic IF ip2 is 'stable', ie guaranteed to not be changing during the assignment. ie ip1 can be read and/or written by other threads "while" the assignment is happening, and you will either get the before state of ip1, or the after state. I don't think the boost implementation of intrusive_ptr's operator=() is atomic, but it could be made to be. Tony