Re: [Boost-users] Thread-safe smart pointers, hashtables without locking

5 May 2006

      ...
The algorithm makes sense, and if all these claims hold then it  
should work, I think. I'm still concerned about step (2), however,  
if only because of the assumption that what you write as a load/ 
store in C++ maps to a load/store in ML. Will it? Always? I ask as  
a non-rhetorical question: does the C++ standard *guarantee* it?  
Because if it doesn't then it seems like a potential problem, but  
maybe I'm being too careful.
If I could answer in the affirmative, I could prove it works and not  
have to talk to anyone.  I can make a bunch of lesser claims, that of  
course fail to add up to the strong claim but make me feel better:

1. For years, people have written and relied on C/C++ characteristics  
that apparently were not in the standard until later if at all.  If  
the popular platforms support it, it takes a strong reason to justify  
a violation of it.

2. Java, on any platform, would seem to strongly depend upon a  
reference assignment being accomplished all-or-nothing as a machine  
assignment.  The alternative would be threads that see partial  
references being assigned in different threads and all the safety of  
references is out the window.  This is such a common operation, it  
would be difficult to believe that there is thread synchronization  
going on around every pointer/reference assignment.

3. I have looked at the output of many C compilers and have never  
seen a pointer assignment take multiple instructions.  The exception  
might be if someone used a poorly-written memcpy-like operations to  
transfer the contents of a structure byte by byte.  A well-written  
one takes the respective alignments into account and transfers  
aligned 32-bit quantities one-by-one.  We are talking about code in a  
controlled Hashtable class, it would have to be the "C" compiler  
choosing to transfer a class consisting of a single address field  
using something other than a single instruction.  It doesn't sound  
logical to me, but it wouldn't surprise me completely to see it  
happen, but I would report it as a compiler bug and expect it to be  
addressed even though it is not in the standard.

4. These issues have to be addressed with the multi-processor cores  
becoming more prevalent because they are basic to lock-free  
processing, or can C++ accept significant inferiority to Java with  
respect to lock-free multiprocessing?

Sorry that my arguments in this case frequently refer to Java.
...
Have you read any of the literature on lock-free data structures,  
btw? There was articles in the October and December '04 C/C++ User  
Journal about them.
I have read a number of articles on the subject, several specifically  
on the topic of lock-free hashtables, several with implementations  
they claim to be proven to be correct.  My own first attempt is far  
from a proven-correct implementation.

Many seem to arrive at similar conclusions, although each has its own  
slightly-different take on the subject and quite different mechanisms.

They are also very often theoretical and build their proofs of  
completeness on features not apparently explicitly present in "C"  
standards.  I certainly have yet to find an implementation I can just  
plug in and use.

They often resort to garbage collection to solve the hard problems  
(even if they are working in C) or if they solve the deallocation  
problem, they only do so for the main array, not for the entries in  
the hashtable itself.
...
I realize that I haven't answered your original question about  
atomic_count, btw, and I'm sorry: the reason is simply that I don't  
know!
Thanks for the comments anyway, which were thought-provoking.

For what it is worth, if anyone else thinks the solution is worth a  
generic contribution, or is of the opposite opinion that it would not  
be generic enough, I would be interested in hearing. I suppose this  
sort of question should go to the developers list. I think it would  
be much more difficult to implement a table of shared_ptr than it is  
for intrusive_ptr, because assigning a shared_ptr is much less likely  
to be uninterrupted, leaving other threads with a broken visible  
state.  There is clearly more detail to discuss as well.
Ray Whitmer