
flodin@cs.umu.se wrote:
It sounds reasonable that the kind of lock described will perform badly in high-contention situations. But what evidence do you have that there is any probability of high contention, especially when, as you say, the locks are short-lived? These locks aren't taken every time the smart pointer is accessed; they're only taken when a copy is made. So to get the high contention you speak of, you would have to have two threads repeatedly making copies of the "same" smart pointer. From where I stand, this seems sufficiently rare that the extra performance gain in the common cases is worth it.
[These comments are specific to the win32 spinlock discussed.]

I have two specific concerns.

1) Personally, I value predictable performance. I would prefer a slight overall performance degradation (note, however, that I have not examined the actual numbers) to the possibility of an occasional very long wait. (Specific example: if a graphically intensive program is aiming for 30 frames per second, that is only about 33ms per frame. It would be very unfortunate if, even occasionally, a single smart pointer copy accounted for a third of the time it took to render a frame.)

2) In "thundering herd" situations, contention is unavoidable, and performance may be paramount. The possibility of a single thread halting many other threads for 10ms on a machine with many processors is disturbing. (Specific example: where one thread is producing work for many consumer threads on a multiprocessor machine, and obtaining the work involves copying a smart pointer, some of those consumers may be forced to sleep every single time, which could amount to a substantial percentage of total job wall-clock time.)

In any case, I feel fairly strongly that sleeping is not the right thing to do. Despite the weakness of an appeal to authority, I will note that I have seen Butenhof make the same assertion on Usenet, surely more eloquently than I could.

I previously neglected to suggest alternatives.

1) A generally sound 8-byte mutex could be used. I am not sure how bad this would be in terms of storage efficiency; it may be possible to implement it using less storage, but I have no specific information on this.

2) I've seen a shared_ptr count implementation that used InterlockedIncrement instead of the lwm code. I have not examined it in detail; however, an approach like this seems better in all respects than trying to manage a lock when the only need is to maintain a count (which is what many mutex primitives are doing internally anyway). Is there some reason it cannot be used?

What do you think?

Aaron W. LaFramboise
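
P.S. For concreteness, here is roughly the kind of spin-then-sleep lock I am objecting to. This is my own minimal sketch of the pattern; the name spin_sleep_lock and the exact spin policy are mine, not the actual lwm code:

    #include <windows.h>

    // Sketch of a spin-then-sleep lock (hypothetical; not the lwm code).
    class spin_sleep_lock
    {
        volatile LONG locked_; // 0 = free, 1 = held

    public:
        spin_sleep_lock(): locked_(0)
        {
        }

        void lock()
        {
            // Atomically set locked_ to 1; if it was already 1, someone
            // else holds the lock, so give up the rest of the timeslice.
            // Once a thread loses this race, the scheduler will not run
            // it again for at least a timer tick, so a nominally short
            // critical section can cost on the order of 10ms.
            while (InterlockedExchange(&locked_, 1) != 0)
                Sleep(1);
        }

        void unlock()
        {
            InterlockedExchange(&locked_, 0);
        }
    };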
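And here is a minimal sketch of the InterlockedIncrement-style counting I have in mind for alternative 2. Again hypothetical; the names atomic_count, increment, and decrement are mine, not those of the implementation I saw:

    #include <windows.h>

    // Hypothetical lock-free reference count built on the win32
    // interlocked primitives.
    class atomic_count
    {
        volatile LONG value_;

    public:
        explicit atomic_count(LONG v): value_(v)
        {
        }

        void increment()
        {
            // A single bus-locked instruction; no thread ever blocks.
            InterlockedIncrement(&value_);
        }

        // Returns the new count so the caller can destroy the pointee
        // when it reaches zero.
        LONG decrement()
        {
            return InterlockedDecrement(&value_);
        }
    };

With something like this, a smart pointer copy costs one interlocked increment with no possibility of sleeping, and contention degrades gracefully instead of descheduling threads.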