
Peter Dimov wrote:
>> The smart pointer itself (including all temporary copies and the creation from a weak_ptr) uses atomic reference counting. Both are scalability issues (mostly the synchronization) and
>
> No, atomic reference counting is typically not a scalability issue. "Scalability issue" means that K operations take more than K * (time for a single operation). A class-wide lock could do that, atomic increments/decrements usually do not.

That's why I said "mostly the synchronization" and I meant scaling with multiple threads and multiple processors.

>> exceptionally heavy in simple absolute runtime cost (as much as 50 times the cost of a simple object copy and more, depending on the CPU and system).
>
> I know of no such CPU or system.
I didn't want to go into too much detail here either and therefore chose a simple combined figure for both operations. The allocation is of course more costly if it involves locking a process-wide mutex, but subsequent atomic increments take roughly 100 cycles on a Core 2 Duo, and I've seen reports of higher delays with older Xeons on dual-socket systems. None of that is very surprising, considering that atomic operations are essentially orthogonal to CPU designers' strategies for achieving high instruction throughput.

Regards

Timmo Stange