shared_ptr: policy for usage_count lock

I apologize if my observation has already been discussed and a course agreed upon. When Boost is used in multi-threaded programs, shared_ptr synchronizes access to its internal usage count variable. I understand the rationale for guarding the shared resource as the default behavior in an MT environment; however, I think users might appreciate the ability to override it. Currently, users either ignore, or are unaware of, the performance cost imposed by locking, or choose, with all its implications, the option of using a non-owning observer of an object owned by a shared_ptr.

During tests involving manipulations of shared_ptr variables similar to those performed by algorithms on container elements (their copying and removal), a significant overhead was measured. During the tests, care was taken to ensure that the timing did not include the creation of new objects managed by shared_ptrs or their destruction.

One way to customize shared_ptr without breaking existing code would be to introduce an optional template parameter stating the locking policy.

Attached you will find the test code used on a win32 platform, which produced the following results on an Intel Pentium 4 2.66 GHz single-processor machine.
Each line represents a separate testing iteration, so averages can be calculated:

BOOST_HAS_THREADS is: TRUE
Elapsed time: 845605 microseconds
Elapsed time: 817763 microseconds
Elapsed time: 802840 microseconds
Elapsed time: 790676 microseconds
Elapsed time: 790839 microseconds
Elapsed time: 793012 microseconds
Elapsed time: 790474 microseconds
Elapsed time: 788396 microseconds
Elapsed time: 794611 microseconds
Elapsed time: 792640 microseconds
Press any key

BOOST_HAS_THREADS is: FALSE
Elapsed time: 87914 microseconds
Elapsed time: 87753 microseconds
Elapsed time: 88958 microseconds
Elapsed time: 86546 microseconds
Elapsed time: 86437 microseconds
Elapsed time: 86520 microseconds
Elapsed time: 86433 microseconds
Elapsed time: 88519 microseconds
Elapsed time: 86420 microseconds
Elapsed time: 87535 microseconds
Press any key

The time overhead mentioned above is meant to serve as an example from a particular win32 platform. Results will vary across platforms and hardware configurations.

Regards,
Slawomir Lisznianski; [ www.rhapsodia.org ]

/*
 * Copyright (C) 2003-2004
 * Slawomir Lisznianski <slisznianski@asyncnet.com>
 *
 * Permission to use, copy, modify, distribute and sell this software
 * and its documentation for any purpose is hereby granted without fee,
 * provided that the above copyright notice appear in all copies and
 * that both that copyright notice and this permission notice appear
 * in supporting documentation. Slawomir Lisznianski makes no
 * representations about the suitability of this software for any
 * purpose. It is provided "as is" without express or implied warranty.
 *
 */

#include <cstdio>      // sprintf
#include <string>
#include <iostream>
#include <stdexcept>
#include <boost/shared_ptr.hpp>
#include <Windows.h>

// VC++ 6 library doesn't define it.
//
std::ostream& operator<<(std::ostream& os, __int64 i)
{
    char buf[24];  // room for a sign, 19 digits and the terminator
    sprintf(buf, "%I64d", i);
    os << buf;
    return os;
}

// Starts timing on construction and prints the elapsed time on destruction.
struct Timer
{
    Timer(std::ostream& out)
        : _M_out(out)
    {
        if (!::QueryPerformanceFrequency(&_M_frequency))
            throw std::runtime_error("Unable to retrieve frequency of the high-resolution performance counter.");
        if (!::QueryPerformanceCounter(&_M_startEvent))
            throw std::runtime_error("Unable to retrieve value of the high-resolution performance counter.");
    }

    ~Timer()
    {
        if (!::QueryPerformanceCounter(&_M_endEvent))
            _M_out << "Unable to retrieve value of the high-resolution performance counter.";
        else
        {
            LONGLONG elapsedTicks__ = _M_endEvent.QuadPart - _M_startEvent.QuadPart;
            double elapsedTime__ = 0;
            _M_out << "Elapsed time: ";
            // Note: the divisions below are integer divisions, so the
            // ticks-per-unit factor is truncated.
            if (_M_frequency.QuadPart > 1000000)
            {
                elapsedTime__ = (elapsedTicks__ / (_M_frequency.QuadPart / 1000000));
                _M_out << elapsedTime__ << " microseconds" << std::endl;
            }
            else if (_M_frequency.QuadPart > 1000)
            {
                elapsedTime__ = (elapsedTicks__ / (_M_frequency.QuadPart / 1000));
                _M_out << elapsedTime__ << " milliseconds" << std::endl;
            }
            else
            {
                _M_out << elapsedTicks__ << " ticks (tick: 1/" << _M_frequency.QuadPart << " of second)" << std::endl;
            }
        }
    }

    std::ostream& _M_out;
    LARGE_INTEGER _M_startEvent;
    LARGE_INTEGER _M_endEvent;
    LARGE_INTEGER _M_frequency;
};

// Times 4,000,000 pairs of shared_ptr copy assignments.
void run()
{
    boost::shared_ptr<int> ptrA__(new int(0)), ptrB__, ptrC__;
    Timer timer__(std::cout);
    for (int i=0; i<4000000; ++i)
    {
        ptrB__ = ptrA__;
        ptrC__ = ptrB__;
    }
}

int main(int, char**)
{
    try
    {
        std::cout << "BOOST_HAS_THREADS is: ";
#if !defined(BOOST_HAS_THREADS)
        std::cout << "FALSE" << std::endl;
#else
        std::cout << "TRUE" << std::endl;
#endif
        for (int i=0; i<10; ++i)
            run();
    }
    catch (std::exception& e)
    {
        std::cerr << e.what() << std::endl;
    }
    return 0;
}
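The test above depends on Windows.h and QueryPerformanceCounter. As a rough sketch only, the same copy-assignment loop could be timed portably with std::chrono. This assumes a C++11 compiler and uses std::shared_ptr in place of boost::shared_ptr; neither assumption comes from the original post, and the function name time_copy_loop is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <memory>

// Hypothetical helper (not part of the original test): performs the same
// pair of assignments as run() and returns the elapsed microseconds.
// std::shared_ptr stands in for boost::shared_ptr here.
std::int64_t time_copy_loop(int iterations)
{
    std::shared_ptr<int> a(new int(0)), b, c;

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        b = a;  // after the first pass, b already shares a's object
        c = b;  // likewise for c
    }
    const auto stop = std::chrono::steady_clock::now();

    // Sanity check: once the loop has run, a, b and c share one object.
    assert(iterations <= 0 || a.use_count() == 3);

    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling time_copy_loop(4000000) repeats the iteration count used in the test above; the absolute numbers will differ from those posted, since the platform, compiler, and smart pointer implementation all differ.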

Slawomir Lisznianski wrote: [...]
During tests involving manipulations of shared_ptr variables similar to those performed by algorithms on container elements – their copying and removal – a significant overhead was measured. During tests, care was taken to assure that timing did not include creation of new objects managed by shared_ptrs nor their destruction.
One way to customize the shared_ptr without breaking existing code would be to introduce an optional template parameter stating locking policy.
Attached you will find test code used on a win32 platform that produced the following results on an Intel Pentium 4 2.66 GHz single-processor machine. Each line represents a separate testing iteration, so averages can be calculated:
[...]
std::ostream& _M_out;
LARGE_INTEGER _M_startEvent;
LARGE_INTEGER _M_endEvent;
LARGE_INTEGER _M_frequency;
Please note that identifiers that start with _M (underscore followed by an uppercase letter) are reserved by the implementation, as are identifiers containing a double underscore.
};
void run()
{
    boost::shared_ptr<int> ptrA__(new int(0)), ptrB__, ptrC__;
    Timer timer__(std::cout);
    for (int i=0; i<4000000; ++i)
    {
        ptrB__ = ptrA__;
        ptrC__ = ptrB__;
    }
}
Thank you for the test. I was able to confirm your results on an AMD Athlon 1.4. However, you have to agree that your test code isn't very realistic, or to be precise, it's very unrealistic. ;-)

I was able to cut both single- and multithreaded times to 20 ms by replacing shared_count::operator= as shown below:

shared_count & operator= (shared_count const & r) // nothrow
{
    sp_counted_base * tmp = r.pi_;

    if(tmp != pi_)
    {
        if(tmp != 0) tmp->add_ref_copy();
        if(pi_ != 0) pi_->release();
        pi_ = tmp;
    }

    return *this;
}

That's because you are measuring a tight cycle of no-ops. While it would be trivial to modify the test to avoid this particular optimization, I'd appreciate it if you can produce a test sample that is derived from a real code base that uses shared_ptr extensively.

That said, your test, when rerun with the "next release shared_count" (proof of concept available at http://www.pdimov.com/cpp/shared_count_x86_exp2.hpp ) produces

BOOST_HAS_THREADS is: TRUE
Elapsed time: 254150 microseconds
Elapsed time: 225076 microseconds
Elapsed time: 225000 microseconds
Elapsed time: 224875 microseconds
Elapsed time: 226152 microseconds
Elapsed time: 224947 microseconds
Elapsed time: 225070 microseconds
Elapsed time: 229008 microseconds
Elapsed time: 227057 microseconds
Elapsed time: 224856 microseconds
Press any key to continue

I find this (~3x instead of 10x) slightly less alarming. ;-)
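To illustrate why the guarded assignment turns the benchmark into "a tight cycle of no-ops", here is a minimal sketch. The control_block and counted_handle types and the count_updates function are hypothetical stand-ins written for this illustration; they are not Boost's sp_counted_base or shared_count, and ownership and deletion are deliberately left out. The sketch replays the benchmark loop and tallies how often the count is actually touched.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-count control block.
struct control_block {
    long use_count;
    long updates;   // how many times the count was modified
    control_block() : use_count(1), updates(0) {}
    void add_ref() { ++use_count; ++updates; }
    void release() { assert(use_count > 0); --use_count; ++updates; }
};

// Hypothetical handle whose assignment is guarded like the one quoted above.
class counted_handle {
public:
    counted_handle() : pi_(nullptr) {}
    explicit counted_handle(control_block* p) : pi_(p) {}  // adopts the initial reference
    counted_handle(counted_handle const&) = delete;        // copying is not needed in this sketch

    counted_handle& operator=(counted_handle const& r)
    {
        control_block* tmp = r.pi_;
        if (tmp != pi_) {                  // same control block: nothing to do
            if (tmp != nullptr) tmp->add_ref();
            if (pi_ != nullptr) pi_->release();
            pi_ = tmp;
        }
        return *this;
    }

private:
    control_block* pi_;
};

// Replays the benchmark loop and reports how many count updates it caused.
long count_updates(int iterations)
{
    control_block block;
    counted_handle a(&block), b, c;
    for (int i = 0; i < iterations; ++i) {
        b = a;
        c = b;
    }
    return block.updates;
}
```

In this sketch only the first pass through the loop modifies the count (once for each of the two assignments); every later pass finds both handles already pointing at the same control block and skips the update, regardless of the iteration count.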

Peter Dimov wrote:
Slawomir Lisznianski wrote:
[...]
Please note that identifiers that start with _M (underscore followed by an uppercase letter) are reserved by the implementation, as are identifiers containing a double underscore.
Thanks for noting :-) I will refrain from using them.
Thank you for the test. I was able to confirm your results on an AMD Athlon 1.4. However, you have to agree that your test code isn't very realistic, or to be precise, it's very unrealistic. ;-)
Quite right. However, performance test scenarios are usually an exaggeration of reality ;-) Library code will be reused by developers of various skill levels, and unintentional misuse is possible. Tests such as the one above demonstrate the "worst case".
I was able to cut both single- and multithreaded times to 20ms by replacing shared_count::operator= as shown below:
shared_count & operator= (shared_count const & r) // nothrow
{
    sp_counted_base * tmp = r.pi_;

    if(tmp != pi_)
    {
        if(tmp != 0) tmp->add_ref_copy();
        if(pi_ != 0) pi_->release();
        pi_ = tmp;
    }

    return *this;
}
I see. Thanks for the tip. I will use it.
That said, your test, when rerun with the "next release shared_count" (proof of concept available at
Do we sacrifice platforms that do not support atomic instructions then? ;-) I was tentatively aiming to show a bigger picture, hence my proposal for a locking policy. IMHO, overriding operator= is a valuable alternative, but it puts a burden on users.
I find this (~3x instead of 10x) slightly less alarming. ;-)
I agree ;-)

Slawomir Lisznianski; [ www.rhapsodia.org ]

On Tue, 10 Feb 2004 11:10:22 -0600 Slawomir Lisznianski <slisznianski@asyncnet.com> wrote:
Do we sacrifice platforms that do not support atomic instructions then? ;-) I was tentatively aiming to show a bigger picture, hence my proposal for a locking policy. IMHO, overriding operator= is a valuable alternative, but it puts a burden on users.
I agree that a template parameter describing the locking policy is a much better solution. It provides all users of shared_ptr with what they want, and it does not cost anything for the times when a lock is not desired.

--
Jody Hagins
Consultant, n.: An ordinary man a long way from home.
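For readers unfamiliar with the idea, the proposed locking-policy parameter might look roughly like the sketch below. Everything in it is hypothetical: policy_ptr, single_threaded_count, and atomic_count are names invented for this illustration, not Boost components, and std::atomic (C++11) is assumed as the synchronized counter. It is a reduced illustration of the technique, not a proposed implementation.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical counting policies (illustrative names, not from Boost).
struct single_threaded_count {
    long value;
    single_threaded_count() : value(1) {}
    void increment() { ++value; }
    bool decrement() { return --value == 0; }  // true when the last owner is gone
    long get() const { return value; }
};

struct atomic_count {
    std::atomic<long> value;
    atomic_count() : value(1) {}
    void increment() { value.fetch_add(1, std::memory_order_relaxed); }
    bool decrement() { return value.fetch_sub(1, std::memory_order_acq_rel) == 1; }
    long get() const { return value.load(std::memory_order_relaxed); }
};

// Hypothetical owning pointer; the second parameter selects how the count
// is maintained, and the default keeps the synchronized behavior.
template <class T, class CountPolicy = atomic_count>
class policy_ptr {
public:
    explicit policy_ptr(T* p) : px_(p), pn_(nullptr)
    {
        try { pn_ = new CountPolicy; }
        catch (...) { delete p; throw; }   // do not leak p if the count cannot be allocated
    }
    policy_ptr(policy_ptr const& r) : px_(r.px_), pn_(r.pn_) { pn_->increment(); }
    policy_ptr& operator=(policy_ptr const& r)
    {
        if (pn_ != r.pn_) {                // already sharing: nothing to do
            r.pn_->increment();
            release();
            px_ = r.px_;
            pn_ = r.pn_;
        }
        return *this;
    }
    ~policy_ptr() { release(); }

    T& operator*() const { assert(px_ != nullptr); return *px_; }
    long use_count() const { return pn_->get(); }

private:
    void release()
    {
        if (pn_->decrement()) { delete px_; delete pn_; }
    }
    T* px_;
    CountPolicy* pn_;
};
```

With this shape, policy_ptr<int> keeps the synchronized count, while policy_ptr<int, single_threaded_count> opts out of it; existing code that names only the first template argument would compile unchanged.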

At 12:32 PM 2/10/2004, Jody Hagins wrote:
On Tue, 10 Feb 2004 11:10:22 -0600 Slawomir Lisznianski <slisznianski@asyncnet.com> wrote:
Do we sacrifice platforms that do not support atomic instructions then? ;-) I was tentatively aiming to show a bigger picture, hence my proposal for a locking policy. IMHO, overriding operator= is a valuable alternative, but it puts a burden on users.
I agree that a template parameter describing the locking policy is a much better solution. It provides all users of shared_ptr with what they want, and it does not cost anything for the times when a lock is not desired.
One difficulty with a locking-policy template parameter is that it changes the shared_ptr's type. This is a problem when passing the shared_ptr to a third-party or other library component that expects a different type.

--Beman
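The type difficulty can be shown in a few lines. In this sketch, sketch_ptr, locked_policy, unlocked_policy, and read_value are all hypothetical names made up for the illustration; the point is only that two instantiations of one template with different policy arguments are unrelated types.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical policy tags and a stripped-down pointer template.
struct locked_policy {};
struct unlocked_policy {};

template <class T, class LockPolicy = locked_policy>
struct sketch_ptr {
    T* px;
};

// A library function written against the default (locked) policy.
inline int read_value(sketch_ptr<int> const& p)
{
    assert(p.px != nullptr);
    return *p.px;
}

// The two instantiations are distinct types with no implicit conversion,
// so a sketch_ptr<int, unlocked_policy> cannot be passed to read_value().
static_assert(!std::is_same<sketch_ptr<int>,
                            sketch_ptr<int, unlocked_policy>>::value,
              "different policies give different types");
static_assert(!std::is_convertible<sketch_ptr<int, unlocked_policy>,
                                   sketch_ptr<int>>::value,
              "no implicit conversion between policies");
```

A library could work around this by templating its own interfaces on the policy or by adding converting constructors, but either choice pushes the policy into every signature that touches the pointer.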
participants (4)
- Beman Dawes
- Jody Hagins
- Peter Dimov
- Slawomir Lisznianski