Re: [boost] Proposal to add smart_ptr to the boost library

"David Abrahams" <dave@boost-consulting.com> wrote in message news:<ufyn5wggo.fsf@boost-consulting.com>...
"David Maisonave" <dmaisonave@commvault.com> writes:
"David Abrahams" <dave@boost-consulting.com> wrote in message news:<874q3ld907.fsf@boost-consulting.com>...
"David Maisonave" <boost@axter.com> writes:
"Daniel Wallin" <dalwan01@student.umu.se> wrote in message news:<drkmls$kb5$1@sea.gmane.org>...
It's in the FAQ:
http://www.boost.org/libs/smart_ptr/shared_ptr.htm#FAQ
Q. Why doesn't shared_ptr use a linked list implementation?
A. A linked list implementation does not offer enough advantages to offset the added cost of an extra pointer. See timings page. In addition, it is expensive to make a linked list implementation thread safe.
You can avoid having to make the implementation thread safe by making the pointee thread safe.
One of us here understands nothing about the problem. I don't know that much about threading, but I think I have a grip on this issue at least. As I understand the problem, if two neighboring shared_ptrs in a reference-linked chain are destroyed at the same time, they will be modifying the same pointer values simultaneously -- without a lock, in your case. I don't see how making the pointee thread-safe is going to help one bit.
IMHO, you don't fully understand my proposed solution. Please look at the current smart pointer locking method: http://code.axter.com/smart_ptr.h
By using intrusive lock logic, you can lock the pointee, and thereby lock all the shared_ptr objects. Here's the smart pointer destructor:

    inline ~smart_ptr() throw()
    {
        m_ownership_policy.destructor_lock_policy(m_type);
        CHECKING_POLICY::before_release(m_type);
        m_ownership_policy.release(m_type, m_clone_fct, m_ownership_policy);
        CHECKING_POLICY::after_release(m_type);
    }

There's similar logic in the constructor and assignment operator. This should work with all three main types of reference policies, including reference-link. Do you understand intrusive logic? You need to fully understand how intrusive logic works in order to understand the method.
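As I read the description above, the idea could be sketched like this. This is my own toy illustration using std::mutex, not the code from code.axter.com: the mutex lives in the pointee, so every reference-linked pointer to the same object serializes its chain updates on that one shared lock.

```cpp
#include <mutex>

// Hypothetical pointee base: the "intrusive" part is that the mutex is
// embedded in the managed object itself, shared by all pointers to it.
struct Lockable {
    std::mutex m;
    void lock()   { m.lock(); }
    void unlock() { m.unlock(); }
};

template <class T>
class linked_ptr {  // toy reference-linked pointer, not the proposed smart_ptr
public:
    explicit linked_ptr(T* p) : p_(p), prev_(this), next_(this) {}

    linked_ptr(const linked_ptr& o) : p_(o.p_) {
        p_->lock();  // the intrusive lock guards the splice into the chain
        prev_ = &o;
        next_ = o.next_;
        o.next_->prev_ = this;
        o.next_ = this;
        p_->unlock();
    }

    ~linked_ptr() {
        p_->lock();
        bool last = (next_ == this);  // sole remaining owner?
        prev_->next_ = next_;
        next_->prev_ = prev_;
        p_->unlock();
        if (last) delete p_;  // safe: no other pointer can reach p_ now
    }

    T* get() const { return p_; }

private:
    T* p_;
    mutable const linked_ptr* prev_;
    mutable const linked_ptr* next_;
    linked_ptr& operator=(const linked_ptr&);  // omitted in this sketch
};
```

Because the lock is in the pointee, two neighboring pointers being destroyed concurrently both take the same mutex before touching the shared prev/next links, which is the scenario raised earlier in the thread.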
It sounds like you're saying that, essentially, all the shared_ptrs that point to the same object share a single mutex, and in your case, that mutex happens to be embedded in the pointee, which is why you call it "an intrusive lock." Have I got that right?
IIUC, the thread-safety problem with a reference-linked implementation isn't so much that it's hard to achieve -- anyone can use a shared mutex -- it's that it's hard to make a thread-safe implementation efficient. That is to say, you pay the cost of locking and unlocking a mutex, and there's no way around it (**). Locking and unlocking mutexes is way more expensive than performing the lock-free operations used by boost::shared_ptr.
That's true. But it's been my experience that the majority of development doesn't have objects accessed via multiple threads, or doesn't run in a multithreaded environment at all. In that environment, you're paying an additional price for boost::shared_ptr's reference-count logic, but not getting any benefit from it. With a policy-based smart pointer, you can pick and choose what's best for a particular requirement, instead of being stuck with one less-than-optimal method.
(**) Or so I thought: http://www.cs.chalmers.se/~dcs/ConcurrentDataStructures/phd_chap7.pdf seems to contradict that.
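The lock-free operations mentioned above are, essentially, atomic increments and decrements on the shared count. A sketch in modern C++ terms -- my own illustration, not Boost's actual detail code, which used platform-specific atomics on the compilers of the day:

```cpp
#include <atomic>

// Toy reference-counted pointer whose count updates are single atomic
// read-modify-write operations: no mutex is ever locked on copy/destroy.
template <class T>
class counted_ptr {
public:
    explicit counted_ptr(T* p) : p_(p), count_(new std::atomic<long>(1)) {}

    counted_ptr(const counted_ptr& o) : p_(o.p_), count_(o.count_) {
        count_->fetch_add(1, std::memory_order_relaxed);  // lock-free copy
    }

    ~counted_ptr() {
        // fetch_sub returns the previous value; 1 means we were the last owner.
        if (count_->fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete p_;
            delete count_;
        }
    }

    T* get() const { return p_; }
    long use_count() const { return count_->load(); }

private:
    T* p_;
    std::atomic<long>* count_;
    counted_ptr& operator=(const counted_ptr&);  // omitted in this sketch
};
```

On most hardware an uncontended atomic add is a single instruction, whereas even an uncontended mutex lock/unlock pair costs noticeably more, which is the efficiency gap being argued about here.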
This can be done by using an intrusive lock.
In my tests, reference-link is over 25% faster than reference-count logic for initialization. With BOOST_SP_USE_QUICK_ALLOCATOR defined, reference-link is over 30% faster than reference-count logic for initialization. I get the above results using VC++ 7.1, and if I use the GNU 3.x compiler, the difference is even greater in favor of reference-link logic. With the Borland compiler, the difference is about 22%.
Are you claiming that using BOOST_SP_USE_QUICK_ALLOCATOR actually slows boost::shared_ptr down on all these compilers?
Of course not.... When you define BOOST_SP_USE_QUICK_ALLOCATOR, it does not just increase the performance of Boost objects. It increases the performance of every object within the translation unit that has the define and that uses allocators. So even though shared_ptr gets a performance boost, the smart_ptr gets an even greater performance boost, which increases the performance ratio.
Oh, I had no idea you were using the allocator for your reference-linked smart pointers. I see no mention of that macro in your header. Where do you use it?
Please check out the test code. If you're testing code within the same translation unit, and you declare BOOST_SP_USE_QUICK_ALLOCATOR at the top of the translation unit, then it's going to affect all the code in that translation unit. I don't use BOOST_SP_USE_QUICK_ALLOCATOR in my smart_ptr, any more than boost::shared_ptr uses it in its header. Not only would it be more difficult for me to compile the code so as to use BOOST_SP_USE_QUICK_ALLOCATOR only for boost::shared_ptr, but it also wouldn't make much sense to give boost::shared_ptr that advantage and not do the same for the other smart pointers when trying to make a level comparison test.

"David Maisonave" <dmaisonave@commvault.com> writes:
"David Abrahams" <dave@boost-consulting.com> wrote in message news:<ufyn5wggo.fsf@boost-consulting.com>...
"David Maisonave" <dmaisonave@commvault.com> writes:
<snip loads of quoted text>
Please don't overquote.
IIUC, the thread-safety problem with a reference-linked implementation isn't so much that it's hard to achieve -- anyone can use a shared mutex -- it's that it's hard to make a thread-safe implementation efficient. That is to say, you pay the cost of locking and unlocking a mutex, and there's no way around it (**). Locking and unlocking mutexes is way more expensive than performing the lock-free operations used by boost::shared_ptr.
That's true. But it's been my experience that the majority of development doesn't have objects accessed via multiple threads, or doesn't run in a multithreaded environment at all. In that environment, you're paying an additional price for boost::shared_ptr's reference-count logic, but not getting any benefit from it.
Not if you're compiling without MT support; the thread-safety features of shared_ptr just compile away (and I think there's a macro you can use to force them off). Let's compare apples to apples: your ref-linked implementation will be slower in an MT environment. I'm not yet convinced it will be faster in an ST environment unless you leave the MT features of shared_ptr turned on.
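For the record, the macro alluded to above does exist; if I remember right it is BOOST_SP_DISABLE_THREADS, which must be defined before the header is included. This is a configuration fragment, not a runnable program:

```cpp
// I believe the macro Dave is thinking of is BOOST_SP_DISABLE_THREADS:
// defining it before including the header forces the single-threaded
// (non-atomic) reference count, even in a build with threading enabled.
#define BOOST_SP_DISABLE_THREADS
#include <boost/shared_ptr.hpp>
```

Note it must be defined consistently across every translation unit in the program, or the shared_ptr count structures will have mismatched layouts.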
With a policy-based smart pointer, you can pick and choose what's best for a particular requirement, instead of being stuck with one less-than-optimal method.
Yeah, yeah, old story. I think almost everyone believes that there will be times when it's necessary to have a special-purpose optimized smart pointer. But now you're mixing up the issues. We were talking about the efficiency of your implementation w.r.t. threading. You can't sidestep that issue by saying "it's more flexible."
Are you claiming that using BOOST_SP_USE_QUICK_ALLOCATOR actually slows boost::shared_ptr down on all these compilers?
Of course not.... When you define BOOST_SP_USE_QUICK_ALLOCATOR, it does not just increase the performance of Boost objects. It increases the performance of every object within the translation unit that has the define and that uses allocators. So even though shared_ptr gets a performance boost, the smart_ptr gets an even greater performance boost, which increases the performance ratio.
Oh, I had no idea you were using the allocator for your reference-linked smart pointers. I see no mention of that macro in your header. Where do you use it?
Please check out the test code.
I don't have time to grok your code right now. Can't you just answer my question?
If you're testing code within the same translation unit, and you declare BOOST_SP_USE_QUICK_ALLOCATOR at the top of the translation unit, then it's going to affect all the code in that translation unit.
Since when?
I don't use BOOST_SP_USE_QUICK_ALLOCATOR in my smart_ptr, any more than boost::shared_ptr uses it in its header.
shared_ptr does use BOOST_SP_USE_QUICK_ALLOCATOR in its header, by including boost/detail/sp_counted_impl.hpp, which contains these lines:

    #if defined(BOOST_SP_USE_QUICK_ALLOCATOR)
    #include <boost/detail/quick_allocator.hpp>
    #endif

    . . .

    #if defined(BOOST_SP_USE_QUICK_ALLOCATOR)
    void * operator new( std::size_t )
    {
        return quick_allocator<this_type>::alloc();
    }
    void operator delete( void * p )
    {
        quick_allocator<this_type>::dealloc( p );
    }
    #endif

The quick allocator doesn't just get used by virtue of being #included. You have to overload operator new if you want new and delete to use it for a particular type.
Not only would it be more difficult for me to compile the code so as to only use BOOST_SP_USE_QUICK_ALLOCATOR for boost::shared_ptr, but it also wouldn't make much sence just to give boost::shared_ptr that advantage, and not do so for the other smart pointers when trying to make a level comparison test.
I'm not suggesting you do that. I'm telling you that when you define BOOST_SP_USE_QUICK_ALLOCATOR, the pointee of a shared_ptr is still allocated (in the normal case) using the builtin operator new. It sounds like you're claiming that in your tests, when BOOST_SP_USE_QUICK_ALLOCATOR is #defined, the pointees of your smart pointers are allocated using the quick allocator. If that's the case, and if you haven't also done something to cause the pointees of shared_ptr to use the quick allocator, you don't have a level test.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
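The distinction Dave is drawing can be shown with a toy allocator. All the names below are my own stand-ins, not Boost's: only the type that overloads its class-specific operator new goes through the custom allocator; everything else in the translation unit keeps using the builtin one, no matter what macros are defined.

```cpp
#include <cstddef>
#include <cstdlib>

static int quick_allocs = 0;  // counts allocations routed through the toy allocator

void* quick_alloc(std::size_t n) { ++quick_allocs; return std::malloc(n); }
void  quick_dealloc(void* p)     { std::free(p); }

// Analogue of sp_counted_impl: opts in by overloading operator new/delete.
struct CountNode {
    void* operator new(std::size_t n) { return quick_alloc(n); }
    void  operator delete(void* p)    { quick_dealloc(p); }
    long refs;
};

// Analogue of the pointee: no overload, so the builtin operator new is used
// regardless of any macro defined in this translation unit.
struct Pointee {
    int v;
};
```

This is why defining BOOST_SP_USE_QUICK_ALLOCATOR speeds up shared_ptr's count blocks but leaves the pointees, and every other class in the file, on the default allocator.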

That's true. But it's been my experience that the majority of development doesn't have objects accessed via multiple threads, or doesn't run in a multithreaded environment at all. In that environment, you're paying an additional price for boost::shared_ptr's reference-count logic, but not getting any benefit from it. With a policy-based smart pointer, you can pick and choose what's best for a particular requirement, instead of being stuck with one less-than-optimal method.
Shared_ptr will only include threading support in a multithreaded environment. So you only pay for it if there is a possibility of shared_ptrs being shared across threads.

The problem with letting the developer decide on a per-pointer basis whether or not they need MT support is that users (myself included, of course) tend to make poor choices. Also, code changes: a design which didn't share pointer objects across threads may well end up sharing them as the code matures / is maintained. Even noticing bugs caused by not marshalling these objects is difficult; your test code may run fine until your code is run by a user with an SMP system, or with hyper-threading enabled... who knows?

For myself, I would want MT support built in by default. It's too easy to get wrong to disable it for performance reasons. And I don't buy that a smart pointer has to be the most efficient thing in the world; how many apps have smart pointer usage as their bottleneck? If they do, wouldn't it be better to work out why it's the bottleneck, rather than remove MT safety for a small speed gain in the smart pointer code?

If I understand your implementation correctly, making a pointer thread safe means doing the following:

    class A
    {
    public:
        A() : l(m, false) { }
        void lock() { l.lock(); }
        void unlock() { l.unlock(); }
    private:
        mutable boost::mutex m;
        boost::mutex::scoped_lock l;
    };

    void intrusive_ptr_lock(A * p) { p->lock(); }
    void intrusive_ptr_unlock(A * p) { p->unlock(); }

    ...

    smart_ptr<A, copy_on_write_policy<intrusive_lock_policy> > ptr(new A);

That's quite a lot of work, though some of the problem with that is probably down to the way boost::mutexes are handled.

I think my main problem with a PBSP is the last line of the above code fragment. Without template typedefs, using them can be ugly as hell. I kind of feel that what you need is not some one tool to rule them all, but several tools that are good at what they do. So:

    std::auto_ptr
    std::tr1::scoped_ptr
    std::tr1::shared_ptr
    boost::cow_ptr ?
    boost::copy_ptr ?
Should be sufficient for most needs? And every one of them is a hell of a lot easier to type than:

    boost::smart_ptr<A, boost::copy_on_write_policy<boost::intrusive_lock_policy> >

Sam
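For what it's worth, even without template typedefs the verbosity Sam objects to can be hidden behind a per-pointee typedef. The policy machinery below is a minimal stand-in of my own for illustration, not the proposed library's real classes:

```cpp
// Stand-in policy tags, just enough to show the typedef trick.
struct intrusive_lock_policy {};
template <class LockPolicy> struct copy_on_write_policy {};

// Stand-in policy-based pointer; the real proposal has far more machinery.
template <class T, class OwnershipPolicy>
class smart_ptr {
public:
    explicit smart_ptr(T* p) : p_(p) {}
    ~smart_ptr() { delete p_; }
    T* get() const { return p_; }
private:
    T* p_;
    smart_ptr(const smart_ptr&);             // noncopyable in this sketch
    smart_ptr& operator=(const smart_ptr&);
};

struct A { int x; A() : x(42) {} };

// The unwieldy spelling, written once and then reused everywhere:
typedef smart_ptr<A, copy_on_write_policy<intrusive_lock_policy> > cow_A_ptr;
```

The limitation is that the typedef is per-pointee; C++11's alias templates (template <class T> using cow_ptr = ...) later removed exactly this restriction, which is the "template typedefs" feature Sam is missing.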

Shared_ptr will only include threading support in a multithreaded environment. So you only pay for it if there is a possibility of shared_ptrs being shared across threads.
The problem with letting the developer decide on a per-pointer basis whether or not they need MT support is that users (myself included, of course) tend to make poor choices. Also, code changes: a design which didn't share pointer objects across threads may well end up sharing them as the code matures / is maintained.
I beg to differ. I don't make poor choices on a per-pointer basis. <heh> We typically use single-threaded policy pointers in multi-threaded apps when we need the performance and know that the objects we're handling are either safely tucked away in their operating thread, or can be deep-copied across threads using some external synchronization. If you have a policy-based smart pointer and are concerned that developers should not have equal access to ST and MT versions, consider making a default version that is MT, and leaving it at that. As far as the code changing and shared objects being passed across threads: if you don't understand the ramifications of the design changes you are making, you get what you deserve. Just because you're passing shared, lockable objects across threads doesn't mean you couldn't get into a deadlock situation, for example.
For myself, I would want MT support built in by default. It's too easy to get wrong to disable it for performance reasons. And I don't buy that a smart pointer has to be the most efficient thing in the world; how many apps have smart pointer usage as their bottleneck?
Mine. Tested and found with Quantify, and although it was our own policy-based smart pointer, the mutex locking was the high peak on the graph. I halved my app's response latency by using a single-threaded smart pointer in a multi-threaded app. I'm not weighing in in support of the proposed code; I'm simply stating that I think having access to a shared ST pointer is a good idea in an MT environment. And personally, I'd prefer a declaration of the required policies instead of a macro to change the behavior of a specific class.

- Bud

If you have a policy-based smart pointer and are concerned that developers should not have equal access to ST and MT versions, consider making a default version that is MT, and leaving it at that.
As long as the default is MT, then yes. I'm still not in favour of PBSPs anyway, but I'll come back to that...
As far as the code changing and shared objects being passed across threads: if you don't understand the ramifications of the design changes you are making, you get what you deserve. Just because you're passing shared, lockable objects across threads doesn't mean you couldn't get into a deadlock situation, for example.
Yes of course; in fact I'd take it further than that. "Because you're passing shared, lockable objects across threads" means you probably WILL get into deadlock at some point.... It's tricky stuff to get right.
For myself, I would want MT support built in by default. It's too easy to get wrong to disable it for performance reasons. And I don't buy that a smart pointer has to be the most efficient thing in the world; how many apps have smart pointer usage as their bottleneck?
Mine. Tested and found with Quantify, and although it was our own policy-based smart pointer, the mutex locking was the high peak on the graph. I halved my app's response latency by using a single-threaded smart pointer in a multi-threaded app.
Which of course is precisely the right time to start making optimisations -- after you've profiled it and found it to be a problem. Still, I'm curious about this: why was your code copying so many smart pointers? We've had similar things crop up before, but it's always been a poor choice of algorithm/container/strategy that was causing all the unnecessary copying. Fixing the problem rather than the symptom has proved more effective for us. Also, I wonder whether, if you had tried shared_ptr's lock-free locking (huh?!?), you'd have noticed similar (but obviously not *as* good) performance gains?
I'm not weighing in in support of the proposed code; I'm simply stating that I think having access to a shared ST pointer is a good idea in an MT environment. And personally, I'd prefer a declaration of the required policies instead of a macro to change the behavior of a specific class.
In general I agree -- I'm dead against macros to enable/disable features. But I also find using complex policy classes equally annoying. Default template parameters get you so far, but have problems of their own. I'd prefer a really explicit separate class called shared_ptr_no_threads<> or something. They have very different behaviour; they should be called different things.

Anyway, this debate has been through the mangle several times before, though this poll on the subject was quite some time ago: http://aspn.activestate.com/ASPN/Mail/Message/boost/1189836

In general it seems people were in favour of further development of a PBSP, provided the interface can be got right. I wonder what the consensus is now?

Sam

Mine. Tested and found with Quantify, and although it was our own policy-based smart pointer, the mutex locking was the high peak on the graph. I halved my app's response latency by using a single-threaded smart pointer in a multi-threaded app.
Which of course is precisely the right time to start making optimisations -- after you've profiled it and found it to be a problem. Still, I'm curious about this: why was your code copying so many smart pointers?
This was a policy-based smart pointer that performed method-level locking before going into the pointee. No copying going on. Not a Boost smart pointer. In fact, I'm definitely not qualified to comment on the Boost smart pointers per se, since I've never used them, but I wanted to put in my two cents' worth about a single smart pointer class that uses orthogonal policies. That's what we're using, with policies for copying (ctor/clone), method locking, object locking (intrusive/non-intrusive) and ownership (MT/ST, intrusive/non-intrusive).
Also, I wonder whether, if you had tried shared_ptr's lock-free locking (huh?!?), you'd have noticed similar (but obviously not *as* good) performance gains?
Not using the boost smart pointers.
In general I agree - I'm dead against macros to enable/disable features. But I also find using complex policy classes equally annoying. Default template parameters get you so far, but have problems of their own.
I'd prefer a really explicit separate class called shared_ptr_no_threads<> or something. They have very different behaviour, they should be called different things.
Yeah, they have different behavior, but that's the point, really. You have one code base that invokes policies, without really knowing what the policies are specifically implementing. Here's a link to the one we're using: http://www.weirdsolutions.com/developersCentral/tools/ace_additions.tar.bz2

- Bud
participants (4)
- Bud Millwood
- David Abrahams
- David Maisonave
- Sam Partington