
Hi Peter,

Again, thanks for taking the time to reply so patiently. This will in all likelihood be my final post on this topic, though I will thoroughly read any replies from you or others.
> > I guess this is the crux of my contention. I think all must agree
> > there must be: automatic counting; basic pointer syntax (including
> > any implicit conversion from derived to base, and various operators).
> > The features beyond those are of the debatable variety.
> There are two features beyond what you listed:
>
> 1. "Do the right thing" destruction;
> 2. Weak pointer support.
I meant that I view these as debatable, not "absolutely essential".
> Note that (1) does not equal "custom deleter support". Custom deleters
> do not add any further overhead (when not used) over "do the right
> thing" destruction. By that I mean that shared_ptr<X> can always be
> destroyed when it has been constructed properly, no matter whether at
> the point of destruction X is a complete type with an accessible (and
> virtual, if the actual object is not of type X) destructor
But if I have been using T* all these years, "do the right thing" is really just adding overhead for no benefit to me. I have to rethink all my options in light of this feature, because it is not how C++ pointers behave. Some may argue "better"; others will argue "bloated".
> or whether 'operator delete' at the point of destruction can deallocate
> from the 'operator new' heap at the point of construction.
Quite an esoteric condition. I cannot imagine that shared_ptr would fit such an ABI even given this ability. In my experience, when this kind of isolation is in play, self destruction (via vtable as in COM) or similar techniques are also used.
> The theoretical minimum cost of a non-intrusive pointer is:
>
> - per-instance pointer to the count;
> - one word for the count.
I agree, but would say it this way: non-intrusion costs sizeof pointer for the pointer to the count. The count itself doesn't seem fair to even call overhead<g> - I mean, you asked for this part by entering the world of reference counting! :)
> > Feature (1) above adds:
> >
> > - vtable pointer
> The functor need not be present when not used, although the current
> code base does not do that. It can also be optimized out when it's an
> empty class with the help of boost::compressed_pair. Again, the current
> code base doesn't do it, but we're talking about the design-imposed
> overhead here.
I will look at compressed_pair, but I don't see how the functor could not exist or take no space. Perhaps I am too focused on the details of the current implementation.
> > Feature (2) adds:
> >
> > - an extra count.
> For platforms lacking CAS (are there any?) Tyson Whitehead has posted
> an implementation:
>
> http://lists.boost.org/MailArchives/boost/msg66868.php
>
> that locks a mutex only when weak_ptr::lock is used. If weak pointers
> aren't used, the implementation has essentially the same performance as
> an ordinary non-intrusive smart pointer. The mutex pool trick can be
> used to avoid a per-count mutex.
I think most platforms would have atomic inc/dec; what concerned me was the lesser availability of the more exotic compare-exchange (or an equivalently powerful primitive).
> One situation where custom deleters come in handy is when you want to
> use your own smart pointer in an application, but a third-party library
> takes a shared_ptr. With a shared_ptr not supporting deleters, you'd be
> forced to use shared_ptr in your application, too.
Welcome to the happy world of 3rd party libraries :) Seriously, it is very common to encounter this type of thing, and it again comes back to the problem that custom deleters add most (all?) of their weight even if you don't use them. If it were otherwise, I would argue for their inclusion. I can imagine cases where they would be useful.
> You can always switch to active notification and avoid the need for a
> passive observer such as weak_ptr, but this often makes your design
> worse, because now the observed objects need to know about the
> observers in order to notify them.
I agree that active notification tends to be too complicated and, therefore, not desirable. What I meant was that eliminating all need to observe is better than either. In many (not all) cases, if A and B are cyclic such that A owns B, but B needs access to its owner, one can refactor things such that A & B jointly own a C. The stuff B needed from A moves to C and no more cycle. I had a major redesign recently where I did exactly this and things got much better as a result. Mileage will vary, of course.
> There's also the 'shared from this' problem.
Not a good thing in a constructor - been there; wish I hadn't done that! :)
> > > The good thing is that once you have a stable design, efficiency
> > > usually follows, as happened here.
> > True enough, but optimizations leveraging advanced, platform-specific
> > capabilities do not offer much benefit to those writing portable
> > code. For them, the cost remains the same. In this case, this will
> > always be so.
> Portable code? shared_ptr doesn't need to be portable, if it comes with
> your standard library.
By "portable code" I mean code that uses shared_ptr on multiple OSes or compilers. The cost may be acceptable on some, but not all of the compilers in use.
> > Now we've come back to my original point: the very design of
> > shared_ptr requires a certain overhead that is _irreducible_ in the
> > general case. And that overhead is:
> >
> > A. Not clearly specified or articulated (for those who need to know);
> > B. _Much_ more than one would naively expect.
> All designs require a certain irreducible overhead, do they not? Think
> of it as a std::map. The overhead of a map is not clearly specified or
> articulated. (B) probably applies as well. But it works.
Absolutely, all designs have certain irreducible overhead - I didn't mean to imply otherwise. In the case of std::map, one should expect an instance to be on the large-ish side. Also, one should expect the per-node overhead to be two pointers and probably another word for the tree balancing algorithm. If an implementation does worse than this, it is the fault of the implementation. The design of std::map allows for a realization that has that level of overhead, but no less.

What made me focus so on shared_ptr is that the number of instances I would expect to see is quite large: all dynamically allocated objects in the ideal case. I could probably enumerate the number of places in our 1M+ lines of code where we used std::map.

Not that you have to appease me <g>, but since its costs aren't additive, I would love to at least have some way to "opt out" of the majority of shared_ptr overhead. For example, a way to indicate that shared_ptr<T> should use an intrusive count, disallow weak_ptr, and put the kibosh on all the other magic. Why not use intrusive_ptr? It's not the same name (and hence templates using shared_ptr would break). As it sits, all optimization opportunity is in the hands of the RTL implementor. As a user, I will have to either accept all the overhead (in the worst case, on all _my_ target platforms) or go elsewhere.

Since the outcome of all this discussion will likely have no effect on the design of shared_ptr (not that I expected it would, nor do I mean to slight anyone), my final plea is that the TR1 implementation include any and all optimizations, so as to increase the likelihood that other implementations will follow (like the "return object by value" optimization). Leaving these as "exercises" would be a great disservice, since I expect that most implementors would only slowly rediscover all the optimizations that you and others have already imagined.
I don't know whether language could be put in place to either encourage or require some minimum set of optimizations in a conforming implementation (like "std::map::find is O(lg(n))" or such).

Best regards,
Don