
It is likely that we will simply disagree on this, but I will make a stab at responding to your points.
For some specific smart pointer applications, it is possible to design a smart pointer that is more optimal than shared_ptr. That's what a policy-based smart pointer is all about, and hopefully we will soon have one in Boost. In the meantime, look at the Loki policy-based smart pointer.
I'm familiar with Alexandrescu's book (even if I can't always remember how to spell his name<g>), but...
But sometimes a single general-purpose smart pointer is needed.
Yes! Although I am focused purely on reference counting smart pointers. But...
For example, as a standard type to communicate between libraries, including third-party libraries. Or to recommend generally to a wide-range of programmers, including those who might be confused by a plethora of choices.
Exactly.
Or in cases where various efficiency concerns are nowhere near as important as using a well-known, well-understood, and soon standardized smart pointer.
Why can't we have both? I would suggest that a major reason STL is so popular is that it paid careful attention to efficiency from its inception. If std::vector had allocated several or nearly a dozen extra machine words per object as shared_ptr does at times, I believe things would have turned out differently. The std containers offer what is needed to accomplish a well focused task, allocating only the data necessary for that job and no more. For example std::list need only allocate 2 pointers per object. If Stepanov had gone on to add feature after feature that made this number on par with shared_ptr, again, it likely would not have been well received.
The Colvin-Dimov shared_ptr design solves a far broader range of smart pointer problems than any other proposed design. That's why it is the most commonly recommended C++ smart pointer type in the world. That's why the C++ committee is standardizing it.
Please don't take my doubts the wrong way; I am not trying to "dis" their fine work. If folks want to use it, that's fine. They should be warned, but caveat emptor. I am focused on the last statement above. As it sits, shared_ptr simply goes too far and in so doing one must always ask "is shared_ptr suitable in this case?" As a library author, this is often an impossible question to answer, because it could depend on the use to which the library is put. In other words, by adding all the frills, shared_ptr actually reduces its scope of applicability.
It isn't intended to replace all other smart pointers. No single smart pointer type can do that. So appreciate it for the many problems it solves, rather than worrying about the problems where another smart pointer would be more optimal.
Where is the standard smart pointer to which you refer? I can keep writing my own (have been for many years, and kinda like it<g>), but what I would really love to see is a pair of simple ones provided by the standard library that accomplish just two jobs: delete it; and delete it when the count drops to 0 (with some means for the count to be internal to the object). If more smart pointers are added beyond that, that would be great.

So far, both auto_ptr and soon shared_ptr fail the K.I.S.S. test. They set out to solve these two "simple" problems and got lost along the way. A standard facility should not offer such surprises to unsuspecting users. With auto_ptr we got "surprise! you thought I was the owner, but it was that other guy that just left scope". With shared_ptr we will have "surprise! here's that bad_alloc you've always wanted".

If the current design is the way it will be (probably), can't we at least have a warning label: "WARNING: shared_ptr may allocate 40+ bytes per object (fine print: depending on platform and multithread considerations, your mileage may vary, etc.)" Well, I'm no lawyer I guess... <g>
All IMO, of course!
Likewise! Best, Don

Don G wrote: [...]
The std containers offer what is needed to accomplish a well focused task, allocating only the data necessary for that job and no more. For example std::list need only allocate 2 pointers per object. If Stepanov had gone on to add feature after feature that made this number on par with shared_ptr, again, it likely would not have been well received.
... and ...
I am focused on the last statement above. As it sits, shared_ptr simply goes too far and in so doing one must always ask "is shared_ptr suitable in this case?" As a library author, this is often an impossible question to answer, because it could depend on the use to which the library is put. In other words, by adding all the frills, shared_ptr actually reduces its scope of applicability.
... and ...
"WARNING: shared_ptr may allocate 40+ bytes per object (fine print: depending on platform and multithread considerations, your mileage may vary, etc.)"
shared_ptr only contains the features that are absolutely essential for the answer to "is shared_ptr suitable" to be "yes" in the vast majority of cases (and to stay "yes" as the design evolves). A not so small number of extra features that do not contribute significantly to its expressive power have been rejected.

I suspect that you haven't had much experience with the design possibilities opened by weak pointers or custom deleters, which might be why you don't want to pay 40+ bytes for them. For me, they are easily worth double this cost.

That said, Alexander Terekhov has discovered a lock-free algorithm that implements the reference counting required by shared_ptr, so this brings down the design-imposed overhead to four words per object for an empty deleter. You can find a prototype implementation at

http://www.pdimov.com/cpp/shared_count_x86_exp2.hpp

I will try to portabilify this implementation and incorporate it in the next Boost release.

You are absolutely correct that sometimes you can get away with a simpler smart pointer that carries less overhead. The emphasis in shared_ptr's case is on correctness and expressive power over efficiency, because (IMO) the first two provide more value in a standard smart pointer, since they are much harder to reinvent. The good thing is that once you have a stable design, efficiency usually follows, as happened here.

Hi Peter, Sorry it took so long for me to respond (real life just keeps getting in the way), and thanks for taking the time to reply.
shared_ptr only contains the features that are absolutely essential for the answer of "is shared_ptr suitable" to be "yes" in the vast majority of cases (and to stay "yes" as the design evolves).
I guess this is the crux of my contention. I think all must agree there must be: automatic counting; basic pointer syntax (including any implicit conversion from derived to base, and various operators). The features beyond those are of the debatable variety.
A not so small number of extra features that do not contribute significantly to its expressive power have been rejected.
I can only imagine! :)
I suspect that you haven't had much experience with the design possibilities opened by weak pointers or custom deleters ...
I have not; shared_ptr et al. is new to me. My own work never followed those paths for two reasons. The custom deleter never occurred to me because I got the same effect with intrusion. The weak_ptr role was filled by T* or T&, and the standard refrain of "be careful what you are doing if you go this way." Of course, in most cases one can do some refactoring and remove the need for the weak reference. ...
which might be why you don't want to pay 40+ bytes for them. For me, they are easily worth double this cost.
I would agree _if_ the costs were incurred when and where I used those features. My issue is the relatively high cost of shared_ptr even when I don't use its advanced features. Primarily, this cost comes from the deleter, but weak_ptr and its mutex are just about as bad if not worse in multi-threaded applications.
That said, Alexander Terekhov has discovered a lock-free algorithm that implements the reference counting required by shared_ptr, ...
That is good news, and yes, he is more clever than I. The solution is most welcome, because it reduces the cost of weak_ptr to a single extra word. Sadly, it must be present even when weak_ptr is never used, but the cost is really low. Much better than having a mutex, even of the lightweight variety. Well, at least for platforms that support CAS or similar lock-free primitives needed to write such an algorithm.
... so this brings down the design-imposed overhead to four words per object for an empty deleter.
From 1.32 code, shared_ptr never has an empty deleter unless it is itself empty (or NULL). Unless I missed something? If my analysis is correct, an overhead of 4 words is required for the deleter alone:
- shared_count must contain a pointer to the sp_counted_base
- sp_counted_base has a vtable
- sp_counted_base has a pointer
- sp_counted_base has a functor

The weak_ptr adds an extra count (1 word) but more importantly requires a mutex, at least on platforms lacking more advanced lock-free primitives such as InterlockedCompareExchange. So, where lock-free weak_ptr can be achieved, the overhead (beyond the minimum of a single word for the reference count) is 5 words, plus any heap allocation overhead. Where it cannot, we are still at 5 words plus sizeof(lightweight_mutex).

If it were possible to only have a custom deleter if the user requested one, that would indeed drop the cost to 2 extra words. This would require that the counts be moved from sp_counted_base into shared_count and shared_ptr to contain a shared_count*. While I have a harder time seeing the value here vs. weak_ptr, at least it's only 1 word/object "wasted".

I suspect this approach would reduce the safety guarantees you were after with respect to shared_ptr<void> and/or the lack of virtual dtors. To me, the gratuitous virtual dtor is undesirable, but shared_ptr<void> has some appeal. Perhaps shared_ptr<void> could be specialized to be the only form that always creates the deleter (if not already present)? Unfortunately, a round-trip through shared_ptr<void> would likely keep the deleter, but at least the cost would only be there when shared_ptr<void> were used.
You are absolutely correct that sometimes you can get away with a simpler smart pointer that carries less overhead. The emphasis in shared_ptr's case is on correctness and expressive power over efficiency, because (IMO) the first two provide more value in a standard smart pointer, since they are much harder to reinvent.
I think "sometimes" is a bit of an understatement: you can always "get away with" a simpler smart pointer; there are just tradeoffs to be made. In many cases, a simple smart pointer can lead to a better design (IMO). Frequently when I wanted to avoid a weak reference, the refactoring was an overall improvement. This wouldn't always be the case, of course. As always, a balance point must be found. A garbage collector is safer and probably more expressive. Most C++ folks don't like the associated cost of that solution. My fear is that an overly complex design, with the overhead it entails, will put shared_ptr in that boat for more applications than you might think.
The good thing is that once you have a stable design, efficiency usually follows, as happened here.
True enough, but optimizations leveraging advanced, platform-specific capabilities do not offer much benefit to those writing portable code. For them, the cost remains the same, and in this case it always will. Now we've come back to my original point: the very design of shared_ptr requires a certain overhead that is _irreducible_ in the general case. And that overhead is:

A. Not clearly specified or articulated (for those who need to know)
B. _Much_ more than one would naively expect

Best regards, Don

Don G wrote:
Hi Peter,
Sorry it took so long for me to respond (real life just keeps getting in the way), and thanks for taking the time to reply.
shared_ptr only contains the features that are absolutely essential for the answer of "is shared_ptr suitable" to be "yes" in the vast majority of cases (and to stay "yes" as the design evolves).
I guess this is the crux of my contention. I think all must agree there must be: automatic counting; basic pointer syntax (including any implicit conversion from derived to base, and various operators). The features beyond those are of the debatable variety.
There are two features beyond what you listed:

1. "Do the right thing" destruction;
2. Weak pointer support.

Note that (1) does not equal "custom deleter support". Custom deleters do not add any further overhead (when not used) over "do the right thing" destruction. By that I mean that shared_ptr<X> can always be destroyed when it has been constructed properly, no matter whether at the point of destruction X is a complete type with an accessible (and virtual, if the actual object is not of type X) destructor, or whether 'operator delete' at the point of destruction can deallocate from the 'operator new' heap at the point of construction.
... so this brings down the design-imposed overhead to four words per object for an empty deleter.
From 1.32 code, shared_ptr never has an empty deleter unless it is itself empty (or NULL). Unless I missed something? If my analysis is correct, an overhead of 4 words is required for the deleter alone:
- shared_count must contain a pointer to the sp_counted_base
- sp_counted_base has a vtable
- sp_counted_base has a pointer
- sp_counted_base has a functor
The weak_ptr adds an extra count (1 word) but more importantly requires a mutex, at least on platforms lacking more advanced lock-free primitives such as InterlockedCompareExchange.
The theoretical minimum cost of a non-intrusive pointer is:

- per-instance pointer to the count;
- one word for the count.

Feature (1) above adds:

- vtable
- pointer

The functor need not be present when not used, although the current code base does not do that. It can also be optimized out when it's an empty class with the help of boost::compressed_pair. Again, the current code base doesn't do it, but we're talking about the design-imposed overhead here.

Feature (2) adds:

- an extra count.

For platforms lacking CAS (are there any?) Tyson Whitehead has posted an implementation:

http://lists.boost.org/MailArchives/boost/msg66868.php

that locks a mutex only when weak_ptr::lock is used. If weak pointers aren't used, the implementation has essentially the same performance as an ordinary non-intrusive smart pointer. The mutex pool trick can be used to not include a per-count mutex.
I suspect that you haven't had much experience with the design possibilities opened by weak pointers or custom deleters ...
I have not; shared_ptr et al. is new to me. My own work never followed those paths for two reasons. The custom deleter never occurred to me because I got the same effect with intrusion.
One situation where custom deleters come in handy is when you want to use your own smart pointer in an application, but a third-party library takes a shared_ptr. With a shared_ptr not supporting deleters, you'd be forced to use shared_ptr in your application, too.
The weak_ptr role was filled by T* or T&, and the standard refrain of "be careful what you are doing if you go this way." Of course, in most cases one can do some refactoring and remove the need for the weak reference.
You can always switch to active notification and avoid the need for a passive observer such as weak_ptr, but this often makes your design worse, because now the observed objects need to know about the observers in order to notify them. There's also the 'shared from this' problem. [...]
The good thing is that once you have a stable design, efficiency usually follows, as happened here.
True enough, but optimizations leveraging advanced, platform-specific capabilities do not offer much benefit to those writing portable code. For them, the cost remains the same. In this case, this will always be so.
Portable code? shared_ptr doesn't need to be portable, if it comes with your standard library.
Now we've come back to my original point: the very design of shared_ptr requires a certain overhead that is _irreducible_ in the general case. And that overhead is:
A. Not clearly specified or articulated (for those who need to know) B. _Much_ more than one would naively expect
All designs require a certain irreducible overhead, do they not? Think of it as a std::map. The overhead of a map is not clearly specified or articulated. (B) probably applies as well. But it works.

Hi Peter, Again, thanks for taking all the time to patiently reply. This will in all likelihood be my final post on this topic, though I will read thoroughly any replies from you or others.
I guess this is the crux of my contention. I think all must agree there must be: automatic counting; basic pointer syntax (including any implicit conversion from derived to base, and various operators). The features beyond those are of the debatable variety.
There are two features beyond what you listed:
1. "Do the right thing" destruction; 2. Weak pointer support.
I meant that I view these as debatable, not "absolutely essential".
Note that (1) does not equal "custom deleter support". Custom deleters do not add any further overhead (when not used) over "do the right thing" destruction.
By that I mean that shared_ptr<X> can always be destroyed when it has been constructed properly, no matter whether at the point of destruction X is a complete type with an accessible (and virtual, if the actual object is not of type X) destructor
But if I have been using T* all these years, "do the right thing" is really just adding overhead for no benefit to me. I have to rethink all my options in light of this feature, because it is not how C++ pointers behave. Some may argue "better"; others will argue "bloated".
or whether 'operator delete' at the point of destruction can deallocate from the 'operator new' heap at the point of construction.
Quite an esoteric condition. I cannot imagine that shared_ptr would fit such an ABI even given this ability. In my experience, when this kind of isolation is in play, self destruction (via vtable as in COM) or similar techniques are also used.
The theoretical minimum cost of a non-intrusive pointer is:
- per-instance pointer to the count;
- one word for the count.
I agree, but would say it this way: non-intrusion costs sizeof(pointer) for the pointer to the count. It hardly seems fair to even call the count itself overhead<g> - I mean, you asked for that part by entering the world of reference counting! :)
Feature (1) above adds:
- vtable
- pointer
The functor need not be present when not used, although the current code base does not do that. It can also be optimized out when it's an empty class with the help of boost::compressed_pair. Again, the current code base doesn't do it, but we're talking about the design-imposed overhead here.
I will look at compressed_pair, but I don't see how the functor could not exist or take no space. Perhaps I am too focused on the details of the current implementation.
Feature (2) adds
- an extra count.
For platforms lacking CAS (are there any?) Tyson Whitehead has posted an implementation:
http://lists.boost.org/MailArchives/boost/msg66868.php
that locks a mutex only when weak_ptr::lock is used. If weak pointers aren't used, the implementation has essentially the same performance as an ordinary non-intrusive smart pointer. The mutex pool trick can be used to not include a per-count mutex.
I think most platforms would have atomic inc/dec, but the more exotic compareExchange (or equivalently powerful primitive) is what I was concerned with being less available.
One situation where custom deleters come in handy is when you want to use your own smart pointer in an application, but a third-party library takes a shared_ptr. With a shared_ptr not supporting deleters, you'd be forced to use shared_ptr in your application, too.
Welcome to the happy world of 3rd party libraries :) Seriously, it is very common to encounter this type of thing, and it again comes back to the problem that custom deleters add most (all?) of their weight even if you don't use them. If it were otherwise, I would argue for their inclusion. I can imagine cases where they would be useful.
You can always switch to active notification and avoid the need for a passive observer such as weak_ptr, but this often makes your design worse, because now the observed objects need to know about the observers in order to notify them.
I agree that active notification tends to be too complicated and, therefore, not desirable. What I meant was that eliminating all need to observe is better than either. In many (not all) cases, if A and B are cyclic such that A owns B, but B needs access to its owner, one can refactor things such that A & B jointly own a C. The stuff B needed from A moves to C and no more cycle. I had a major redesign recently where I did exactly this and things got much better as a result. Mileage will vary, of course.
There's also the 'shared from this' problem.
Not a good thing in a constructor - been there; wish I hadn't done that! :)
The good thing is that once you have a stable design, efficiency usually follows, as happened here.
True enough, but optimizations leveraging advanced, platform- specific capabilities do not offer much benefit to those writing portable code. For them, the cost remains the same. In this case, this will always be so.
Portable code? shared_ptr doesn't need to be portable, if it comes with your standard library.
By "portable code" I mean code that uses shared_ptr on multiple OSes or compilers. The cost may be acceptable on some, but not all of the compilers in use.
Now we've come back to my original point: the very design of shared_ptr requires a certain overhead that is _irreducible_ in the general case. And that overhead is:
A. Not clearly specified or articulated (for those who need to know) B. _Much_ more than one would naively expect
All designs require a certain irreducible overhead, do they not? Think of it as a std::map. The overhead of a map is not clearly specified or articulated. (B) probably applies as well. But it works.
Absolutely, all designs have certain irreducible overhead - I didn't mean to imply otherwise. In the case of std::map, one should expect an instance to be on the large-ish side. Also, one should expect the per-node overhead to be two pointers and probably another word for the tree balancing algorithm. If an implementation does worse than this, it is the fault of the implementation. The design of std::map allows for a realization that has that level of overhead, but no less.

What made me focus so on shared_ptr is that the number of instances I would expect to see is quite large: all dynamically allocated objects in the ideal case. I could probably enumerate the number of places in our 1M+ lines of code where we used std::map.

Not that you have to appease me <g>, but since its costs aren't additive, I would love to at least have some way to "opt out" of the majority of shared_ptr overhead. For example, a way to indicate that shared_ptr<T> should use an intrusive count, disallow weak_ptr and put the kibosh on all the other magic. Why not use "intrusive_ptr"? It's not the same name (and hence templates using shared_ptr would break). As it sits, all optimization opportunity is in the hands of the RTL implementor. As a user I will have to either accept all the overhead (in the worst case on all _my_ target platforms), or go elsewhere.

Since the outcome of all this discussion will likely have no effect on the design of shared_ptr (not that I expected it would, nor do I mean to slight anyone), my final plea is that the TR1 implementation include any and all optimizations so as to increase the likelihood that other implementations will follow (like the "return object by value" optimization). Leaving these as "exercises" would be a great disservice, since I expect that most implementors would only slowly rediscover all the optimizations that you and others have already imagined.
I don't know whether language could be put in place to either encourage or require some minimum set of optimizations in a conforming implementation (like "std::map::find is O(lg(n))" or such). Best regards, Don

Don G <dongryphon@yahoo.com> writes:
Hi Peter,
Again, thanks for taking all the time to patiently reply. This will in all likelihood be my final post on this topic, though I will read thoroughly any replies from you or others.
I guess this is the crux of my contention. I think all must agree there must be: automatic counting; basic pointer syntax (including any implicit conversion from derived to base, and various operators). The features beyond those are of the debatable variety.
There are two features beyond what you listed:
1. "Do the right thing" destruction; 2. Weak pointer support.
I meant that I view these as debatable, not "absolutely essential".
Note that (1) does not equal "custom deleter support". Custom deleters do not add any further overhead (when not used) over "do the right thing" destruction.
By that I mean that shared_ptr<X> can always be destroyed when it has been constructed properly, no matter whether at the point of destruction X is a complete type with an accessible (and virtual, if the actual object is not of type X) destructor
But if I have been using T* all these years, "do the right thing" is really just adding overhead for no benefit to me. I have to rethink all my options in light of this feature, because it is not how C++ pointers behave. Some may argue "better"; others will argue "bloated".
I know Bjarne is very concerned about what happens to software when programmers get the smart pointer religion. He claims to have seen programs that got huge and slow when people stopped using raw pointers.
or whether 'operator delete' at the point of destruction can deallocate from the 'operator new' heap at the point of construction.
Quite an esoteric condition.
Not at all; it's a very common concern for Windows programmers.
I cannot imagine that shared_ptr would fit such an ABI even given this ability. In my experience, when this kind of isolation is in play, self destruction (via vtable as in COM) or similar techniques are also used.
The theoretical minimum cost of a non-intrusive pointer is:
- per-instance pointer to the count;
- one word for the count.
I agree, but would say it this way: non-intrusion costs sizeof pointer for the pointer to the count. The count itself doesn't seem fair to even call overhead<g> - I mean, you asked for this part by entering the world of reference counting! :)
Feature (1) above adds:
- vtable
- pointer
The functor need not be present when not used, although the current code base does not do that. It can also be optimized out when it's an empty class with the help of boost::compressed_pair. Again, the current code base doesn't do it, but we're talking about the design-imposed overhead here.
I will look at compressed_pair, but I don't see how the functor could not exist or take no space.
It's known as the "Empty Base Optimization" (EBO).
One situation where custom deleters come in handy is when you want to use your own smart pointer in an application, but a third-party library takes a shared_ptr. With a shared_ptr not supporting deleters, you'd be forced to use shared_ptr in your application, too.
Welcome to the happy world of 3rd party libraries :) Seriously, it is very common to encounter this type of thing, and it again comes back to the problem that custom deleters add most (all?) of their weight even if you don't use them.
What weight? They only really need to cost a single function pointer.
All designs require a certain irreducible overhead, do they not? Think of it as a std::map. The overhead of a map is not clearly specified or articulated. (B) probably applies as well. But it works.
Absolutely, all designs have certain irreducible overhead - I didn't mean to imply otherwise. In the case of std::map, one should expect an instance to be on the large-ish side. Also, one should expect the per node overhead to be two pointers and probably another word for the tree balancing algorithm.
You don't need any storage for rebalancing. Typical map nodes have 3 pointers: parent, left child, and right child. -- Dave Abrahams Boost Consulting www.boost-consulting.com

On Feb 15, 2005, at 9:49 AM, David Abrahams wrote:
Absolutely, all designs have certain irreducible overhead - I didn't mean to imply otherwise. In the case of std::map, one should expect an instance to be on the large-ish side. Also, one should expect the per node overhead to be two pointers and probably another word for the tree balancing algorithm.
You don't need any storage for rebalancing. Typical map nodes have 3 pointers: parent, left child, and right child.
If it is a red-black tree, you need to store the color of each node. This information requires 1 bit, which can easily be padded to a word. If it is an AVL tree, each node needs to store its tilt. This information requires 2 bits. Or is there a new technique I'm about to learn about? :-) -Howard

Howard Hinnant <hinnant@twcny.rr.com> writes:
On Feb 15, 2005, at 9:49 AM, David Abrahams wrote:
Absolutely, all designs have certain irreducible overhead - I didn't mean to imply otherwise. In the case of std::map, one should expect an instance to be on the large-ish side. Also, one should expect the per node overhead to be two pointers and probably another word for the tree balancing algorithm.
You don't need any storage for rebalancing. Typical map nodes have 3 pointers: parent, left child, and right child.
If it is a red-black tree, you need to store the color of each node. This information requires 1 bit, which can easily be padded to a word.
Oh, I didn't understand that he meant color info. I guess that is only for rebalancing, isn't it? /me crawls back under rock
If it is an AVL tree, each node needs to store its tilt. This information requires 2 bits.
Or is there a new technique I'm about to learn about? :-)
/me burrows several inches down into dirt under rock -- Dave Abrahams

David Abrahams wrote:
Howard Hinnant <hinnant@twcny.rr.com> writes:
On Feb 15, 2005, at 9:49 AM, David Abrahams wrote:
You don't need any storage for rebalancing. Typical map nodes have 3 pointers: parent, left child, and right child.
If it is a red-black tree, you need to store the color of each node. This information requires 1 bit, which can easily be padded to a word.
Oh, I didn't understand that he meant color info. I guess that is only for rebalancing, isn't it?
/me crawls back under rock
If it is an AVL tree, each node needs to store its tilt. This information requires 2 bits.
Or is there a new technique I'm about to learn about? :-)
/me burrows several inches down into dirt under rock
You could have pretended that you had in mind an implementation where pointers are always even and one of the least-significant bits is used as a color bit. ;-) Are there std::maps that store a next and previous pointer to optimize traversal at the expense of per-node overhead? (Of course my original point was that most people don't care about the per-node overhead and those that do just use a different data structure.)

On Feb 15, 2005, at 2:41 PM, Peter Dimov wrote:
You could have pretended that you had in mind an implementation where pointers are always even and one of the least-significant bits is used as a color bit. ;-)
Actually the Metrowerks implementation does this, although it turns that optimization off if:
1. The client requests it.
2. Alignment is 1.
I just had a hard time dealing with all of those bits wasted. :-)
Are there std::maps that store a next and previous pointer to optimize traversal at the expense of per-node overhead?
I haven't heard of one. Along those lines I did do an analysis of the cost of traversal, though. If you have a perfectly balanced tree of 2^N - 1 elements (a full, balanced tree), and you iterate from begin() to end(), then the number of pointer links you chase is 2*distance(begin(), end()). I.e. for my money your speed gain would be no greater than a factor of two, and because your nodes are now up to 50% bigger, you could well realize a slowdown just from the extra level-2 cache misses. But I haven't actually coded it up and performance tested it.

If optimized iteration is really what you need, std::map probably isn't the container you want. A sorted vector, deque or list (or cdeque or slist) may be a better fit. map is good for the requirements of a strictly bounded log(N) lookup and insert/erase, plus iterator/reference stability. Iteration and per-element overhead are secondary concerns and are therefore sacrificed.

-Howard

"Peter Dimov" <pdimov@mmltd.net> writes:
You could have pretended that you had in mind an implementation where pointers are always even and one of the least-significant bits is used as a color bit. ;-)
I know that trick, but I have at least a little bit of honor left ;-).
-- Dave Abrahams
Boost Consulting
www.boost-consulting.com

At 11:11 AM 2/15/2005, Howard Hinnant wrote:
On Feb 15, 2005, at 9:49 AM, David Abrahams wrote:
Absolutely, all designs have certain irreducible overhead - I didn't mean to imply otherwise. In the case of std::map, one should expect an instance to be on the large-ish side. Also, one should expect the per node overhead to be two pointers and probably another word for the tree balancing algorithm.
You don't need any storage for rebalancing. Typical map nodes have 3 pointers: parent, left child, and right child.
If it is a red-black tree, you need to store the color of each node. This information requires 1 bit, which can easily be padded to a word. If it is an AVL tree, each node needs to store its tilt. This information requires 2 bits.
Or is there a new technique I'm about to learn about? :-)
I don't know of any 100% portable technique, but on many platforms pointers to heap-allocated memory within a node never have one or more of the low-order bits on. This opens those bits up for use as color or similar flags, at the cost of having to "and" them off before dereferencing the pointer. If the pointer is wrapped in a class and can only be accessed through member functions, there is no chance of inadvertently forgetting the "and", and an assert can be done on construction to verify that the low-order bits involved are actually zero. A bit ugly, and I'd hate to see it used in portable Boost code, but if you are writing library code for a specific platform, why not?

--Beman

Beman Dawes <bdawes@acm.org> writes:
At 11:11 AM 2/15/2005, Howard Hinnant wrote:
On Feb 15, 2005, at 9:49 AM, David Abrahams wrote:
Absolutely, all designs have certain irreducible overhead - I didn't mean to imply otherwise. In the case of std::map, one should expect an instance to be on the large-ish side. Also, one should expect the per node overhead to be two pointers and probably another word for the tree balancing algorithm.
You don't need any storage for rebalancing. Typical map nodes have 3 pointers: parent, left child, and right child.
If it is a red-black tree, you need to store the color of each node. This information requires 1 bit, which can easily be padded to a word. If it is an AVL tree, each node needs to store its tilt. This information requires 2 bits.
Or is there a new technique I'm about to learn about? :-)
I don't know of any 100% portable technique, but on many platforms pointers to heap allocated memory within a node never have one or more of the low-order bits on. This opens those bits up to be used as color or similar flags, at the cost of having to "and" them off before dereferencing the pointer. If the pointer is wrapped in a class, and can only be accessed by member functions, then there is no chance of inadvertently forgetting the "and", and an assert can be done on construction to verify the low-order bits involved are actually zero. A bit ugly and I'd hate to see it used in portable Boost code, but if you are writing library code for a specific platform, why not?
It's fine to use it conditionally in portable code, provided you have a (non-portable, obviously) way to find out if the low bit of the pointer is needed.
-- Dave Abrahams
Boost Consulting
www.boost-consulting.com

"Peter Dimov" <pdimov@mmltd.net> writes:
The weak_ptr adds an extra count (1 word) but more importantly requires a mutex, at least on platforms lacking more advanced lock-free primitives such as InterlockedCompareExchange.
The theoretical minimum cost of a non-intrusive pointer is:
- per-instance pointer to the count;
- one word for the count.
Feature (1) above adds:
- vtable pointer
The functor need not be present when not used, although the current code base does not do that. It can also be optimized out when it's an empty class with the help of boost::compressed_pair. Again, the current code base doesn't do it, but we're talking about the design-imposed overhead here.
You could also replace the vtable with a function pointer per Boost.Function and eliminate some space overheads associated with dynamic type info.
-- Dave Abrahams
Boost Consulting
www.boost-consulting.com
participants (5)
- Beman Dawes
- David Abrahams
- Don G
- Howard Hinnant
- Peter Dimov