[Smart Ptr] make_shared slower than shared_ptr(new) on VC++9 (and 10) with fix

Hi all,
Before switching to boost::make_shared<>, I wanted to test its purported performance advantage. I created a simple benchmark measuring raw allocation throughput for 3 classes of different sizes with a common base class (constructors and destructors trivial). The number of allocations was set to 40,000,000, which gave me roughly 10 seconds of running time per test.
Unfortunately, it turns out that on VC++9 (release target with default optimizations) boost::make_shared is significantly slower than simply doing boost::shared_ptr(new). Here's the benchmark output:
TestBoostMakeShared 10.577s 3.78179e+006 allocs/s
TestBoostSharedPtrNew 8.907s 4.49085e+006 allocs/s
As you can see, boost::make_shared is over 15% slower than the boost::shared_ptr(new) idiom.
Since I also have the VC++10 compiler available, I then compared these results with the std::shared_ptr and std::make_shared implementations that come with that compiler (but not with VC++9). Here are the results:
TestBoostMakeShared 9.688s 4.12882e+006 allocs/s
TestBoostSharedPtrNew 8.252s 4.84731e+006 allocs/s
TestStdMakeShared 5.07s 7.88955e+006 allocs/s
TestStdSharedPtrNew 8.159s 4.90256e+006 allocs/s
While std::shared_ptr(new) performs about the same as boost::shared_ptr(new), std::make_shared really blows away boost::make_shared and both shared_ptr(new) tests, being almost twice as fast as boost::make_shared.
I then profiled the boost::make_shared test to find its biggest performance bottleneck compared to the boost::shared_ptr(new) profiler run. The culprit was immediately obvious: the boost::make_shared test was spending over 25% of its time in the "type_info::operator==(class type_info const &) const" function. This function was being called indirectly from boost::make_shared through boost::get_deleter. After digging some more through the implementation I came to the conclusion that, in this particular case, we are guaranteed to always be requesting the deleter for the right class (namely T from boost::make_shared<T>), so the type check is redundant. Since boost::shared_ptr doesn't have a way to retrieve the deleter without using RTTI, I decided to add one and use it from an alternative boost::make_shared. So I did the following:
1. I added a virtual function to detail::sp_counted_base (detail\sp_counted_base_w32.hpp):
virtual void * get_raw_deleter( ) = 0;
2. I implemented the get_raw_deleter() function in sp_counted_impl_p (detail\sp_counted_impl.hpp):
virtual void * get_raw_deleter( ) { return 0; }
3. I implemented the get_raw_deleter() function in sp_counted_impl_pd (detail\sp_counted_impl.hpp):
virtual void * get_raw_deleter( ) { return &reinterpret_cast<char&>( del ); }
4. I implemented the get_raw_deleter() function in sp_counted_impl_pda (detail\sp_counted_impl.hpp):
virtual void * get_raw_deleter( ) { return &reinterpret_cast<char&>( d_ ); }
5. I added the following function to detail::shared_count:
void * get_raw_deleter( ) const { return pi_? pi_->get_raw_deleter( ): 0; }
6. I added the following function to shared_ptr<>:
void * _internal_get_raw_deleter( ) const { return pn.get_raw_deleter( ); }
7. I made a separate copy of the boost::make_shared function and replaced a single line from:
boost::detail::sp_ms_deleter< T > * pd = boost::get_deleter< boost::detail::sp_ms_deleter< T > >( pt );
to:
boost::detail::sp_ms_deleter< T > * pd = static_cast<boost::detail::sp_ms_deleter< T > *>(pt._internal_get_raw_deleter());
Benchmarking afterwards gave me the following results on VC++9:
TestBoostSharedPtrNew 9.204s 4.34594e+006 allocs/s
TestBoostMakeShared 10.499s 3.80989e+006 allocs/s
TestBoostMakeSharedAlt 7.831s 5.1079e+006 allocs/s
My changes translated into an almost 35% improvement in allocation speed over the current implementation of boost::make_shared. Put differently, they amount to a decrease of just over 25% in running time, as the profiling results suggested.
Results on VC++10 are similar:
TestBoostSharedPtrNew 8.487s 4.71309e+006 allocs/s
TestBoostMakeShared 9.609s 4.16276e+006 allocs/s
TestStdSharedPtrNew 8.283s 4.82917e+006 allocs/s
TestStdMakeShared 5.039s 7.93808e+006 allocs/s
TestBoostMakeSharedAlt 6.802s 5.88062e+006 allocs/s
VC++10's std::make_shared is still much faster (almost 35% faster than even the modified boost::make_shared) and we will be switching to it once we switch to VC++10. But in the meantime it seems to me that boost::make_shared should be fixed to improve its performance. Again, this is only one compiler, and other compilers might not have such a severe RTTI performance issue, but I still think it would be well worth avoiding unnecessary calls to RTTI during performance-relevant operations such as heap allocations.
The testing and changes were done on Boost 1.48.0, but I compared the Smart Ptr library sources with Boost 1.49.0 and the above changes should work there equally well.
Thanks, Ivan
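For readers following along, the idea behind steps 1-7 can be sketched roughly as below. This is a toy model under stated assumptions, not the Boost sources: counted_base, counted_impl_pd, counting_deleter and the two *_deleter_cast helpers are hypothetical names, and only the shape of the two lookups (one comparing type_info, one unchecked) mirrors what the post describes.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical stand-in for detail::sp_counted_base.
struct counted_base {
    virtual ~counted_base() {}
    // Checked lookup: compares type_info objects on every call.
    virtual void* get_deleter(const std::type_info& ti) = 0;
    // Unchecked lookup: the caller must already know the deleter type.
    virtual void* get_raw_deleter() = 0;
};

// Hypothetical stand-in for sp_counted_impl_pd<P, D>.
template <class P, class D>
struct counted_impl_pd : counted_base {
    P ptr;
    D del;
    counted_impl_pd(P p, D d) : ptr(p), del(d) {}

    void* get_deleter(const std::type_info& ti) override {
        return ti == typeid(D) ? static_cast<void*>(&del) : nullptr;
    }
    // The post uses &reinterpret_cast<char&>(del) to sidestep an overloaded
    // operator&; plain & is enough for this toy deleter.
    void* get_raw_deleter() override { return &del; }
};

// Toy deleter that only counts how often it is invoked.
struct counting_deleter {
    int calls;
    counting_deleter() : calls(0) {}
    void operator()(int*) { ++calls; }
};

// RTTI-checked access, in the spirit of boost::get_deleter<D>.
template <class D>
D* checked_deleter_cast(counted_base& cb) {
    return static_cast<D*>(cb.get_deleter(typeid(D)));
}

// Unchecked access: only safe when the caller created the control block
// itself and therefore knows D, as a make_shared-style factory does.
template <class D>
D* raw_deleter_cast(counted_base& cb) {
    return static_cast<D*>(cb.get_raw_deleter());
}
```

Both helpers return the same address when D is the real deleter type; the unchecked one simply skips the type_info comparison, which is why it must not be exposed to callers that could pass the wrong D.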

I discovered the same problem a year ago and have my own fix for it. It outperforms both the old boost::make_shared and std::make_shared (the latter only by a small margin). I used VC's make_shared implementation as a base. If anyone is interested I can send a patch, but I don't know the correct process for this.
Thanks
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Wed, Apr 25, 2012 at 2:02 PM, Yuriy Zubritsky <mt.wizard@gmail.com> wrote:
I discovered the same problem a year ago and have my own fix for it. It outperforms both the old boost::make_shared and std::make_shared (the latter only by a small margin). I used VC's make_shared implementation as a base. If anyone is interested I can send a patch, but I don't know the correct process for this.
Thanks
I don't see any mention of this issue in the trac database, so be sure one of you adds it, preferably with a patch! Even two competing patches are better than none (and maybe better than one). This sounds like a worthy improvement, but I'm not familiar enough with the internals of boost::shared_ptr to know if the present implementation uses RTTI for reasons other than to retrieve the deleter...
- Jeff

Jeffrey Lee Hellrung, Jr. <jeffrey.hellrung <at> gmail.com> writes:
I don't see any mention of this issue in the trac database, so be sure one of you adds it
I added ticket #6830 to trac database. Thanks, Ivan

Yuriy Zubritsky <mt.wizard <at> gmail.com> writes:
I discovered the same problem a year ago and have my own fix for it. It outperforms both the old boost::make_shared and std::make_shared (the latter only by a small margin).
I would be very much interested in seeing your fix especially since it's faster than mine. Thanks, Ivan

Sure, I've attached it to ticket #6829, which I created :) Sorry, I didn't have time to mention it until now.

Yuriy Zubritsky <mt.wizard <at> gmail.com> writes:
Sure, I've attached it to ticket #6829, which I created :) Sorry, I didn't have time to mention it until now.
:) Thanks. I have adapted your code to be more in SmartPtr style and added #defines leaving non-MSVC part intact. It certainly works much faster though std::make_shared on VC10 is still a bit faster on my box (less than 5% though). Once I test my changes and make sure I got all the repetitive code right I'll attach it to your ticket. Thanks, Ivan

and added #defines leaving non-MSVC part intact
This isn't needed. I wrote it to be cross-platform, and it also implements the optimization Stephan wrote about ("we know where you live"). I tested my code on MSVC 9-11, GCC 4.6 and 4.7, and Clang.

Yuriy Zubritsky <mt.wizard <at> gmail.com> writes:
and added #defines leaving non-MSVC part intact
This isn't needed. I wrote it to be cross-platform, and it also implements the optimization Stephan wrote about ("we know where you live"). I tested my code on MSVC 9-11, GCC 4.6 and 4.7, and Clang.
I figured as much re cross-platform but I don't have anything to test with except VC9/10. I'll drop the related #defines and attach the updated smart_ptr\make_shared.hpp to your ticket. Thanks, Ivan

I looked at the make_shared.hpp you attached and saw several bugs in it. Do you know how to remove it from the ticket? Here are the bugs:
1. verify_allocation should check the passed pointer only in no-exception mode. If exceptions are enabled, allocator::allocate is required to throw bad_alloc, so no check is needed.
2. sp_enable_shared_from_this() is called from the shared_ptr constructor, so there's no need to call it in make_shared().
3. I made make_shared.hpp look different because it was rewritten. There is no need to make it look the same as the original.
Both #1 and #2 decrease the performance of make_shared a little. Also, #1 prevents it from correctly handling allocation failure when exceptions are disabled.
I recommend using the original version from the patch.
Thanks

Yuriy Zubritsky <mt.wizard <at> gmail.com> writes:
I looked at the make_shared.hpp you attached and saw several bugs in it. Do you know how to remove it from the ticket?
No, I don't. I tried but couldn't find an option. My apologies for polluting your ticket.
Here are the bugs: 1. verify_allocation should check passed pointer only in no-exception mode. If exceptions are enabled, allocator::allocate is required to throw bad_alloc, so no check is needed
This is my mistake. I have fixed it.
2. sp_enable_shared_from_this() is called from shared_ptr constructor, so there's no need to call it in make_shared()
I see what you mean. I have removed it.
3. I made make_shared.hpp to look differently as it was rewritten. No need to make it looking same way.
I think that lowering the "friction" between what you wrote and the original is desirable as neither you nor I are the library maintainers.
Both #1 and #2 decrease performance of make_shared a little. Also, #1 prevents it from correct handling of allocation failure if exceptions are disabled.
I recommend you to use original version from patch
Thanks

[Mathias Gaunard]
std::shared_ptr is probably optimized for empty or default deleters.
In VC10 and VC11, shared_ptr has five control blocks:
* vanilla
* custom deleter
* custom deleter and custom allocator
* make_shared
* allocate_shared
Each is optimally sized (in particular, make_shared/allocate_shared implement the "we know where you live" optimization that I have previously described in an attempt to get Boost to pick it up), except that we don't special-case empty custom deleters/allocators. In VC11, we've introduced such special-casing for container allocators/comparators, but we haven't had time to extend it to shared_ptr. I've made a note to myself to do this in the future.
(VC9 SP1 was so long ago - as I recall, it had the first three control blocks, but lacked make_shared/allocate_shared due to the absence of rvalue references.)
Stephan T. Lavavej
Visual C++ Libraries Developer

on Wed Apr 25 2012, "Stephan T. Lavavej" <stl-AT-exchange.microsoft.com> wrote:
[Mathias Gaunard]
std::shared_ptr is probably optimized for empty or default deleters.
In VC10 and VC11, shared_ptr has five control blocks:
* vanilla
* custom deleter
* custom deleter and custom allocator
* make_shared
* allocate_shared
Each is optimally sized (in particular, make_shared/allocate_shared implement the "we know where you live" optimization that I have previously described
I can't find that description. Pointer please?
in an attempt to get Boost to pick it up), except that we don't special-case empty custom deleters/allocators.
You don't need a special case for emptiness if you make them base classes when they're classes (a.k.a. use boost::compressed_pair)
In VC11, we've introduced such special-casing for container allocators/comparators, but we haven't had time to extend it to shared_ptr. I've made a note to myself to do this in the future.
That's what compressed_pair is for; you don't have to keep repeating the same special case over and over ;-) -- Dave Abrahams BoostPro Computing http://www.boostpro.com
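The empty-base trick Dave refers to can be illustrated with a small hand-rolled example. This is only a sketch: block_with_member, block_with_base and empty_deleter are hypothetical names, and boost::compressed_pair packages the same idea generically rather than requiring a class like this to be written by hand.

```cpp
#include <cassert>

// A stateless deleter, like most default or function-object deleters.
struct empty_deleter {
    void operator()(int* p) const { delete p; }
};

// Deleter stored as a data member: even an empty class occupies at least
// one byte, and padding usually rounds that up to pointer alignment.
template <class D>
struct block_with_member {
    int* ptr;
    D del;
    explicit block_with_member(int* p) : ptr(p), del() {}
    void release() { del(ptr); ptr = nullptr; }
};

// Deleter stored as a base class: the empty base optimization lets an
// empty D take no space, without special-casing emptiness anywhere.
template <class D>
struct block_with_base : private D {
    int* ptr;
    explicit block_with_base(int* p) : ptr(p) {}
    void release() { static_cast<D&>(*this)(ptr); ptr = nullptr; }
};

// True when the base-class layout is smaller for this deleter type.
template <class D>
bool empty_base_saves_space() {
    return sizeof(block_with_base<D>) < sizeof(block_with_member<D>);
}
```

Calling empty_base_saves_space<empty_deleter>() reports whether the base-class layout is the smaller one. For a deleter that actually carries state the two layouts would normally be the same size, so nothing is lost by always using the base-class form for class types.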

On Thursday, April 26, 2012 01:33 PM, Dave Abrahams wrote:
on Wed Apr 25 2012, "Stephan T. Lavavej"<stl-AT-exchange.microsoft.com> wrote:
Each is optimally sized (in particular, make_shared/allocate_shared implement the "we know where you live" optimization that I have previously described
I can't find that description. Pointer please?
http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/STL11-Magic-Sec...

[STL]
Each is optimally sized (in particular, make_shared/allocate_shared implement the "we know where you live" optimization that I have previously described
[Dave Abrahams]
I can't find that description. Pointer please?
See http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/STL11-Magic-Sec... (which also has links to my slides - viewable online even without PowerPoint), in particular Slide 6.
I described the optimization in detail, but without code, so you can pick it up without reading my sources. Basically, the "traditional" control blocks need to keep a pointer to the object so they can delete it (because of conversions, the shared_ptr's own pointer cannot be used for this purpose - it could have the wrong address with no way to restore it at runtime). But if you're willing to write dedicated control blocks for make_shared/allocate_shared (instead of trying to stuff the object in a special "deleter"), then they can just destroy the object in place. "We know where you live", so we don't need to store a pointer to the object.
I was surprised when I learned that I was apparently the first one to implement this - I just assumed that everyone would write it this way.
See slide 7 for measurements - picking up this optimization should save you 8 bytes on x86 and 16 bytes (!!!) on x64. That's per object, so if you have a lot of them, it adds up.
Consider it a gift - my thanks for all of the wonderful things that Boost has given TR1/C++11.
You don't need a special case for emptiness if you make them base classes when they're classes (a.k.a. use boost::compressed_pair)
I should look into implementing that from scratch in VC12. Sometimes I dream about making the STL dependent on Boost, and causing the universe to recursively implode. It would be fun! STL
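A rough sketch of the "we know where you live" layout described in this message, compared with the "object inside a special deleter" layout. Everything here is an assumption made for illustration (control_base, in_place_deleter, block_ptr_deleter, block_in_place and tracked are hypothetical names); it is neither the VC nor the Boost implementation, and the reference counting itself is left out.

```cpp
#include <cassert>
#include <new>
#include <utility>

// Hypothetical common base of all control blocks.
struct control_base {
    long use_count;
    long weak_count;
    control_base() : use_count(1), weak_count(1) {}
    virtual ~control_base() {}
    virtual void destroy_object() = 0;  // run when use_count drops to zero
};

// "Special deleter" style: the object lives inside the deleter, which needs
// a flag, and the generic block still stores a pointer to the object.
template <class T>
struct in_place_deleter {
    bool initialized;
    alignas(T) unsigned char storage[sizeof(T)];
    in_place_deleter() : initialized(false) {}
    void operator()(T* p) {
        if (initialized) { p->~T(); initialized = false; }
    }
};

template <class T>
struct block_ptr_deleter : control_base {
    T* ptr;                   // redundant: the object is inside del
    in_place_deleter<T> del;
    block_ptr_deleter() : ptr(nullptr) {}
    void destroy_object() override { del(ptr); }
};

// Dedicated block: the object is always at a known offset inside the block,
// so neither a pointer nor a flag has to be stored.
template <class T>
struct block_in_place : control_base {
    alignas(T) unsigned char storage[sizeof(T)];
    template <class... Args>
    explicit block_in_place(Args&&... args) {
        ::new (static_cast<void*>(storage)) T(std::forward<Args>(args)...);
    }
    // Pre-C++17 idiom; stricter code would pass this through std::launder.
    T* get() { return reinterpret_cast<T*>(storage); }
    void destroy_object() override { get()->~T(); }
};

// Demo payload that counts how many times its destructor has run.
struct tracked {
    double value;
    explicit tracked(double v) : value(v) {}
    ~tracked() { ++destroyed(); }
    static int& destroyed() { static int n = 0; return n; }
};
```

Comparing sizeof(block_in_place<tracked>) with sizeof(block_ptr_deleter<tracked>) shows the per-object saving; the exact number of bytes depends on the platform's pointer size and padding rules.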

On 4/25/2012 11:12 PM, Stephan T. Lavavej wrote:
Sometimes I dream about making the STL dependent on Boost, and causing the universe to recursively implode.
You have strange dreams. -- Eric Niebler BoostPro Computing http://www.boostpro.com

Stephan T. Lavavej wrote:
I was surprised when I learned that I was apparently the first one to implement this - I just assumed that everyone would write it this way.
The original implementation of boost::make_shared was completely non-intrusive. I don't think that there wasn't any real need for that, I just felt like it. :-)

Stephan T. Lavavej wrote:
I was surprised when I learned that I was apparently the first one to implement this - I just assumed that everyone would write it this way.
The original implementation of boost::make_shared was completely non-intrusive. I don't think that there wasn't any real need for that, I
Was. I don't think that there was.
just felt like it. :-)

[STL]
I was surprised when I learned that I was apparently the first one to implement this - I just assumed that everyone would write it this way.
[Peter Dimov]
The original implementation of boost::make_shared was completely non-intrusive. I don't think that there was any real need for that, I just felt like it. :-)
That's reasonable for an initial implementation, of course. But because only shared_ptr's author can implement control blocks, if make_shared doesn't get its own control block, users can't implement this optimization themselves. So boost::make_shared's implementation should be changed now. :-> STL

on Thu Apr 26 2012, "Stephan T. Lavavej" <stl-AT-exchange.microsoft.com> wrote:
[STL]
Each is optimally sized (in particular, make_shared/allocate_shared implement the "we know where you live" optimization that I have previously described
[Dave Abrahams]
I can't find that description. Pointer please?
See http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/STL11-Magic-Sec... (which also has links to my slides - viewable online even without PowerPoint), in particular Slide 6.
9:30 or so in the video.
I described the optimization in detail, but without code, so you can pick it up without reading my sources. Basically, the "traditional" control blocks need to keep a pointer to the object so they can delete it (because of conversions, the shared_ptr's own pointer cannot be used for this purpose - it could have the wrong address with no way to restore it at runtime). But if you're willing to write dedicated control blocks for make_shared/allocate_shared (instead of trying to stuff the object in a special "deleter"), then they can just destroy the object in place. "We know where you live", so we don't need to store a pointer to the object.
I was surprised when I learned that I was apparently the first one to implement this - I just assumed that everyone would write it this way.
Me too :-)
See slide 7 for measurements - picking up this optimization should save you 8 bytes on x86 and 16 bytes (!!!) on x64. That's per object, so if you have a lot of them, it adds up.
Yep.
Consider it a gift - my thanks for all of the wonderful things that Boost has given TR1/C++11.
Thanks kindly.
You don't need a special case for emptiness if you make them base classes when they're classes (a.k.a. use boost::compressed_pair)
I should look into implementing that from scratch in VC12. Sometimes I dream about making the STL dependent on Boost, and causing the universe to recursively implode. It would be fun!
Yeah, who needs an LHC to destroy the universe as we know it? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On 26/04/12 19:41, Dave Abrahams wrote:
on Thu Apr 26 2012, "Stephan T. Lavavej"<stl-AT-exchange.microsoft.com> wrote:
[STL]
Each is optimally sized (in particular, make_shared/allocate_shared implement the "we know where you live" optimization that I have previously described
[Dave Abrahams]
I can't find that description. Pointer please?
See http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/STL11-Magic-Sec... (which also has links to my slides - viewable online even without PowerPoint), in particular Slide 6.
9:30 or so in the video.
Explicit destructor calls on memory that doesn't come from the free store? Sounds like there are potential strict aliasing problems there.

On Thu, Apr 26, 2012 at 1:06 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 26/04/12 19:41, Dave Abrahams wrote:
on Thu Apr 26 2012, "Stephan T. Lavavej"<stl-AT-exchange.microsoft.com> wrote:
[STL]
Each is optimally sized (in particular, make_shared/allocate_shared implement the "we know where you live" optimization that I have previously described
[Dave Abrahams]
I can't find that description. Pointer please?
See http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/STL11-Magic-Secrets (which also has links to my slides - viewable online even without PowerPoint), in particular Slide 6.
9:30 or so in the video.
Explicit destructor calls on memory that doesn't come from the free store?
Sounds like there are potential strict aliasing problems there.
Doesn't the same thing happen in boost::optional and boost::variant? - Jeff

On 26/04/12 22:08, Jeffrey Lee Hellrung, Jr. wrote:
Doesn't the same thing happen in boost::optional and boost::variant?
Yes, their code is arguably ill-formed, but I'm sure some people claim otherwise. AFAIK the only way to implement this in a standard conforming way is with C++11 unrestricted unions. Alternatively, all compilers that implement strict aliasing provide attributes to allow aliasing locally.

On Thursday 26 April 2012 22:26:24 Mathias Gaunard wrote:
On 26/04/12 22:08, Jeffrey Lee Hellrung, Jr. wrote:
Doesn't the same thing happen in boost::optional and boost::variant?
Yes, their code is arguably ill-formed, but I'm sure some people claim otherwise.
AFAIK the only way to implement this in a standard conforming way is with C++11 unrestricted unions. Alternatively, all compilers that implement strict aliasing provide attributes to allow aliasing locally.
I think, std::aligned_storage is intended for this.

Mathias Gaunard wrote:
On 26/04/12 22:08, Jeffrey Lee Hellrung, Jr. wrote:
Doesn't the same thing happen in boost::optional and boost::variant?
Yes, their code is arguably ill-formed, but I'm sure some people claim otherwise.
No, it's not. Provide the standard text that, in your opinion, makes them ill-formed.

On 26/04/12 23:08, Peter Dimov wrote:
Mathias Gaunard wrote:
On 26/04/12 22:08, Jeffrey Lee Hellrung, Jr. wrote:
Doesn't the same thing happen in boost::optional and boost::variant?
Yes, their code is arguably ill-formed, but I'm sure some people claim otherwise.
No, it's not. Provide the standard text that, in your opinion, makes them ill-formed.
3.10/10. If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
- the dynamic type of the object,
- a cv-qualified version of the dynamic type of the object,
- a type similar (as defined in 4.4) to the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
- an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
- a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
- a char or unsigned char type.
I suppose you probably want to say that the type you're accessing the object with is actually the dynamic type of the object, but that idea is not supported anywhere in the standard. The only mechanism that supports the necessary logic for optional and variant is the new unrestricted unions, where explicit destructor and placement new calls effectively change the active member of the union.
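The unrestricted-union approach mentioned above can be sketched roughly as follows. `TinyOptional` is a hypothetical, heavily simplified holder written for this illustration; it is not how boost::optional is implemented, and copying is deliberately left out.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical optional-like holder built on a C++11 unrestricted union.
template <class T>
class TinyOptional {
    union {            // anonymous union; T may have non-trivial members
        char empty_;   // placeholder so the union can be initialized without a T
        T value_;
    };
    bool engaged_;

public:
    TinyOptional() : empty_(), engaged_(false) {}
    ~TinyOptional() { reset(); }

    // Copying omitted to keep the sketch short.
    TinyOptional(const TinyOptional&) = delete;
    TinyOptional& operator=(const TinyOptional&) = delete;

    void emplace(T v) {
        reset();
        new (&value_) T(std::move(v));  // placement new makes value_ the active member
        engaged_ = true;
    }

    void reset() {
        if (engaged_) {
            value_.~T();                // explicit destructor call ends its lifetime
            engaged_ = false;
        }
    }

    bool has_value() const { return engaged_; }

    const T& get() const {
        assert(engaged_);
        return value_;
    }
};
```

Because `value_` is declared with type T, every access goes through a glvalue of the object's own type, which is the property the 3.10/10 argument is concerned with.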

Mathias Gaunard wrote:
I suppose you probably want to say that the type you're accessing the object with is actually the dynamic type of the object, but that idea is not supported anywhere in the standard.
3.8, object lifetime.

On 27/04/12 01:59, Peter Dimov wrote:
Mathias Gaunard wrote:
I suppose you probably want to say that the type you're accessing the object with is actually the dynamic type of the object, but that idea is not supported anywhere in the standard.
3.8, object lifetime.
In lack of a proper explanation, I will assume that your argument is that the array member lifetime has ended and you're re-using its storage to store an object of a different type. Is that what you meant? Assuming this is true (which I'm not entirely sure of, the fact that it is a subobject might be a problem), there still is 3.8/7 to consider, which does not allow you to re-use the storage for any other type if there is any reference to the original object. I don't see how the copy constructor of the class could avoid referring to the original object.

Mathias Gaunard wrote:
On 27/04/12 01:59, Peter Dimov wrote:
Mathias Gaunard wrote:
I suppose you probably want to say that the type you're accessing the object with is actually the dynamic type of the object, but that idea is not supported anywhere in the standard.
3.8, object lifetime.
In lack of a proper explanation, I will assume that your argument is that the array member lifetime has ended and you're re-using its storage to store an object of a different type.
Is that what you meant?
Assuming this is true (which I'm not entirely sure of, the fact that it is a subobject might be a problem), there still is 3.8/7 to consider, which does not allow you to re-use the storage for any other type if there is any reference to the original object.
I don't see how the copy constructor of the class could avoid referring to the original object.
Interesting point. In make_shared's case, there's no need for the class to have a copy constructor. However, you may be right that one can't just create a new object in the middle of an existing object, under a very strict reading.

on Sat Apr 28 2012, Mathias Gaunard <mathias.gaunard-AT-ens-lyon.org> wrote:
On 27/04/12 01:59, Peter Dimov wrote:
Mathias Gaunard wrote:
I suppose you probably want to say that the type you're accessing the object with is actually the dynamic type of the object, but that idea is not supported anywhere in the standard.
3.8, object lifetime.
In lack of a proper explanation, I will assume that your argument is that the array member lifetime has ended and you're re-using its storage to store an object of a different type.
That sounds about right.
Is that what you meant?
Assuming this is true (which I'm not entirely sure of, the fact that it is a subobject might be a problem), there still is 3.8/7 to consider, which does not allow you to re-use the storage for any other type if there is any reference to the original object.
I don't see anything that I can interpret as "if there is any reference to the original object" in 3.8/7. But I read "is" to mean "exists" in your statement. It's possible that you meant "occurs".
I don't see how the copy constructor of the class could avoid referring to the original object.
It needn't refer to the original sub-object, which is the thing whose lifetime has ended and whose storage has been reused. I'm pretty certain it's possible to implement optional while staying strictly within the letter of the standard. I'm also pretty certain that the intent of the standard is to allow it... and intent trumps the actual wording in matters like this. However, to be sure of the intent requires consulting with CWG, which I will do next. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Thu, Apr 26, 2012 at 4:45 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 26/04/12 23:08, Peter Dimov wrote:
Mathias Gaunard wrote:
On 26/04/12 22:08, Jeffrey Lee Hellrung, Jr. wrote:
Doesn't the same thing happen in boost::optional and boost::variant?
Yes, their code is arguably ill-formed, but I'm sure some people claim otherwise.
No, it's not. Provide the standard text that, in your opinion, makes them ill-formed.
3.10/10.
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
- the dynamic type of the object,
- a cv-qualified version of the dynamic type of the object,
- a type similar (as defined in 4.4) to the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
- an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
- a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
- a char or unsigned char type.
I suppose you probably want to say that the type you're accessing the object with is actually the dynamic type of the object, but that idea is not supported anywhere in the standard.
You can safely static_cast a void pointer to a pointer of the dynamic type of the object the void pointer points to. Emil Dotchevski Reverge Studios, Inc. http://www.revergestudios.com/reblog/index.php?n=ReCode

On 27/04/12 02:07, Emil Dotchevski wrote:
You can safely static_cast a void pointer to a pointer of the dynamic type of the object the void pointer points to.
I don't see how that's relevant. There is no provision in the standard to support the idea that the dynamic type of the array subobject in optional<T> is T.

[Mathias Gaunard]
Explicit destructor calls on memory that doesn't come from the free store? Sounds like there are potential strict aliasing problems there.
First, that would be perfectly cromulent on the stack (as long as you're using aligned_storage to ward off alignment demons). According to my understanding, VC doesn't aggressively optimize based on the strict aliasing rules like GCC does, so I have a relatively poor understanding of the issues involved, but sufficient casting (in particular, static_cast to void * and then static_cast to T *) will ensure correctness.
Second, make_shared/allocate_shared dynamically allocate their control blocks, and then construct the object within the control block. This happens even for implementations that store the object within a custom deleter within the control block. (And it is very similar to how vector explicitly constructs and explicitly destroys elements.)
STL
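To make the description above concrete, here is a rough sketch of a control block that holds its object inline. `InplaceBlock`, `make_block`, `release` and `Probe` are hypothetical names invented for this illustration; real make_shared implementations also deal with weak counts, atomics, allocators and type erasure. The cast sequence in `object()` mirrors the one described in the message, and whether it is strictly conforming is the very point under debate here.

```cpp
#include <cassert>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical, simplified control block with the managed object stored inline.
template <class T>
struct InplaceBlock {
    long use_count;
    typename std::aligned_storage<sizeof(T), alignof(T)>::type storage;

    // The block always knows where its object lives: inside itself.
    T* object() {
        return static_cast<T*>(static_cast<void*>(&storage));
    }
};

// One allocation for both the counter and the object.
template <class T, class... Args>
InplaceBlock<T>* make_block(Args&&... args) {
    InplaceBlock<T>* b = new InplaceBlock<T>;
    b->use_count = 1;
    try {
        ::new (static_cast<void*>(&b->storage)) T(std::forward<Args>(args)...);
    } catch (...) {
        delete b;  // the object was never constructed, so only free the block
        throw;
    }
    return b;
}

template <class T>
void release(InplaceBlock<T>* b) {
    assert(b && b->use_count > 0);
    if (--b->use_count == 0) {
        b->object()->~T();  // explicit destructor call, much as vector does
        delete b;           // then free the single allocation
    }
}

// Hypothetical type that records construction and destruction.
struct Probe {
    int* alive;
    explicit Probe(int* a) : alive(a) { ++*alive; }
    ~Probe() { --*alive; }
};
```

A single `new` serves both the reference count and the object, which is where the allocation savings of make_shared come from.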

on Thu Apr 26 2012, Mathias Gaunard <mathias.gaunard-AT-ens-lyon.org> wrote:
On 26/04/12 19:41, Dave Abrahams wrote:
on Thu Apr 26 2012, "Stephan T. Lavavej"<stl-AT-exchange.microsoft.com> wrote:
[STL]
Each is optimally sized (in particular, make_shared/allocate_shared
implement the "we know where you live" optimization that I have previously described
[Dave Abrahams]
I can't find that description. Pointer please?
See http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/STL11-Magic-Sec... (which also has links to my slides - viewable online even without PowerPoint), in particular Slide 6.
9:30 or so in the video.
Explicit destructor calls on memory that doesn't come from the free store? Sounds like there are potential strict aliasing problems there.
I think, based on what you said here and your comments in the newsgroup, that you are misreading those rules. You're allowed to construct anything you want in a raw array of char as long as it's properly aligned. -- Dave Abrahams BoostPro Computing http://www.boostpro.com
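As an illustration of that reading, the sketch below reuses one suitably aligned raw buffer for two unrelated types in turn. `Point` and `reuse_buffer` are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <new>

// Hypothetical type used only for this illustration.
struct Point {
    int x;
    int y;
};

// Builds a Point in a raw buffer, destroys it, then builds a double in the
// same bytes. Each access goes through the pointer returned by placement new.
double reuse_buffer(int x, int y, double d) {
    // Raw storage, aligned and sized for the stricter of the two types.
    alignas(Point) alignas(double)
        unsigned char buf[sizeof(Point) > sizeof(double) ? sizeof(Point)
                                                         : sizeof(double)];

    Point* p = new (buf) Point{x, y};  // lifetime of a Point begins in buf
    int sum = p->x + p->y;
    p->~Point();                       // lifetime ends; the storage is raw again

    double* q = new (buf) double(d);   // the same bytes now hold a double
    assert(static_cast<void*>(q) == static_cast<void*>(buf));
    return *q + sum;
}
```

At no point is the `Point` read through a `double` glvalue or the reverse; each object is only accessed during its own lifetime.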

You don't need a special case for emptiness if you make them base classes when they're classes (a.k.a. use boost::compressed_pair)
In VC11, we've introduced such special-casing for container allocators/comparators, but we haven't had time to extend it to shared_ptr. I've made a note to myself to do this in the future.
That's what compressed_pair is for; you don't have to keep repeating the same special case over and over ;-)
The downside of having it be a base class is that classes can now be marked final, which means they can't be used as base classes. This approach was used by libc++ for implementing the same optimization that STL mentions VC's STL implements, but I think they end up falling back to storing it if it's empty and final. I suppose what we really want is static if. :)
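A small sketch of the trade-off being discussed, using hypothetical types (`EmptyDeleter`, `AsMember`, `AsBase`, `FinalDeleter`) that are not from Boost or any standard library. The exact sizes are ABI-dependent; the comments describe what mainstream compilers typically do.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stateless deleter.
struct EmptyDeleter {
    void operator()(int* p) const { delete p; }
};

// Deleter stored as a member: it occupies at least one byte, plus padding.
struct AsMember {
    int* ptr;
    EmptyDeleter del;
};

// Deleter stored as a base: the empty base optimization can make it free.
struct AsBase : EmptyDeleter {
    int* ptr;
};

// A final class cannot be used as a base, so the trick is unavailable for it.
struct FinalDeleter final {
    void operator()(int* p) const { delete p; }
};
// struct Broken : FinalDeleter { int* ptr; };  // error: base class is final

// Bytes saved by deriving instead of aggregating (ABI-dependent).
std::size_t bytes_saved() {
    assert(sizeof(AsBase) <= sizeof(AsMember));
    return sizeof(AsMember) - sizeof(AsBase);
}
```

On typical 64-bit targets `AsMember` is twice the size of `AsBase`, which is the saving that compressed_pair packages up.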

On 29/04/12 03:16, Ahmed Charles wrote:
The downside of having it be a base class is that classes can now be marked final, which means they can't be used as base classes. This approach was used by libc++ for implementing the same optimization that STL mentions VC's STL implements, but I think they end up falling back to storing it if it's empty and final.
I suppose what we really want is static if. :)
Everything you could do with static if you can already do without, just with more verbose syntax. In this particular case, however, I don't see how any if-like technique would help. I cannot think of a method to tell whether a type is empty without inheriting from it.

On 30.04.2012, at 18:51, Mathias Gaunard wrote:
On 29/04/12 03:16, Ahmed Charles wrote:
The downside of having it be a base class is that classes can now be marked final, which means they can't be used as base classes. This approach was used by libc++ for implementing the same optimization that STL mentions VC's STL implements, but I think they end up falling back to storing it if it's empty and final.
I suppose what we really want is static if. :)
Everything you could do with static if you can already do without, just with more verbose syntax.
In this particular case, however, I don't see how any if-like technique would help. I cannot think of a method to tell whether a type is empty without inheriting from it.
std::is_empty, of course. The implementation is up to the standard library - Clang provides an __is_empty trait intrinsic since we discovered the final class problem of the classic library implementation. Sebastian
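A possible shape for such a trait-based selection is sketched below. `Holder` and the three sample types are hypothetical. Note that `std::is_final` was only standardized in C++14, after this thread; it is used here as an assumption about how one would write this today.

```cpp
#include <cassert>
#include <type_traits>

struct Stateless {};             // hypothetical empty deleter
struct StatelessFinal final {};  // empty, but cannot be a base class
struct Stateful { int tag; };    // non-empty

// Primary template: keep D as a member (always valid).
template <class D,
          bool UseBase = std::is_empty<D>::value && !std::is_final<D>::value>
struct Holder {
    int* ptr;
    D del;
    D& deleter() { return del; }
};

// Specialization: derive from D when it is empty and not final.
template <class D>
struct Holder<D, true> : private D {
    int* ptr;
    D& deleter() { return *this; }
};

// Runtime check of the selection logic for the three sample types.
void check_selection() {
    // is_empty needs no inheritance, so it also works for a final class.
    assert(std::is_empty<StatelessFinal>::value);
    assert((std::is_base_of<Stateless, Holder<Stateless>>::value));
    assert((!std::is_base_of<StatelessFinal, Holder<StatelessFinal>>::value));
    assert((!std::is_base_of<Stateful, Holder<Stateful>>::value));
}
```

The final empty type falls back to member storage and so still costs space, which matches the libc++ fallback described earlier in the thread.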

The downside of having it be a base class is that classes can now be marked final, which means they can't be used as base classes. This approach was used by libc++ for implementing the same optimization that STL mentions VC's STL implements, but I think they end up falling back to storing it if it's empty and final.
I suppose what we really want is static if. :)
Everything you could do with static if you can already do without, just with more verbose syntax.
In this particular case, however, I don't see how any if-like technique would help. I cannot think of a method to tell whether a type is empty without inheriting from it.
But even if there was a way to tell, how would you use an object like that without aggregating it (and thus having it take up space) or inheriting from it (which is not allowed)? Regards, Nate

On 30/04/12 22:22, Mathias Gaunard wrote:
On 30/04/12 22:47, Nathan Ridge wrote:
But even if there was a way to tell, how would you use an object like that without aggregating it (and thus having it take up space) or inheriting from it (which is not allowed)?
You would construct it whenever you need it.
By my reading, neither a shared_ptr deleter nor a container allocator is required to be DefaultConstructible in general. John Bytheway
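A short example of a deleter that cannot be constructed on demand. `CountingDeleter` and `make_counted` are hypothetical names for this illustration; the only library facility used is the std::shared_ptr constructor that takes a pointer and a deleter.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical deleter that must be handed a counter; it has no default
// constructor, so it cannot be recreated from nothing when it is needed.
struct CountingDeleter {
    int* calls;
    explicit CountingDeleter(int* c) : calls(c) { assert(c); }
    void operator()(int* p) const {
        ++*calls;
        delete p;
    }
};

static_assert(!std::is_default_constructible<CountingDeleter>::value,
              "CountingDeleter cannot be created on demand");

// shared_ptr stores a copy of the deleter it is given; it never needs to
// default-construct one.
std::shared_ptr<int> make_counted(int value, int* calls) {
    return std::shared_ptr<int>(new int(value), CountingDeleter(calls));
}
```

Such a deleter carries state, so it would have to be stored anyway; the open question in the thread concerns empty types that are also final.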
participants (15)
- Ahmed Charles
- Andrey Semashev
- Ben Pope
- Dave Abrahams
- Emil Dotchevski
- Eric Niebler
- Ivan Erceg
- Jeffrey Lee Hellrung, Jr.
- John Bytheway
- Mathias Gaunard
- Nathan Ridge
- Peter Dimov
- Sebastian Redl
- Stephan T. Lavavej
- Yuriy Zubritsky