shared_ptr/shared_count and the cost of use counter allocation

Hi,
I have taken on the task of replacing a proprietary smart pointer (a fork of an earlier 'shared_ptr' variant), used in a library for rendering 3D scenes, with Boost's current 'shared_ptr'. The reason is that I have found a couple of syntactic problems with the proprietary smart pointer that 'shared_ptr' doesn't have.
While I succeeded syntactically, in certain performance tests the proprietary smart pointer performs about 20 times better than 'shared_ptr'. It seems that this is caused by the allocation of the use counter. For the proprietary smart pointer this allocation can be tweaked by passing allocators as template arguments, while 'shared_ptr' seems not to allow this.
I have yet to find out what that performance difference means for real projects, but according to those familiar with the library, the allocators were introduced because there is code that creates and destroys smart pointers very extensively, and it suffered badly from allocation costs.
Is there a way for me to tweak 'shared_ptr's performance other than by again forking into a proprietary smart pointer and introducing allocators for that one? or is there any interest in the boost community for doing this?
Thanks,
Schobi

Hendrik Schober wrote:
For the proprietary smart pointer this allocation can be tweaked by passing allocators as template arguments, while 'shared_ptr' seems not to allow this.
The use of make_shared should reduce the allocation cost. There is also a version of the constructor that takes an allocator as argument.
I have no idea which of those options would have the best impact. -- Loïc

On Mon, Sep 8, 2008 at 11:04 AM, Hendrik Schober <spamtrap@gmx.de> wrote:
For the proprietary smart pointer this allocation can be tweaked by passing allocators as template arguments, while 'shared_ptr' seems not to allow this.
In that aspect shared_ptr is better because it allows you to use custom allocators without affecting the static type of the shared_ptr; in other words, two shared_ptr<foo> objects have the same type (shared_ptr<foo>) regardless of whether they've been allocated with the default allocator or not. This makes it possible to create shared_ptr factories which pick the best allocation strategy *per instance* without affecting user code.
Emil Dotchevski
Reverge Studios, Inc.
http://www.revergestudios.com/reblog/index.php?n=ReCode
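A minimal sketch of the factory idea described above. It assumes std::shared_ptr, which offers the same (pointer, deleter, allocator) constructor as boost::shared_ptr; with Boost, the namespace and header would change accordingly. Foo, TrackingAllocator, Strategy and make_foo are hypothetical names invented for this illustration, and the allocator only counts calls rather than implementing a real strategy.

```cpp
#include <cstddef>
#include <memory>
#include <new>

struct Foo { int value = 0; };

// Hypothetical counter: incremented each time the custom allocator is used.
static std::size_t g_custom_allocations = 0;

// Hypothetical minimal allocator. It forwards to operator new and counts
// calls, so we can tell which instances went through it.
template <class T>
struct TrackingAllocator {
    using value_type = T;
    TrackingAllocator() = default;
    template <class U>
    TrackingAllocator(const TrackingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_custom_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};
template <class T, class U>
bool operator==(const TrackingAllocator<T>&, const TrackingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const TrackingAllocator<T>&, const TrackingAllocator<U>&) noexcept { return false; }

enum class Strategy { Default, Tracked };

// Hypothetical factory: the allocation strategy is chosen per instance,
// yet the return type is std::shared_ptr<Foo> on every path.
std::shared_ptr<Foo> make_foo(Strategy s) {
    if (s == Strategy::Tracked) {
        // The allocator passed here is used for the control block
        // (the use counter), not for the Foo object itself.
        return std::shared_ptr<Foo>(new Foo, std::default_delete<Foo>(),
                                    TrackingAllocator<Foo>());
    }
    return std::shared_ptr<Foo>(new Foo);
}
```

Callers can store, copy and assign the results of both paths interchangeably, because the allocator is not part of the pointer's type.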

Hi Hendrik!
On Monday 08 September 2008 Hendrik Schober wrote:
Is there a way for me to tweak 'shared_ptr's performance other than by again forking into a proprietary smart pointer and introducing allocators for that one? or is there any interest in the boost community for doing this?
Just #define BOOST_SP_USE_QUICK_ALLOCATOR for some speed optimisations. This will use a pool for allocating the shared count. Grep the shared_ptr sources for more information.
Hope this helps...
Yours,
Jürgen
--
Dipl.-Math. Jürgen Hunold
Ingenieurgesellschaft für Verkehrs- und Eisenbahnwesen mbH
Lister Straße 15
voice: ++49 511 262926 57
fax: ++49 511 262926 99
juergen.hunold@ivembh.de
www.ivembh.de
Managing directors: Prof. Dr.-Ing. Thomas Siefer, PD Dr.-Ing. Alfons Radtke
Registered office: Hannover
Amtsgericht Hannover, HRB 56965

Hendrik Schober wrote:
[...]
Thanks everybody for their answers. I hadn't seen the possibility to pass custom allocators. In the end I just #define'd 'BOOST_SP_USE_QUICK_ALLOCATOR' and on my platform that was faster than 3 out of 4 of the proprietary allocators used. So I'll go ahead and test this on the other platforms we need. If it works just as well there, I'll merge my changes into the trunk. Thanks for the great work! Schobi

On Wed, Sep 10, 2008 at 2:29 AM, Hendrik Schober <spamtrap@gmx.de> wrote:
In the end I just #define'd 'BOOST_SP_USE_QUICK_ALLOCATOR' and on my platform that was faster than 3 out of 4 of the proprietary allocators used. So I'll go ahead and test this on the other platforms we need. If it works just as well there, I'll merge my changes into the trunk.
Also, make_shared could be used to get rid of the separate allocation for the count. This may make it unnecessary to define BOOST_SP_USE_QUICK_ALLOCATOR. Emil Dotchevski Reverge Studios, Inc. http://www.revergestudios.com/reblog/index.php?n=ReCode

Emil Dotchevski wrote:
On Wed, Sep 10, 2008 at 2:29 AM, Hendrik Schober <spamtrap@gmx.de> wrote:
In the end I just #define'd 'BOOST_SP_USE_QUICK_ALLOCATOR' and on my platform that was faster than 3 out of 4 of the proprietary allocators used. So I'll go ahead and test this on the other platforms we need. If it works just as well there, I'll merge my changes into the trunk.
Also, make_shared could be used to get rid of the separate allocation for the count. This may make it unnecessary to define BOOST_SP_USE_QUICK_ALLOCATOR.
And if it's implemented the way I imagine it is, it should also improve locality of reference and minimize false sharing when working with your shared pointers, which should improve performance, especially for multi-threaded programs on multi-core systems. If the sp pool is implemented in a typical fashion, you'd probably get a lot of false sharing when working with shared_ptrs, and it might even hurt performance compared with the naive allocator. Of course this is all just conjecture on my part; I haven't looked at the code.
Thanks,
Michael Marcin

Emil Dotchevski:
On Wed, Sep 10, 2008 at 2:29 AM, Hendrik Schober <spamtrap@gmx.de> wrote:
In the end I just #define'd 'BOOST_SP_USE_QUICK_ALLOCATOR' and on my platform that was faster than 3 out of 4 of the proprietary allocators used. So I'll go ahead and test this on the other platforms we need. If it works just as well there, I'll merge my changes into the trunk.
Also, make_shared could be used to get rid of the separate allocation for the count.
There is also allocate_shared, which is the same as make_shared, but with an allocator parameter.
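A small sketch of what allocate_shared looks like in use, written against std::allocate_shared on the assumption that it behaves like the Boost function of the same name. Node, AllocStats, CountingAllocator and measure_allocate_shared are hypothetical names for illustration only. The allocator records how often it is asked for memory, which makes the effect on the use counter allocation observable.

```cpp
#include <cstddef>
#include <memory>
#include <new>

struct Node { double matrix[16]; };

struct AllocStats {
    std::size_t allocations = 0;
    std::size_t deallocations = 0;
};

// Hypothetical stateful allocator: forwards to operator new/delete and
// records every request in an AllocStats object owned by the caller.
template <class T>
struct CountingAllocator {
    using value_type = T;
    AllocStats* stats;

    explicit CountingAllocator(AllocStats* s) noexcept : stats(s) {}
    template <class U>
    CountingAllocator(const CountingAllocator<U>& other) noexcept : stats(other.stats) {}

    T* allocate(std::size_t n) {
        ++stats->allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ++stats->deallocations;
        ::operator delete(p);
    }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) noexcept {
    return a.stats == b.stats;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) noexcept {
    return !(a == b);
}

// Creates and destroys one shared Node, then reports what the allocator saw.
AllocStats measure_allocate_shared() {
    AllocStats stats;
    {
        std::shared_ptr<Node> p =
            std::allocate_shared<Node>(CountingAllocator<Node>(&stats));
        p->matrix[0] = 1.0;
    }  // last owner goes away here; object and counter are released
    return stats;
}
```

Common standard library implementations place the object and its counter in one block, so the allocator is typically asked for memory once per pointer; the standard recommends this but does not strictly require it.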

Hendrik Schober wrote:
Hendrik Schober wrote:
[...]
Thanks everybody for their answers. I hadn't seen the possibility to pass custom allocators. In the end I just #define'd 'BOOST_SP_USE_QUICK_ALLOCATOR' and on my platform that was faster than 3 out of 4 of the proprietary allocators used. So I'll go ahead and test this on the other platforms we need. If it works just as well there, I'll merge my changes into the trunk.
IIRC the pool of sp counted objects used won't be cleaned up until the OS reclaims the memory at program termination. If you run any sort of leak detector this can be annoying. Thanks, Michael Marcin
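To make the behaviour described above concrete, here is a generic sketch of a fixed-size free-list pool. It is not Boost's implementation; FixedPool and all of its members are hypothetical. It only illustrates why freed blocks stay with the pool instead of going back to the system, which is what a leak detector then notices if the pool lives until process exit and never purges.

```cpp
#include <cstddef>
#include <new>

// Hypothetical fixed-size block pool, for illustration only.
class FixedPool {
    struct FreeNode { FreeNode* next; };

public:
    explicit FixedPool(std::size_t block_size)
        : block_size_(block_size < sizeof(FreeNode) ? sizeof(FreeNode) : block_size) {}
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;
    ~FixedPool() { purge(); }

    // Reuse a cached block if one exists; otherwise ask the system.
    void* allocate() {
        if (free_list_ != nullptr) {
            FreeNode* node = free_list_;
            free_list_ = node->next;
            --cached_;
            return node;
        }
        ++from_system_;
        return ::operator new(block_size_);
    }

    // The block is NOT returned to the system; it is pushed onto the
    // free list so the next allocate() call is cheap.
    void deallocate(void* p) noexcept {
        free_list_ = ::new (p) FreeNode{free_list_};
        ++cached_;
    }

    // Hands all cached blocks back to the system. A pool that skips this
    // step holds its blocks until the process exits.
    void purge() noexcept {
        while (free_list_ != nullptr) {
            FreeNode* node = free_list_;
            free_list_ = node->next;
            ::operator delete(node);
        }
        cached_ = 0;
    }

    std::size_t cached_blocks() const noexcept { return cached_; }
    std::size_t blocks_from_system() const noexcept { return from_system_; }

private:
    std::size_t block_size_;
    FreeNode* free_list_ = nullptr;
    std::size_t cached_ = 0;
    std::size_t from_system_ = 0;
};
```

Allocating, freeing and allocating again returns the same block without a second trip to the system allocator, which is where the speedup comes from in code that creates and destroys many counters.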

Hi Michael ! On Wednesday 10 September 2008 18:26:38 Michael Marcin wrote:
IIRC the pool of sp counted objects used won't be cleaned up until the OS reclaims the memory at program termination. If you run any sort of leak detector this can be annoying.
Yes, this is the reason why I have a special build variant for running memory checks using valgrind :-)) When using valgrind, the speedup of BOOST_SP_USE_QUICK_ALLOCATOR is negligible. But valgrind reports the pool and the allocated shared counts as "leaks". So turn it off when using valgrind.
Yours,
Jürgen
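One way such a build variant could be wired up is sketched below. MYLIB_MEMCHECK_BUILD is a hypothetical macro that only the memory-check build would define; it is not part of Boost. The macro presumably needs to be set identically in every translation unit, so a compiler flag or a common prefix header is the safer place for it. Support for BOOST_SP_USE_QUICK_ALLOCATOR may differ between Boost releases, so check the documentation for the version in use.

```cpp
// Hypothetical project-wide prefix header, seen before any Boost header.
// MYLIB_MEMCHECK_BUILD is an assumed macro, defined only by the build
// variant used for valgrind runs (for example via -DMYLIB_MEMCHECK_BUILD).
#if !defined(MYLIB_MEMCHECK_BUILD)
#  define BOOST_SP_USE_QUICK_ALLOCATOR  // pooled use counters in normal builds
#endif

// Boost smart pointer headers would be included after this point, e.g.
// #include <boost/shared_ptr.hpp>
```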

Hendrik Schober wrote:
In the end I just #define'd 'BOOST_SP_USE_QUICK_ALLOCATOR' and on my platform that was faster than 3 out of 4 of the proprietary allocators used.
Presumably this is the "special purpose allocator" mentioned on this page or something like it: http://www.boost.org/doc/libs/1_36_0/libs/smart_ptr/smarttests.htm However, I don't find any mention of BOOST_SP_USE_QUICK_ALLOCATOR in the documentation. Have I missed something? Is there a reason for not making BOOST_SP_USE_QUICK_ALLOCATOR the default? Phil.

Hi Phil ! On Wednesday 10 September 2008 18:39:21 Phil Endecott wrote:
Hendrik Schober wrote:
In the end I just #define'd 'BOOST_SP_USE_QUICK_ALLOCATOR' and on my platform that was faster than 3 out of 4 of the proprietary allocators used.
Presumably this is the "special purpose allocator" mentioned on this page or something like it:
http://www.boost.org/doc/libs/1_36_0/libs/smart_ptr/smarttests.htm
However, I don't find any mention of BOOST_SP_USE_QUICK_ALLOCATOR in the documentation. Have I missed something?
Hmm, might be. I have been using it for _years_, so I don't know where I got the information from.
Is there a reason for not making BOOST_SP_USE_QUICK_ALLOCATOR the default?
See my reply to Michael. The pool is released by the OS on program termination, which will lead to false positives reported by most memory checkers. And I guess those would generate more traffic on the users list ;-))
Yours,
Jürgen
participants (8)
- Emil Dotchevski
- Hendrik Schober
- Juergen Hunold
- Jürgen Hunold
- Loïc Joly
- Michael Marcin
- Peter Dimov
- Phil Endecott