
On Dec 28, 2005, at 4:15 AM, Alex Besogonov wrote:
> Vladimir Prus wrote:
>> I don't have any problem with boost::function speed of invocations
>> (though FastDelegate is two times faster).
>
> I see. Still, would be nice to see specific numbers. I've attached a
> test program (you need FastDelegate from
> http://www.codeproject.com/cpp/FastDelegate.asp to compile it).
>
> Results:
> =========================
> C:\temp\delegates>gcc -O3 -funroll-loops -fomit-frame-pointer test.cpp -Ic:/tools/boost -lstdc++
>
> C:\temp\delegates>a.exe
> Time elapsed for FastDelegate: 1.191000 (sec)
> Time elapsed for simple bind: 0.010000 (sec)
> Time elapsed for bind+function: 33.118000 (sec)
> Time elapsed for pure function invocation: 3.705000 (sec)
> =========================
> (GCC 4.1.0 was used)
>
> You can see that boost::function + boost::bind is an order of
> magnitude slower than FastDelegate. Even a mere invocation of a
> boost::function is slower than a complete bind+invoke for FastDelegate.
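[The attached test program isn't reproduced here, but a timing loop of roughly this shape is what produces numbers like the above; this is a sketch with made-up names (time_calls, add_one), not Alex's actual test. A boost::function<int(int)>, a FastDelegate, or a bind expression would each be passed through the same loop for comparison:]

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>

// A trivial target: cheap enough that call overhead dominates,
// which is the spirit of the attached test.
int add_one(int x) { return x + 1; }

// Time n invocations of any callable and report the result.
// In the real comparison, f would be a boost::function, a
// FastDelegate, or a boost::bind expression.
template<typename Callable>
double time_calls(const char* label, Callable f, long n)
{
    std::clock_t start = std::clock();
    int sink = 0;                       // keep the calls observable
    for (long i = 0; i < n; ++i)
        sink += f(1);
    double elapsed = double(std::clock() - start) / CLOCKS_PER_SEC;
    std::printf("Time elapsed for %s: %f (sec), sink=%d\n",
                label, elapsed, sink);
    return elapsed;
}
```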
The major performance problem in this example is the memory allocation required to construct boost::function objects. We could implement the small buffer optimization (SBO) directly in boost::function, but that trades space for performance. Is it worth it? It depends on how often you copy boost::function objects vs. how many of them you store in memory.

Would a pooling allocator solve the problem? I tried switching the boost::function<> allocator to boost::pool_allocator and boost::fast_pool_allocator (from the Boost.Pool library), but performance actually got quite a bit worse with this change:

  Time elapsed for simple bind: 2.050000 (sec)
  Time elapsed for bind+function: 43.120000 (sec)
  Time elapsed for pure function invocation: 2.020000 (sec)
  Time elapsed for bind+function+pool: 130.750000 (sec)
  Time elapsed for bind+function+fastpool: 108.590000 (sec)

Pooling is not feasible, so we need the SBO for performance, but not all users can take the increase in boost::function's size. On non-broken compilers, we could use the Allocator parameter to implement the SBO. At first I was hoping we could just make boost::function smart enough to handle stateful allocators, then write an SBO allocator. Unfortunately, this doesn't play well with rebinding:

  template<typename Signature, typename Allocator>
  class function : Allocator {
  public:
    template<typename F>
    function(const F& f)
    {
      typedef typename Allocator::template rebind<F>::other
        my_allocator;
      my_allocator alloc(*this);
      F* new_F = alloc.allocate(1); // where does this point to?
      // ...
    }
  };

Presumably, an SBO allocator's allocate() member would return a pointer into its own buffer, but what happens when you rebind for the new type F and then allocate() using that rebound allocator? The rebound allocator is a distinct copy, so you get a pointer into the wrong buffer. So the SBO needs to be more deeply ingrained in boost::function.
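[For illustration, a minimal sketch of what "deeply ingrained" SBO storage could look like: the buffer is owned by the function object itself, so no allocator rebinding is involved and there is no wrong buffer to point into. All names here are invented, and destruction/cloning are omitted for brevity:]

```cpp
#include <cassert>
#include <new>

// Illustrative only: the buffer lives inside the object itself.
struct sbo_storage {
    union {
        void*  heap;                      // used when the target doesn't fit
        char   buffer[sizeof(void*) * 3]; // 12 bytes on a 32-bit target
        double align_me;                  // crude alignment, 2005-style
    } data;
    bool inline_stored;

    template<typename F>
    void store(const F& f)
    {
        if (sizeof(F) <= sizeof(data.buffer)) {
            new (data.buffer) F(f);   // placement-new into our own buffer
            inline_stored = true;
        } else {
            data.heap = new F(f);     // fall back to the heap
            inline_stored = false;
        }
        // (destruction of the stored target is omitted in this sketch)
    }

    template<typename F>
    F& get()
    {
        void* p = inline_stored ? static_cast<void*>(data.buffer)
                                : data.heap;
        return *static_cast<F*>(p);
    }
};
```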
The common case on most 32-bit architectures is an 8-byte member function pointer and a 4-byte object pointer, so we need 12 bytes of storage to start with for the buffer; boost::function is currently only 12 bytes (4 bytes of that is the buffer). boost::function adds to this the "manager" and "invoker" pointers, which would bring us to 20 bytes in the SBO case. But we can collapse the manager and invoker into a single vtable pointer, so we'd get back down to 16 bytes. Still larger than before, but that 4-byte overhead could drastically improve performance for many common cases. I'm okay with that.

We'll probably have to give up the no-throw swap guarantee, and perhaps also the strong exception safety of copying boost::function objects, but I don't think anyone will care about those. The basic guarantee is good enough.

	Doug
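[A sketch of the vtable collapse described above, with invented names rather than the actual Boost implementation: the per-object manager and invoker pointers become two entries in a per-target-type static table, and each function object keeps a single pointer to that table:]

```cpp
#include <cassert>

// Two entries shared by every object storing the same target type,
// instead of two pointers carried by every object.
struct vtable {
    void (*manager)(void* storage, int op);  // clone/destroy dispatch
    int  (*invoker)(void* storage, int arg); // signature-specific call
};

// One static table per stored target type F.
template<typename F>
struct vtable_for {
    static void manage(void* /*storage*/, int /*op*/)
    {
        // a real manager would clone or destroy the stored F here
    }
    static int invoke(void* storage, int arg)
    {
        return (*static_cast<F*>(storage))(arg);
    }
    static const vtable table;
};

template<typename F>
const vtable vtable_for<F>::table = { &vtable_for<F>::manage,
                                      &vtable_for<F>::invoke };

// The function object itself carries only the target and one pointer
// (SBO buffer omitted here to keep the dispatch mechanism visible).
struct tiny_function {
    void* target;
    const vtable* vt;

    template<typename F>
    void assign(F& f) { target = &f; vt = &vtable_for<F>::table; }

    int operator()(int arg) { return vt->invoker(target, arg); }
};
```

With this layout the per-object cost of the manager/invoker pair drops from two pointers to one, which is where the 20-byte figure falls back to 16.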