
On Dec 28, 2005, at 4:15 AM, Alex Besogonov wrote:
> Vladimir Prus wrote:
>> I don't have any problem with boost::function speed of invocations
>> (though FastDelegate is two times faster).
>
> I see. Still, would be nice to see specific numbers. I've attached a
> test program (you need FastDelegate from
> http://www.codeproject.com/cpp/FastDelegate.asp to compile it).
>
> Results:
> =========================
> C:\temp\delegates>gcc -O3 -funroll-loops -fomit-frame-pointer test.cpp -Ic:/tools/boost -lstdc++
>
> C:\temp\delegates>a.exe
> Time elapsed for FastDelegate: 1.191000 (sec)
> Time elapsed for simple bind: 0.010000 (sec)
> Time elapsed for bind+function: 33.118000 (sec)
> Time elapsed for pure function invocation: 3.705000 (sec)
> =========================
> (GCC 4.1.0 was used)
>
> You can see that boost::function + boost::bind is an order of
> magnitude slower than FastDelegate. Even a mere invocation of a
> boost::function is slower than a complete bind+invoke for FastDelegate.
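[The attached test program isn't reproduced here, but a timing loop of roughly this shape is what produces numbers like the above; this is a sketch with made-up names (time_calls, add_one), not Alex's actual test. A boost::function<int(int)>, a FastDelegate, or a bind expression would each be passed through the same loop for comparison:]

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>

// A trivial target: cheap enough that call overhead dominates,
// which is the spirit of the attached test.
int add_one(int x) { return x + 1; }

// Time n invocations of any callable and report the result.
// In the real comparison, f would be a boost::function, a
// FastDelegate, or a boost::bind expression.
template<typename Callable>
double time_calls(const char* label, Callable f, long n)
{
    std::clock_t start = std::clock();
    int sink = 0;                       // keep the calls observable
    for (long i = 0; i < n; ++i)
        sink += f(1);
    double elapsed = double(std::clock() - start) / CLOCKS_PER_SEC;
    std::printf("Time elapsed for %s: %f (sec), sink=%d\n",
                label, elapsed, sink);
    return elapsed;
}
```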
The major performance problem in this example is the memory allocation required to construct boost::function objects. We could implement the small buffer optimization (SBO) directly in boost::function, but that trades space for performance. Is it worth it? It depends on how often you copy boost::function objects vs. how many of them you store in memory.

Would a pooling allocator solve the problem? I tried switching the boost::function<> allocator to boost::pool_allocator and boost::fast_pool_allocator (from the Boost.Pool library), but performance actually got quite a bit worse with this change:

  Time elapsed for simple bind: 2.050000 (sec)
  Time elapsed for bind+function: 43.120000 (sec)
  Time elapsed for pure function invocation: 2.020000 (sec)
  Time elapsed for bind+function+pool: 130.750000 (sec)
  Time elapsed for bind+function+fastpool: 108.590000 (sec)

Pooling is not feasible, so we need the SBO for performance, but not all users can take the increase in boost::function's size. On non-broken compilers, we could use the Allocator parameter to implement the SBO. At first I was hoping we could just make boost::function smart enough to handle stateful allocators, then write an SBO allocator. Unfortunately, this doesn't play well with rebinding:

  template<typename Signature, typename Allocator>
  class function : Allocator {
  public:
    template<typename F>
    function(const F& f)
    {
      typedef typename Allocator::template rebind<F>::other
        my_allocator;
      my_allocator alloc(*this);
      F* new_F = alloc.allocate(1); // where does this point to?
      // ...
    }
  };

Presumably, an SBO allocator's allocate() member would return a pointer into its own buffer, but what happens when you rebind for the new type F and then allocate() using that rebound allocator? The rebound allocator is a distinct copy, so you get a pointer into the wrong buffer. So the SBO needs to be more deeply ingrained in boost::function.
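[For illustration, a minimal sketch of what "deeply ingrained" SBO storage could look like: the buffer is owned by the function object itself, so no allocator rebinding is involved and there is no wrong buffer to point into. All names here are invented, and destruction/cloning are omitted for brevity:]

```cpp
#include <cassert>
#include <new>

// Illustrative only: the buffer lives inside the object itself.
struct sbo_storage {
    union {
        void*  heap;                      // used when the target doesn't fit
        char   buffer[sizeof(void*) * 3]; // 12 bytes on a 32-bit target
        double align_me;                  // crude alignment, 2005-style
    } data;
    bool inline_stored;

    template<typename F>
    void store(const F& f)
    {
        if (sizeof(F) <= sizeof(data.buffer)) {
            new (data.buffer) F(f);   // placement-new into our own buffer
            inline_stored = true;
        } else {
            data.heap = new F(f);     // fall back to the heap
            inline_stored = false;
        }
        // (destruction of the stored target is omitted in this sketch)
    }

    template<typename F>
    F& get()
    {
        void* p = inline_stored ? static_cast<void*>(data.buffer)
                                : data.heap;
        return *static_cast<F*>(p);
    }
};
```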
The common case on most 32-bit architectures is an 8-byte member function pointer and a 4-byte object pointer, so we need 12 bytes of storage to start with for the buffer; boost::function is currently only 12 bytes (4 bytes of that is the buffer). boost::function adds to this the "manager" and "invoker" pointers, which would bring us to 20 bytes in the SBO case. But we can collapse the manager and invoker into a single vtable pointer, so we'd get back down to 16 bytes. Still larger than before, but that 4-byte overhead could drastically improve performance for many common cases. I'm okay with that.

We'll probably have to give up the no-throw swap guarantee, and perhaps also the strong exception safety of copying boost::function objects, but I don't think anyone will care about those. The basic guarantee is good enough.

	Doug
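[A sketch of the vtable collapse described above, with invented names rather than the actual Boost implementation: the per-object manager and invoker pointers become two entries in a per-target-type static table, and each function object keeps a single pointer to that table:]

```cpp
#include <cassert>

// Two entries shared by every object storing the same target type,
// instead of two pointers carried by every object.
struct vtable {
    void (*manager)(void* storage, int op);  // clone/destroy dispatch
    int  (*invoker)(void* storage, int arg); // signature-specific call
};

// One static table per stored target type F.
template<typename F>
struct vtable_for {
    static void manage(void* /*storage*/, int /*op*/)
    {
        // a real manager would clone or destroy the stored F here
    }
    static int invoke(void* storage, int arg)
    {
        return (*static_cast<F*>(storage))(arg);
    }
    static const vtable table;
};

template<typename F>
const vtable vtable_for<F>::table = { &vtable_for<F>::manage,
                                      &vtable_for<F>::invoke };

// The function object itself carries only the target and one pointer
// (SBO buffer omitted here to keep the dispatch mechanism visible).
struct tiny_function {
    void* target;
    const vtable* vt;

    template<typename F>
    void assign(F& f) { target = &f; vt = &vtable_for<F>::table; }

    int operator()(int arg) { return vt->invoker(target, arg); }
};
```

With this layout the per-object cost of the manager/invoker pair drops from two pointers to one, which is where the 20-byte figure falls back to 16.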