
I've now implemented the small buffer optimization (SBO) for Boost.Function. The patch is attached, but I have yet to check it in. Here's the executive summary:

  Performance difference: up to 6x faster when the SBO applies
  Space difference: boost::function takes an extra 4 bytes (it is now 16 bytes)
  Semantics: Assignment operators now give the basic guarantee (was the strong guarantee); swap() can now throw. (We're now less TR1-conforming, but we could claim that the TR is wrong to be so strict.)
  Usability: The optimization won't help much in practice unless Boost.Bind objects become smaller :(

I've extended the performance test (attached) with a "smallbind" function object and its tests. The tests that follow use both "smallbind" and "bind", separately, because the former fits in the 12-byte SBO buffer whereas the latter does not.

I tested with GCC 4.2.0 (bleeding edge, straight from CVS) and GCC 3.3 (both Apple and FSF), on a Dual G5 running Mac OS X Panther and on an Athlon XP system running Linux. The newer compiler gave us the performance boost we wanted, with about a 6x improvement when the SBO is used. We get more like 2x with GCC 3.3, although I have a trick or two left that may improve things.

In the data that follows, there are 3 versions of Boost.Function being tested:

  1.33.1: This is Boost.Function as released in Boost 1.33.1. No SBO applied, of course.

  1.34.0 w/ vtables: This is Boost.Function as it currently stands in Boost CVS. It uses vtables for a space optimization (a boost::function object requires only 8 bytes of storage), but does not implement the SBO.

  1.34.0 w/ vtables and SBO: This is Boost.Function in Boost CVS with the attached patch applied. It uses vtables and contains a 12-byte (actually, the size of a member pointer + the size of a void*) buffer for the SBO.

Even with the SBO in Boost.Function, users won't immediately realize the benefits. The problem is that Boost.Bind produces function objects whose size is not minimal.
For instance, boost::bind(&Test::func, &func, _1) returns a function object that is 16 bytes. Those 4 bytes of wasted space don't matter most of the time, but here they mean the difference between using the SBO and not using the SBO :(

So, Peter, any chance of getting a slightly more optimized Boost.Bind that can fit boost::bind(&Test::func, &func, _1) into 12 bytes?

  Doug

On my Athlon XP box
-------------------------
OS: Gentoo Linux ("old")
Compiler: GCC 3.3.6
Flags: -O3 -funroll-loops -fomit-frame-pointer

[1.33.1]
Time elapsed for simple bind: 1.360000 (sec)
Time elapsed for smallbind+function (size=12): 10.870000 (sec)
Time elapsed for bind+function (size=16): 11.770000 (sec)
Time elapsed for pure function invocation: 1.590000 (sec)
Time elapsed for bind+function+pool: 29.690000 (sec)
Time elapsed for bind+function+fastpool: 6.560000 (sec)

[1.34.0 w/ vtables]
Time elapsed for simple bind: 1.410000 (sec)
Time elapsed for smallbind+function (size=12): 11.260000 (sec)
Time elapsed for bind+function (size=16): 12.500000 (sec)
Time elapsed for pure function invocation: 1.530000 (sec)
Time elapsed for bind+function+pool: 30.360000 (sec)
Time elapsed for bind+function+fastpool: 7.370000 (sec)

[1.34.0 w/ vtables and SBO]
Time elapsed for simple bind: 1.360000 (sec)
Time elapsed for smallbind+function (size=12): 5.190000 (sec)
Time elapsed for bind+function (size=16): 13.150000 (sec)
Time elapsed for pure function invocation: 1.660000 (sec)
Time elapsed for bind+function+pool: 29.730000 (sec)
Time elapsed for bind+function+fastpool: 7.060000 (sec)

On my Athlon XP Linux box
-------------------------
OS: Gentoo Linux ("old")
Compiler: GCC 4.2.0 (20051122, experimental)
Flags: -O3 -funroll-loops -fomit-frame-pointer

[1.33.1]
Time elapsed for simple bind: 0.850000 (sec)
Time elapsed for smallbind+function (size=12): 14.230000 (sec)
Time elapsed for bind+function (size=16): 15.100000 (sec)
Time elapsed for pure function invocation: 1.430000 (sec)
Time elapsed for bind+function+pool: 26.930000 (sec)
Time elapsed for bind+function+fastpool: 9.060000 (sec)

[1.34.0 w/ vtables]
Time elapsed for simple bind: 0.020000 (sec)
Time elapsed for smallbind+function (size=12): 13.410000 (sec)
Time elapsed for bind+function (size=16): 13.360000 (sec)
Time elapsed for pure function invocation: 1.350000 (sec)
Time elapsed for bind+function+pool: 25.570000 (sec)
Time elapsed for bind+function+fastpool: 7.590000 (sec)

[1.34.0 w/ vtables and SBO]
Time elapsed for simple bind: 0.020000 (sec)
Time elapsed for smallbind+function (size=12): 2.640000 (sec)
Time elapsed for bind+function (size=16): 12.940000 (sec)
Time elapsed for pure function invocation: 1.460000 (sec)
Time elapsed for bind+function+pool: 25.260000 (sec)
Time elapsed for bind+function+fastpool: 7.430000 (sec)

On my dual G5 PowerMac
----------------------
OS: Panther (10.3.9)
Compiler: Apple GCC 3.3
Flags: -O3

[1.33.1]
Time elapsed for simple bind: 2.330000 (sec)
Time elapsed for smallbind+function (size=12): 22.080000 (sec)
Time elapsed for bind+function (size=16): 29.370000 (sec)
Time elapsed for pure function invocation: 2.590000 (sec)
Time elapsed for bind+function+pool: 38.810000 (sec)
Time elapsed for bind+function+fastpool: 21.460000 (sec)

[1.34.0 w/ vtables]
Time elapsed for simple bind: 1.180000 (sec)
Time elapsed for smallbind+function (size=12): 24.050000 (sec)
Time elapsed for bind+function (size=16): 25.860000 (sec)
Time elapsed for pure function invocation: 2.590000 (sec)
Time elapsed for bind+function+pool: 42.000000 (sec)
Time elapsed for bind+function+fastpool: 23.590000 (sec)

[1.34.0 w/ vtables and SBO]
Time elapsed for simple bind: 1.200000 (sec)
Time elapsed for smallbind+function (size=12): 8.140000 (sec)
Time elapsed for bind+function (size=16): 24.180000 (sec)
Time elapsed for pure function invocation: 2.590000 (sec)
Time elapsed for bind+function+pool: 39.210000 (sec)
Time elapsed for bind+function+fastpool: 21.280000 (sec)
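
For anyone who wants to see why the size=12 and size=16 rows behave so differently, here is a minimal sketch of the small buffer idea. It is not the attached patch and not Boost.Function's actual code. The names small_function, fits_inline, uses_buffer, small_adder and big_adder are all hypothetical, the wrapper only handles int(int) callables, it is written in C++11 for brevity, and it is deliberately non-copyable so that the assignment and swap() exception-safety questions do not come up. The only point is the compile-time choice between storing the target in the buffer and storing it on the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical illustration only; not Boost.Function's implementation.
class small_function {
    struct unknown_class;  // incomplete type, used only to measure a member pointer

public:
    // Buffer sized as described in the post: a member pointer plus a void*.
    static constexpr std::size_t buffer_size =
        sizeof(void (unknown_class::*)()) + sizeof(void*);

    // True when F can be stored directly in the buffer.
    template <class F>
    struct fits_inline
        : std::integral_constant<bool, sizeof(F) <= buffer_size &&
                                       alignof(F) <= alignof(void*)> {};

    template <class F>
    small_function(F f)
        : invoke_(&do_invoke<F>),
          destroy_(&do_destroy<F>),
          inline_(fits_inline<F>::value) {
        construct(std::move(f), fits_inline<F>());
    }

    small_function(const small_function&) = delete;
    small_function& operator=(const small_function&) = delete;

    ~small_function() { destroy_(s_); }

    int operator()(int x) { return invoke_(s_, x); }
    bool uses_buffer() const { return inline_; }

private:
    union storage {
        void* heap;                                     // large targets: heap pointer
        alignas(void*) unsigned char buf[buffer_size];  // small targets: in place
    };

    // Locate the stored target, depending on where it was constructed.
    template <class F>
    static F* target(storage& s, std::true_type) { return reinterpret_cast<F*>(s.buf); }
    template <class F>
    static F* target(storage& s, std::false_type) { return static_cast<F*>(s.heap); }

    template <class F>
    static int do_invoke(storage& s, int x) {
        return (*target<F>(s, fits_inline<F>()))(x);
    }

    template <class F>
    static void do_destroy(storage& s) {
        F* f = target<F>(s, fits_inline<F>());
        if (fits_inline<F>::value) f->~F();  // placement-new'd: destroy only
        else delete f;                       // heap-allocated: destroy and free
    }

    // Small target: no allocation at all.
    template <class F>
    void construct(F f, std::true_type) { ::new (static_cast<void*>(s_.buf)) F(std::move(f)); }
    // Large target: fall back to the heap.
    template <class F>
    void construct(F f, std::false_type) { s_.heap = new F(std::move(f)); }

    int (*invoke_)(storage&, int);
    void (*destroy_)(storage&);
    bool inline_;
    storage s_;
};

// Two hypothetical targets: one that fits the buffer and one that cannot.
struct small_adder {
    int k;
    int operator()(int x) const { return x + k; }
};
struct big_adder {
    int k;
    char padding[64];
    int operator()(int x) const { return x + k; }
};
```

With this sketch, small_function(small_adder{2}) is constructed without touching the heap, while a big_adder target costs one new/delete pair per wrapper. That allocation is the overhead the SBO rows avoid for "smallbind", and it is why a bind object that is only a few bytes over the buffer size sees no benefit.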