
"Daniel Walker" <daniel.j.walker@gmail.com> wrote in message news:AANLkTinT6ofcXAi3TsBCDoDqLVgLn-sK4g0pV9pPOGu7@mail.gmail.com...
On Sat, Oct 30, 2010 at 1:25 PM, Domagoj Saric <dsaritz@gmail.com> wrote:
e.g. http://lists.boost.org/Archives/boost/2010/01/160908.php
Thanks. I added a tarball, signal_benchmark.tar.bz2, with a jamfile and source code so that anyone who's interested can easily reproduce this benchmark. The benchmark measures the impact of the static empty scheme on the time per call of boost::signals2::signal using the code Domagoj linked to. Thanks to Christophe Prud'homme for the original benchmark!
Here are the results I got, again, using the build of g++ 4.2 provided by my manufacturer.
Data (Release):
             | function  | function (static empty)
  time/call  | 3.54e-07s | 3.51e-07s
  space/type | 64B       | 80B

Data (Debug):
             | function  | function (static empty)
  time/call  | 2.05e-06s | 2.04e-06s
  space/type | 64B       | 80B
You can see that removing the empty check from boost::function yields about a 1% improvement in time per call to boost::signal. The increased space per type overhead is the same as before: 16B.
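For reference, the "about 1%" figure can be rederived from the quoted timings: (3.54 - 3.51) / 3.54 is roughly 0.8% for the Release build and (2.05 - 2.04) / 2.05 is roughly 0.5% for the Debug build. A minimal C++ sketch of that arithmetic follows; the helper name `relative_improvement` is made up for illustration and is not part of the posted benchmark.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the posted benchmark): fractional speed-up of
// `candidate` over `baseline`, both given in seconds per call.
// A positive result means the candidate is faster than the baseline.
double relative_improvement(double baseline, double candidate)
{
    assert(baseline > 0.0); // a non-positive baseline time makes no sense here
    return (baseline - candidate) / baseline;
}
```

Calling `relative_improvement(3.54e-07, 3.51e-07)` with the Release numbers gives a value just under 0.0085, and `relative_improvement(2.05e-06, 2.04e-06)` with the Debug numbers gives a value just under 0.0049.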
You just missed one important detail mentioned in the original post, which is to use a dummy mutex. The fact that you were able to consistently measure _any_ difference (even with your own simple modification) for something that should ideally be a 'simple' indirect call while 'surrounded' by all the dynamic memory allocations, mutex locking, local shared_ptr/guard objects and other complex internal signals2 logic speaks volumes about the actual overhead at hand (which you now seem to want to claim is insignificant)...

You also misinterpreted the benchmark itself and used an incorrect 'formula'/logic to count the number of boost::function invocations. Note that this is/was a boost::signals(2) benchmark, and the number of boost::function invocations is not the same as the number of boost::signal(2) invocations. For example, 25% of the time the benchmark code you posted invokes a signal with no handler/boost::function assigned at all (which you nevertheless count as a boost::function invocation).

Even when the count part of the 'formula' is corrected, the name of the end result, 'average time per call', is still a misnomer, as the calculation also includes signal creation, 'resizing' etc. (which OTOH then also implicitly benchmarks boost::function copy-construction and assignment, another sub-optimal area of the current implementation). The correct way to use and interpret the benchmark is exactly the way its original author did: simply compare total times (for intermediate sizes or for the whole benchmark).

Additionally, the N chosen is IMO not large enough for the latest architectures (e.g. an i5@4+ GHz that also constantly dynamically adjusts its frequency) to achieve stable enough results.

The patch provided switches to a dummy mutex, adds two zeros to N, adjusts the benchmark's priority, corrects the number-of-calls calculation and skips the invocation of empty signals.
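To illustrate the counting point, here is a minimal sketch that uses a plain vector of std::function objects as a stand-in for a signal. It is not the benchmark code and not the patch. `toy_signal`, `call_counts` and `run` are hypothetical names, and the layout of the real benchmark is assumed, not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a signal: just a list of handlers.
using toy_signal = std::vector<std::function<void()>>;

struct call_counts
{
    std::size_t signal_invocations;   // how many times a signal was fired
    std::size_t function_invocations; // how many handlers were actually run
};

// Fires every signal `repetitions` times and counts both quantities separately.
// With skip_empty set, signals without any handler are not fired at all.
call_counts run(const std::vector<toy_signal>& signals,
                std::size_t repetitions, bool skip_empty)
{
    call_counts counts{0, 0};
    for (const toy_signal& sig : signals)
    {
        if (skip_empty && sig.empty())
            continue;
        for (std::size_t i = 0; i < repetitions; ++i)
        {
            ++counts.signal_invocations;
            for (const auto& handler : sig)
            {
                assert(handler); // an empty std::function would throw on call
                handler();
                ++counts.function_invocations;
            }
        }
    }
    return counts;
}
```

With four toy signals holding 0, 1, 2 and 3 handlers and 10 repetitions each, this counts 40 signal invocations but 60 handler invocations. With `skip_empty` set, the signal count drops to 30 while the handler count stays at 60. Dividing a total time by the signal count therefore does not yield a time per function call.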
The differences in the 'average-time-per-call-that-is-actually-something-else' number that the benchmark, patched and compiled with MSVC++ 10 (/Oxs), shows between https://svn.boost.org/svn/boost/sandbox/function/boost/function and https://svn.boost.org/svn/boost/trunk/boost/function are:

  Via C7-M      ~6%
  Intel i5      ~8%
  AMD Athlon64  ~24%

(Yes, the Athlon number is correct, I measured it several times...if there is an AMD architecture expert lurking around here I'd love to hear his/her thoughts about the result ;)
So, basically, in the use-case measured by this benchmark the time overhead of boost::function is dwarfed by the combined costs of boost::signal and the target function, and so using the static empty scheme does not yield much benefit.
Even if it were 'dwarfed' (which it isn't) that result would still not imply that the change is somehow irrelevant or not worth doing (if there are no drawbacks to it, and there aren't, aiming specifically at your claims/'concerns' about static space size)...

Justifying inefficient code with the fact that there exists even slower code that wraps it is just downright wrong (even though so frequently done) as the same logic could then be used to justify just about any bad thing in the Universe since you can always find something worse...

-- 
"What Huxley teaches is that in the age of advanced technology, spiritual devastation is more likely to come from an enemy with a smiling face than from one whose countenance exudes suspicion and hate."
Neil Postman