Re: [boost] [function] function wrapping with no exception safety guarantee

9 Nov 2010

      "Daniel Walker" <daniel.j.walker@gmail.com> wrote in message
news:AANLkTinT6ofcXAi3TsBCDoDqLVgLn-sK4g0pV9pPOGu7@mail.gmail.com...
...
On Sat, Oct 30, 2010 at 1:25 PM, Domagoj Saric <dsaritz@gmail.com> wrote:
...
e.g. http://lists.boost.org/Archives/boost/2010/01/160908.php
Thanks. I added a tarball, signal_benchmark.tar.bz2, with a jamfile
and source code so that anyone who's interested can easily reproduce
this benchmark. The benchmark measures the impact of the static empty
scheme on the time per call of boost::signals2::signal using the code
Domagoj linked to. Thanks to Christophe Prud'homme for the original
benchmark!
Here are the results I got, again, using the build of g++ 4.2 provided
by my manufacturer.
Data (Release):
           |  function   | function (static empty)
time/call  |  3.54e-07s  |  3.51e-07s
space/type |  64B        |  80B
Data (Debug):
           |  function   | function (static empty)
time/call  |  2.05e-06s  |  2.04e-06s
space/type |  64B        |  80B
You can see that removing the empty check from boost::function yields
about a 1% improvement in time per call to boost::signal. The
increased space per type overhead is the same as before: 16B.
You just missed one important detail mentioned in the original post, which
is to use a dummy mutex...
The fact that you were able to consistently measure _any_ difference (even
with your own simple modification) for something that should ideally be a
'simple' indirect call, while it is 'surrounded' by all the dynamic memory
allocations, mutex locking, local shared_ptr/guard objects and other complex
internal signals2 logic... speaks volumes about the actual overhead at hand
(which you now seem to want to claim is insignificant)...
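
To make the comparison concrete, here is a minimal sketch of the two
call paths under discussion. It is an illustration only, not the
boost::function implementation from trunk or the sandbox: the names
checked_fn, static_empty_fn, empty_stub and twice are hypothetical, and
a plain function pointer stands in for the real type-erased storage.

```cpp
#include <stdexcept>

// Hypothetical, heavily simplified stand-ins for a type-erased callable.

// Variant A: the target may be null, so every call tests it first.
struct checked_fn {
    int (*target)(int) = nullptr;
    int operator()(int x) const {
        if (!target) throw std::runtime_error("call to empty function");
        return target(x);
    }
};

// Variant B ("static empty"): an empty object points at a shared stub
// that throws, so the call operator is a bare indirect call, no branch.
inline int empty_stub(int) {
    throw std::runtime_error("call to empty function");
}

struct static_empty_fn {
    int (*target)(int) = &empty_stub;
    bool empty() const { return target == &empty_stub; }
    int operator()(int x) const { return target(x); }
};

// Example target used to exercise both variants.
inline int twice(int x) { return 2 * x; }
```

Both variants throw when invoked while empty. The difference is that
variant B pays for that guarantee with a statically allocated stub
instead of a test on every call.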

You also misinterpreted the benchmark itself and used an incorrect
'formula'/logic to count the number of boost::function invocations. Note
that this is/was a boost::signals(2) benchmark and the number of
boost::function invocations is not the same as the number of
boost::signal(2) invocations... For example, 25% of the time the benchmark
code you posted invokes a signal with no handler/boost::function assigned
at all (which you nevertheless count as boost::function invocations)...
Even when the count part of the 'formula' is corrected, the name of the end
result, 'average time per call', is still a misnomer, as the calculation
also includes signal creation, 'resizing' etc. (which OTOH then also
implicitly benchmarks boost::function copy-construction and assignment,
another sub-optimal area of the current implementation)...
The correct way to use and interpret the benchmark is exactly the way its
original author did... to simply compare total times (for intermediate sizes
or for the whole benchmark)...
Additionally, the N chosen is IMO not large enough for the latest
architectures (e.g. an i5@4+ GHz that also constantly dynamically adjusts
its frequency) to achieve stable enough results...
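
The counting point can be illustrated with a toy model. This is a
sketch under stated assumptions, not the benchmark's code: toy_signal,
call_counts and run_workload are hypothetical names, and the slot counts
passed in are made up for illustration rather than taken from the
benchmark.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Toy stand-in for a signal: a list of handlers invoked in order.
struct toy_signal {
    std::vector<std::function<void()>> slots;
    void operator()() const { for (const auto& s : slots) s(); }
};

struct call_counts {
    std::size_t signal_calls = 0;    // times a signal was invoked
    std::size_t function_calls = 0;  // handlers that actually ran
};

// Build one signal per entry of slot_counts (entry = number of handlers)
// and invoke each signal 'repeats' times, counting both kinds of call.
inline call_counts run_workload(const std::vector<std::size_t>& slot_counts,
                                std::size_t repeats) {
    call_counts c;
    for (std::size_t n : slot_counts) {
        toy_signal sig;
        for (std::size_t i = 0; i < n; ++i)
            sig.slots.push_back([&c] { ++c.function_calls; });
        for (std::size_t r = 0; r < repeats; ++r) {
            sig();
            ++c.signal_calls;
        }
    }
    return c;
}
```

With the made-up slot counts {0, 1, 2, 3} and 100 repeats, a quarter of
the signal invocations reach no handler at all: there are 400 signal
calls but 600 handler calls, so dividing the total time by either number
gives a different 'time per call'.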

The patch provided switches to a dummy mutex, adds two zeros to N, adjusts 
the benchmark's priority, corrects the number-of-calls calculation and skips 
the invocation of empty signals...
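
For readers unfamiliar with the dummy mutex idea, a minimal
standard-library-only sketch follows. Boost.Signals2 ships its own
boost::signals2::dummy_mutex for this purpose. The null_mutex and
counter names below are hypothetical and exist only to show how a no-op
lockable takes the locking cost out of a measurement.

```cpp
#include <mutex>

// No-op lockable: has the lock()/unlock() members std::lock_guard needs,
// but does nothing, so guarded sections carry no synchronisation cost.
struct null_mutex {
    void lock() {}
    void unlock() {}
};

// Hypothetical component whose locking policy is a template parameter.
template <class Mutex>
class counter {
    Mutex m_;
    long value_ = 0;
public:
    long increment() {
        std::lock_guard<Mutex> guard(m_);  // real lock or no-op
        return ++value_;
    }
};
```

counter<std::mutex> locks on every call, while counter<null_mutex>
compiles the guard down to nothing. The latter is only safe when a
single thread uses the object.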

The differences in the 
'average-time-per-call-that-is-actually-something-else' number that the 
benchmark, patched and compiled with MSVC++ 10 (/Oxs), shows between
https://svn.boost.org/svn/boost/sandbox/function/boost/function
and
https://svn.boost.org/svn/boost/trunk/boost/function
are:
  Via C7-M      ~6%
  Intel i5      ~8%
  AMD Athlon64  ~24%

(Yes, the Athlon number is correct, I measured it several times... if there
is an AMD architecture expert lurking around here I'd love to hear his/her
thoughts about the result ;)
...
So, basically, in the use-case measured by this benchmark the time
overhead of boost::function is dwarfed by the combined costs of
boost::signal and the target function, and so using the static empty
scheme does not yield much benefit.
Even if it were 'dwarfed' (which it isn't), that result would still not imply
that the change is somehow irrelevant or not worth doing (provided there are
no drawbacks to it, and there aren't; I am referring specifically to your
claims/'concerns' about static space size)...
Justifying inefficient code with the fact that there exists even slower code
that wraps it is just downright wrong (even though it is so frequently done),
as the same logic could then be used to justify just about any bad thing in
the Universe, since you can always find something worse...

-- 
"What Huxley teaches is that in the age of advanced technology, spiritual
devastation is more likely to come from an enemy with a smiling face than
from one whose countenance exudes suspicion and hate."
Neil Postman