
Hi, I've just tried to measure the performance of Boost.Function and Boost.Signals using the following simple program:

#include <boost/function.hpp>
#include <boost/signals.hpp>

using namespace std;

void do_nothing() {}

int main()
{
    boost::function<void (void)> f = do_nothing;
    void (*pf)(void) = &do_nothing;
    boost::signal<void (void)> s;
    s.connect(&do_nothing);
    for(int i = 0; i < 10000000; ++i)
        s();
    return 0;
}

and trying calls to 's()', 'pf()' and 'f()'. The timings are, on an Athlon 1700 with g++ 3.3.5 and -O3:

Function pointer: 0.05 secs
Boost.Function:   0.2 secs
Boost.Signals:    9.8 secs

Boost.Function is 4 times slower than the function pointer, which seems decent to me (not that I'd mind further improvement). But Boost.Signals is nearly 50 times slower than Boost.Function, and nearly 200 times slower than the function pointer. Any chance this will be improved? Say, I don't care about combining the results of signal invocation, or any connection tracking -- I just want to use boost::signals as a more convenient variant of vector<boost::function>. Is it possible to have some preprocessor define that throws away all the advanced functionality? Or a base class, say "lightweight_signal", that has only the basics?

TIA,
Volodya

Hello, I have modified your example a bit and tried it out on my Linux machine (Fedora Core 3), and the performance seems to be even worse. I use Boost 1.32 (1.32.0-5.fc3) and gcc version 3.4.4 20050721 (Red Hat 3.4.4-2) on an Intel Pentium 4 CPU 3.00GHz with 2048 kB cache. When I compile it with optimization I typically get these values:
g++ -O3 -o signal_test main.cc -lboost_signals ./signal_test
Iterations: 10000000
SIGNALS:          sec.: 56 / usec.: 295077
BOOST::FUNCTION:  sec.: 0 / usec.: 176564
FUNCTION POINTER: sec.: 0 / usec.: 41502

Without optimization it looks like this:
g++ -o signal_test main.cc -lboost_signals
./signal_test

Iterations: 10000000
SIGNALS:          sec.: 70 / usec.: 26187
BOOST::FUNCTION:  sec.: 0 / usec.: 244619
FUNCTION POINTER: sec.: 0 / usec.: 53574

And I always thought signals ought to be fast :-). Here is the code I used for the test:

------------------------------- snip -------------------------------
#include <boost/function.hpp>
#include <boost/signals.hpp>
#include <iostream>
#include <sys/time.h>

using namespace std;

void do_nothing() {}

int main()
{
    unsigned int counter = 0;
    const unsigned int MAX_ITERATION = 10000000;
    struct timeval t_start;
    struct timeval t_end;

    boost::function<void (void)> f = do_nothing;
    void (*pf)(void) = &do_nothing;
    boost::signal<void (void)> s;
    s.connect(&do_nothing);

    cerr << "\nIterations: " << MAX_ITERATION << endl << endl;

    cerr << "SIGNALS: ";
    gettimeofday( &t_start, 0 );
    for( counter = 0; counter < MAX_ITERATION; ++counter ) { s(); }
    gettimeofday( &t_end, 0 );
    cerr << "sec.: " << ( t_end.tv_sec - t_start.tv_sec )
         << " / usec.: " << ( t_end.tv_usec - t_start.tv_usec ) << endl;

    cerr << "BOOST::FUNCTION: ";
    gettimeofday( &t_start, 0 );
    for( counter = 0; counter < MAX_ITERATION; ++counter ) { f(); }
    gettimeofday( &t_end, 0 );
    cerr << "sec.: " << ( t_end.tv_sec - t_start.tv_sec )
         << " / usec.: " << ( t_end.tv_usec - t_start.tv_usec ) << endl;

    cerr << "FUNCTION POINTER: ";
    gettimeofday( &t_start, 0 );
    for( counter = 0; counter < MAX_ITERATION; ++counter ) { pf(); }
    gettimeofday( &t_end, 0 );
    cerr << "sec.: " << ( t_end.tv_sec - t_start.tv_sec )
         << " / usec.: " << ( t_end.tv_usec - t_start.tv_usec ) << endl;

    return 0;
}
------------------------------- snip -------------------------------

regards,
Arno

The bulk of the time seems to be spent in named_slot_map::iterator construction and equals(). In both of those cases, nested calls to std::find_if are searching other collections for things that are is_callable()... I see that some bookkeeping is required for handling groups and disabled slots, etc., but it's not clear to me why find_if() needs to be called just to compare two iterator values, or to construct an iterator... Dave

On Dec 21, 2005, at 6:43 AM, Dave Moore wrote:
The bulk of the time seems to be spent in named_slot_map::iterator construction and equals().
In both of those cases, nested calls to std::find_if are searching other collections for things that are is_callable()...
I see that some bookkeeping is required for handling groups and disabled slots, etc., but it's not clear to me why find_if() needs to be called just to compare two iterator values, or to construct an iterator...
Because you need to skip over any disabled or removed slots to properly compare two iterators. For instance, let's say that X is a callable slot and O is not. You might have a slot list that looks like this:

  O O O O O O O O O O

Now, you grab iterators "first" to the beginning of the sequence and "last" to the end of the sequence. If you just compare first and last directly, you'll think there are callable slots in the sequence... but there aren't! So, when you initialize first, you need to move it forward until you hit an X. In this case, you hit the end of the list first. Now imagine we have this:

  O O X X X X X X X X

We skip over the first two O's, then hit an X. We call that slot, but it decides to disconnect all of the remaining slots:

  O O O O O O O O O O

Now when we compare "first != last" again, we need to move "first" to the end of the list to determine that there are no more callable slots. See, for instance, the filter_iterator adaptor from the Iterator Adaptors library.

Signals is much slower than vector<function<...> > because it has to do a lot more work than vector<function<...> >. Much of the problem comes from the fact that one can disconnect slots from a signal while that signal is invoking slots. A comparison against something like list<pair<function<...>, bool> >, where the bool tells us whether we should skip the slot, would give us a much better bound on the performance that could be achieved by Signals.

That said, named_slot_map::iterator is horrible for performance. It was built that way to decrease the object-file size for programs using Signals; named slots are just plain expensive. I think we can build a data structure that allows the implementation of the slot list to be similar to list<pair<function<...>, bool> >, with the mappings from named slots to positions in the list stored in a separate data structure... but nobody has implemented this yet.
My attempts to kill named slots outright, thus improving the performance of Signals, have met much resistance. We have to be realistic about the features that have been omitted in the simple, fast models against which Signals is being compared. For instance, the example program under discussion ignores several factors:

1) function<>s/function pointers never call more than one target, so they have no internal iteration constructs. Any Signals implementation must have these iteration constructs (so we need to at least compare against a vector of function<>s or function pointers).

2) Target function objects can modify the iteration sequence during iteration. This results in additional iteration overhead that doesn't show up in the simple models, e.g., because one can add or remove slots while iterating through the list.

3) Named slots induce an ordering on slot invocations. This may or may not affect iteration overhead, but there's some non-trivial data structure code behind it that must be considered. Would you make iteration 2x faster if it meant that slot (dis)connection was linear time instead of logarithmic or constant time?

I'm not saying that it isn't useful to compare the performance of Signals against simpler models, but we need to compare apples to apples if the comparison is going to help us improve.

Doug

Doug Gregor wrote:
We have to be realistic about the features that have been omitted in the simple, fast models against which Signals is being compared. For instance, the example program under discussion ignores several factors: 1) function<>s/function pointers never call more than one target, so they have no internal iteration constructs. Any Signals implementation must have these iteration constructs (so we need to at least compare against a vector of function<>s or function pointers)
Ok, let me try a comparison against a vector of function pointers and a vector of boost::function. Test case attached. And I have another test case, also attached, which tests signals performance in Qt. Results:

function pointer: 0.1 sec
Boost::function:  0.31 sec
Boost::signals:   9.5 sec
Qt:               1.3 sec

Boost.Signals is 30 times slower than Boost.function, 95 times slower than a function pointer, and 7 times slower than Qt.
2) target function objects can modify the iteration sequence during iteration. This results in additional iteration overhead that doesn't show up in the simple models, e.g., because one can add or remove slots while iterating through the list.
What is the use case for that? And BTW, what is the use case for temporarily blocking a specific connection?
I'm not saying that it isn't useful to compare the performance of Signals against simpler models, but we need to compare apples to apples if the comparison is going to help us improve.
If by "apples to apples" you mean comparison to other signal implementations, then:

1. See the comparison with Qt above.
2. I don't think it's right anyway. Boost::signals has extra features compared to Boost::function, and it's fair to estimate the price those features have relative to Boost::function.

Still, what do you think about some simplified boost::signal (maybe a base class, or something #ifdefed) that will have better performance?

- Volodya

On Dec 22, 2005, at 3:26 AM, Vladimir Prus wrote:
Doug Gregor wrote:
We have to be realistic about the features that have been omitted in the simple, fast models against which Signals is being compared. For instance, the example program under discussion ignores several factors: 1) function<>s/function pointers never call more than one target, so they have no internal iteration constructs. Any Signals implementation must have these iteration constructs (so we need to at least compare against a vector of function<>s or function pointers)
Ok, let me try a comparison against a vector of function pointers and a vector of boost::function. Test case attached. And I have another test case, also attached, which tests signals performance in Qt.
Results:

function pointer: 0.1 sec
Boost::function:  0.31 sec
Boost::signals:   9.5 sec
Qt:               1.3 sec
Boost.Signals is 30 times slower than Boost.function and 95 times slower than a function pointer
So it was 50 times slower than a single Boost.function, 30 times slower than a single Boost.function in a vector.
and 7 times slower than Qt.
It's important for us to improve this.
2) target function objects can modify the iteration sequence during iteration. This results in additional iteration overhead that doesn't show up in the simple models, e.g., because one can add or remove slots while iterating through the list.
What is the use case for that?
It's especially good for one-shot slots: connect the slot, then once it fires it disconnects itself. It also happens when slots cause one of the objects connected to them to essentially "delete this", as would happen if your slot involved destroying a window within a GUI. This happens surprisingly often.
And BTW, what is the use case for temporary blocking specific connection?
It was a persistent user request followed by a patch from a user. I don't recall the use cases, but I expect it has something to do with temporarily disabling calls into a component while that component is reinitializing itself.
I'm not saying that it isn't useful to compare the performance of Signals against simpler models, but we need to compare apples to apples if the comparison is going to help us improve.
If by "apples to apples" you mean comparison to other signal implementations, then: 1. See the comparison with Qt above. 2. I don't think it's right anyway. Boost::signals has extra features compared to Boost::function, and it's fair to estimate the price those features have relative to Boost::function.
Sure, but we have to know what features we're measuring. There's no signal implementation that consists of a single function pointer, so while it does provide a lower bound on the call time for a signal, that bound doesn't tell us anything. vector<function<> > is a reasonable baseline: then we add features such as connect/disconnect stability during invocation, slot blocking/unblocking, and named slots, and see where that gets us. My guess? The combination of named slots and my attempts to avoid code bloat is killing signals invocation performance.
Still, what do you think about some simplified boost::signal (maybe a base class, or a #ifdefed) that will have better performance?
For a long time I've wanted to have a signal that doesn't permit named slots, so that we can replace the horrendous map<name, list<slot> > data structure in named_slot_map with a simple list<slot>. It even fits into the interface well: we would just add a partial specialization of signal<> like the following that uses the simple, faster list<slot>-type interface.

template<typename Signature, typename Combiner,
         typename GroupCompare, typename SlotFunction>
class signal<Signature, Combiner, void, GroupCompare, SlotFunction>;

It may also be possible to have a common core of list<slot> and, when Group != void, an auxiliary map from slot group names to iterators into the list. This tends to make disconnection rather tricky, however.

Doug

Douglas Gregor wrote:
Ok, let me try a comparison against a vector of function pointers and a vector of boost::function. Test case attached. And I have another test case, also attached, which tests signals performance in Qt.
Results:

function pointer: 0.1 sec
Boost::function:  0.31 sec
Boost::signals:   9.5 sec
Qt:               1.3 sec
Boost.Signals is 30 times slower than Boost.function and 95 times slower than a function pointer
So it was 50 times slower than a single Boost.function, 30 times slower than a single Boost.function in a vector.
Precisely. The performance difference between Boost.Function and vector<Boost.Function> is a bit strange in itself, as vector iteration should compile down to a simple loop, but it's tiny compared to Boost.Signals.
2) target function objects can modify the iteration sequence during iteration. This results in additional iteration overhead that doesn't show up in the simple models, e.g., because one can add or remove slots while iterating through the list.
What is the use case for that?
It's especially good for one-shot slots: connect the slot, then once it fires it disconnects itself.
If it disconnects *itself*, then it might be possible to defer removing such slots from whatever list they are in, until the iteration is done.
It also happens when slots cause one of the objects connected to them to essentially "delete this", as would happen if your slot involved destroying a window within a GUI. This happens surprisingly often.
Again, this means that a slot disconnects *itself*. It does not change the traversal for other slots. So this might be deferred.
And BTW, what is the use case for temporary blocking specific connection?
It was a persistent user request followed by a patch from a user. I don't recall the use cases, but I expect it has something to do with temporarily disabling calls into a component while that component is reinitializing itself.
Hmm, isn't this better handled on the component side, by adding if (reinitializing) checks to all handlers? I don't see why temporarily disabling all connections (which you need to store somehow) is better than adding "ifs". Or, it's possible to use an adapter class -- a pair of boost::function and bool, with an operator() that calls the contained boost::function only if the bool member is true. That would make it possible to solve the original problem outside Boost.Signals, and not impose the overhead on everyone.
I'm not saying that it isn't useful to compare the performance of Signals against simpler models, but we need to compare apples to apples if the comparison is going to help us improve.
If by "apples to apples" you mean comparison to other signal implementation, then: 1. See comparison with Qt above 2. I don't think it's right anyway. Boost::signals has extra features compares to Boost::function and it's fair to estimate the price those features have compared to Boost::function .
Sure, but we have to know what features we're measuring. There's no signal implementation that consists of a single function pointer, so while it does provide a lower bound on the call time for a signal, that bound doesn't tell us anything. vector<function<> > is a reasonable baseline: the we add features such as connect/disconnect stability during invocation, slot blocking/unblocking, and named slots and see where that gets us. My guess? The combination of named slots and my attempts to avoid code bloat are killing signals invocation performance.
BTW, what are those named slots? I've looked into the docs several times but still haven't figured that out.
Still, what do you think about some simplified boost::signal (maybe a base class, or a #ifdefed) that will have better performance?
For a long time I've wanted to have a signal that doesn't permit named slots, so that we can replace the horrendous map<name, list<slot> > data structure in named_slot_map with a simple list<slot>. It even fits into the interface well: we would just add a partial specialization of signal<> like the following that uses the simple, faster list<slot>-type interface.
template<typename Signature, typename Combiner,
         typename GroupCompare, typename SlotFunction>
class signal<Signature, Combiner, void, GroupCompare, SlotFunction>;
That would be great. - Volodya

On Thu, 22 Dec 2005 09:32:45 -0500 Douglas Gregor <doug.gregor@gmail.com> wrote:
For a long time I've wanted to have a signal that doesn't permit named slots, so that we can replace the horrendous map<name, list<slot> > data structure in named_slot_map with a simple list<slot>. It even fits into the interface well: we would just add a
partial specialization of signal<> like the following that uses the simple, faster list<slot>-type interface.
Maybe what we need is a look at a redesign that provides all the features in current use while allowing users to opt into smaller subsets of them. It's easy to think I'm just throwing stones, but I'm really not trying to do so. I think Boost.Signals is an incredible tool. However, it only gets marginal use in my stuff because of performance. I have, essentially, rolled my own mini classes (and in some cases used the FastDelegates lib), and only use Boost.Signals when I need all the whiz-bang features. If there were some other way...

Douglas Gregor wrote:
On Dec 22, 2005, at 3:26 AM, Vladimir Prus wrote: For a long time I've wanted to have a signal that doesn't permit named slots, so that we can replace the horrendous map<name, list<slot> > data structure in named_slot_map with a simple list<slot>. It even fits into the interface well: we would just add a partial specialization of signal<> like the following that uses the simple, faster list<slot>-type interface.
template<typename Signature, typename Combiner,
         typename GroupCompare, typename SlotFunction>
class signal<Signature, Combiner, void, GroupCompare, SlotFunction>;
It may also be possible to have a common core of list<slot> and when Group != void have an auxiliary map from slot group names to iterators into the list. This tends to make disconnection rather tricky, however.
I do not know if this is doable, or even if you want to do it or would consider it too much work, but I had suggested something like this a while back: a signal could be created, on an individual-signal basis, with a parameter which says that it does not support named slots, and that knowledge should allow you to use a much faster iterator to do signal processing once the signal is triggered or manipulated.

Another possibility, along the same lines, is what you have written just above: if no named slots have been added for a signal, which you can easily keep track of with an internal bool per signal, the internal code should theoretically be able to use simpler algorithms and data structures and thus speed up signal processing.

In either case, those who use signals with named slots would have no reason to complain of slower signal processing, since it has already been determined that this slows down signals to a large extent and they would be told about it in the docs, while those who do not use named slots would benefit from faster processing.

No, I am not volunteering to implement this in your excellent code, nor do I pretend to understand the internals of signal, although I have looked at it occasionally; I am just suggesting a possibility along the lines of "what you pay for is what you get". I realize this greater flexibility would come at the price of greater internal complexity and more difficult maintainability, if it could be done at all, and that is a headache you may not want to pursue.

On Thu, 22 Dec 2005 11:26:37 +0300 Vladimir Prus <ghost@cs.msu.su> wrote:
I'm not saying that it isn't useful to compare the performance of Signals against simpler models, but we need to compare apples to apples if the comparison is going to help us improve.
If by "apples to apples" you mean comparison to other signal implementations, then: 1. See the comparison with Qt above. 2. I don't think it's right anyway. Boost::signals has extra features compared to Boost::function, and it's fair to estimate the price those features have relative to Boost::function.
Still, what do you think about some simplified boost::signal (maybe a base class, or a #ifdefed) that will have better performance?
To add (again), I think it is fair to compare boost::signal<> to boost::function<> because it gives you an idea of the extra cost. I have seen the arguments about the cost of the extra functionality. However, if you are not using the extra functionality, then you should not have to pay for it. The Boost.Signals design/implementation requires the user to pay a performance penalty for features that MIGHT be used. This has always been my single biggest problem with the library. I should be able to instantiate a signal that gives me the level of functionality I desire. However, the current library mandates that simple users pay for complex and expensive features that are not used.

Signals is much slower than vector<function<...> > because it has to do a lot more work than vector<function<...> >. Much of the problem comes from the fact that one can disconnect slots from a signal while that signal is invoking slots. A comparison against something like list<pair<function<...>, bool> >, where the bool tells us if we should skip the slot or not, would give us a much better bound on the performance that could be achieved by Signals.
That said, named_slot_map::iterator is horrible for performance. It was built that way to decrease the object-file size for programs using Signals; named slots are just plain expensive. I think we can build a data structure that allows the implementation of the slot list to be similar to list<pair<function<...>, bool> >, with the mappings from named slots to positions in the list stored in a separate data structure... but nobody has implemented this yet. My attempts to kill named slots outright, thus improving the performance of Signals, have met much resistance.
An approach where the slot list is more like list<pair<function<...>, bool> > seems like a good step in the right direction. I appreciate the overhead needed in providing clean disconnect semantics during signal callbacks... I'd just like to see the named_slots overhead only 'hit' when actually using the feature. It seems like a good application of pay-for-what-you-use.

Regards,
Dave
participants (7)

- Arno Wilhelm
- Dave Moore
- Doug Gregor
- Douglas Gregor
- Edward Diener
- Jody Hagins
- Vladimir Prus