Hi, I see that there's already been some discussion concerning the performance of the signals library; it's fairly old though, so I hope nobody minds me bringing it up in a new thread. We use signals fairly extensively and have seen it come in fairly high in our profiling sessions. I did a search on the mailing list and found a discussion from 2005 on the performance of the library; results at that time showed a vector with boost::function being about 100 times faster than boost::signals. Our tests show that this still very much holds true. I suspect the difference is considerably worse on in-order RISC processors. I think it would be nice if either the manual stated that performance isn't one of the design goals of boost::signals, or the interface allowed the application programmer greater control over implementation details which affect performance.

I saw in one discussion (although, once again, an old one) that boost::signals uses a list as the container of slots instead of, for example, a vector. This is understandable for the general case, since otherwise connecting / disconnecting slots would be very expensive on large collections. However, one use case which I'd think is very frequent (and which resembles ours) is to have fairly few slots connected and a somewhat constant collection; in this case one would rather accept the expensive connection management and instead get the better cache locality and invocation performance. Blocking / unblocking slots is another feature which, while useful for a lot of people, perhaps should be optional. I don't know exactly how blocking / unblocking of slots is implemented, but perhaps an added iterator which doesn't respect the blocked status, and a matching slot invocation function, could easily be added.
On Wednesday 22 July 2009, Sajjan Kalle wrote:
I did a search on the mailing list and found a discussion from 2005 on the performance of the library, results at that time showed a vector with boost::function being about 100 times faster than boost::signals.
You're going to have to provide a link.
Our tests show that this still very much holds true.
You're going to have to provide a benchmark program.
2009/7/22 Frank Mori Hess
On Wednesday 22 July 2009, Sajjan Kalle wrote:
I did a search on the mailing list and found a discussion from 2005 on the performance of the library, results at that time showed a vector with boost::function being about 100 times faster than boost::signals.
You're going to have to provide a link.
Old one: http://aspn.activestate.com/ASPN/Mail/Message/boost/2239573 . Found a more recent one: http://archives.free.net.ph/message/20080916.195431.40753f57.fr.html . I used the following:

void foo( ) { }

int main()
{
    Altus::Timer tim;
    std::vector< boost::function< void ( void ) > > manualSignal;
    boost::signal< void ( void ) > boostSignal;
    for( unsigned int i = 0; i < 1000; ++i )
    {
        manualSignal.push_back( &foo );
        boostSignal.connect( &foo );
    }
    double now = tim.GetTimeInMS();
    for( unsigned int i = 0; i < 1000; ++i )
    {
        for( unsigned int j = 0; j < 1000; ++j )
            manualSignal[ i ]( );
    }
    double diff = tim.GetTimeInMS() - now;
    std::cout << "took " << diff << " ms" << std::endl;
    now = tim.GetTimeInMS();
    for( unsigned int i = 0; i < 1000; ++i )
    {
        boostSignal( );
    }
    diff = tim.GetTimeInMS() - now;
    std::cout << "took " << diff << " ms" << std::endl;
}

Which gives me the results ~5.2 ms for the vector variant and ~411 ms for the boost::signals variant. However, disabling checked iterators gives me ~3.8 ms for the vector variant and ~180 ms for boost::signals, so it seems to make a world of difference for the overhead associated with slots. We have a more real-world-like scenario in our code base in which boost::signals seems to perform even worse; I can't provide the entire code base, however. I'd think this has to do with the slots becoming more fragmented in memory, due to the underlying container, when used in the wild. That's merely speculation, however.
Which gives me the results ~5.2 ms for the vector variant and ~411 ms for the boost::signals variant. However, disabling checked iterators gives me ~3.8 ms for the vector variant and ~180 ms for boost::signals, so it seems to make a world of difference for the overhead associated with slots.
Did you switch optimizations on?
2009/7/23 Igor R
Which gives me the results ~5.2 ms for the vector variant and ~411 ms for the boost::signals variant. However, disabling checked iterators gives me ~3.8 ms for the vector variant and ~180 ms for boost::signals, so it seems to make a world of difference for the overhead associated with slots.
Did you switch optimizations on?
Yes, /O2 /Ob2 /Oi /Ot, all of the common optimization flags in favor of fast code. This is VS 2008 btw.
Hi, did you try the "Profile Guided Optimization" stuff? Best Regards, Ingo

Sajjan Kalle wrote:
2009/7/23 Igor R
Which gives me the results ~5.2 ms for the vector variant and ~411 ms for the boost::signals variant. However, disabling checked iterators gives me ~3.8 ms for the vector variant and ~180 ms for boost::signals, so it seems to make a world of difference for the overhead associated with slots.
Did you switch optimizations on?
Yes, /O2 /Ob2 /Oi /Ot, all of the common optimization flags in favor of fast code. This is VS 2008 btw.
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Did you switch optimizations on?
Yes, /O2 /Ob2 /Oi /Ot, all of the common optimization flags in favor of fast code. This is VS 2008 btw.
I've just noticed that you reference "old" boost.signals. Did you try the same with signals2 (with dummy_mutex)? It would be interesting to compare.
On Thursday 23 July 2009, Sajjan Kalle wrote:
You're going to have to provide a link.
Your reply quoting in the plain text versions of your emails is completely broken. Please turn off whatever option in gmail that causes that.
Old one http://aspn.activestate.com/ASPN/Mail/Message/boost/2239573 .
That's a bit too old to be interesting.
Found a more recent http://archives.free.net.ph/message/20080916.195431.40753f57.fr.html .
That looks like a "I didn't know I had to set _SECURE_SCL to zero for good performance" problem to me.
I used the following:
Altus::Timer tim;
Please provide a benchmark that will compile with just boost.
2009/7/23 Frank Mori Hess frank.hess@nist.gov
Your reply quoting in the plain text versions of your emails is completely broken. Please turn off whatever option in gmail that causes that.
I think it should be fixed now.
That's a bit too old to be interesting.
Found a more recent http://archives.free.net.ph/message/20080916.195431.40753f57.fr.html .
That looks like a "I didn't know I had to set _SECURE_SCL to zero for good performance" problem to me.
Yeah, upon further inspection that sample isn't really all that interesting.
I used the following:
Altus::Timer tim;
Please provide a benchmark that will compile with just boost.
I'm upgrading to 1.39 from 1.38; I'll create a new sample without the alien timer. Profiling the project in question, where the performance of boost::signals is one of the problems, it seems two of the bottlenecks are slot_call_iterator::increment and slot_call_iterator::equal, and within those, in particular, find_if, which seems to be used for incrementing the iterator. I'll try to create a sample with timing on a more fragmented, real-world-like use case.
2009/7/23 Sajjan Kalle
Profiling the project in question, where the performance of boost::signals is one of the problems, it seems two of the bottlenecks are slot_call_iterator::increment and slot_call_iterator::equal, and within those, in particular, find_if, which seems to be used for incrementing the iterator. I'll try to create a sample with timing on a more fragmented, real-world-like use case.
I'll throw signals2 into the mix once I've built 1.39; in the meantime, here's a new variant which depends only on the STL and Boost.

#include <iostream>
#include "boost/signals.hpp"
#include <vector>
#include "boost/function.hpp"
#include "boost/timer.hpp"
#include <cstdlib>
#include <algorithm>

void foo( ) { }

int main()
{
    std::vector< boost::function< void ( void ) > > manualSignal;
    boost::signal< void ( void ) > boostSignalFragmented, boostSignalUnfragmented;
    typedef std::vector< boost::signals::connection > ConnectionVector;
    ConnectionVector connections;
    for( unsigned int i = 0; i < 10000; ++i )
    {
        manualSignal.push_back( &foo );
        boostSignalUnfragmented.connect( &foo );
    }
    for( unsigned int i = 0; i < 100000; ++i )
    {
        connections.push_back( boostSignalFragmented.connect( &foo ) );
    }
    for( unsigned int i = 0; i < 90000; ++i )
    {
        ConnectionVector::iterator index = connections.begin() + rand() % connections.size();
        (*index).disconnect();
        *index = *connections.rbegin();
        connections.erase( connections.begin() + connections.size() - 1 );
    }
    {
        boost::timer tm;
        for( unsigned int i = 0; i < 1000; ++i )
        {
            for( unsigned int j = 0; j < 10000; ++j )
                manualSignal[ i ]( );
        }
        double elapsed = tm.elapsed();
        std::cout << "vector variant: " << elapsed << std::endl;
    }
    {
        boost::timer tm;
        for( unsigned int i = 0; i < 1000; ++i )
        {
            boostSignalUnfragmented( );
        }
        double elapsed = tm.elapsed();
        std::cout << "boost::signal Unfragmented variant: " << elapsed << std::endl;
    }
    {
        boost::timer tm;
        for( unsigned int i = 0; i < 1000; ++i )
        {
            boostSignalFragmented( );
        }
        double elapsed = tm.elapsed();
        std::cout << "boost::signal Fragmented variant: " << elapsed << std::endl;
    }
}

This gives me ~0.032 on vector, ~1.80 on unfragmented boost::signal and lastly ~3.04 on fragmented boost::signal.
On Thursday 23 July 2009, Sajjan Kalle wrote:
2009/7/23 Sajjan Kalle
Profiling the project in question, where the performance of boost::signals is one of the problems, it seems two of the bottlenecks are slot_call_iterator::increment and slot_call_iterator::equal, and within those, in particular, find_if, which seems to be used for incrementing the iterator. I'll try to create a sample with timing on a more fragmented, real-world-like use case.
I'll throw signals2 into the mix once I've built 1.39; in the meantime, here's a new variant which depends only on the STL and Boost.
[quoted benchmark source snipped]
This gives me ~0.032 on vector, ~1.80 on unfragmented boost::signal and lastly ~3.04 on fragmented boost::signal.
On what kind of hardware? On a 3.2GHz Pentium D running Linux with gcc 4.3.2, I get:

$ g++ -Wall -O2 -I /usr/local/boost-1_39_0/include/boost-1_39/ sigbench.cpp -lboost_signals-gcc43-mt
$ ./a.out
vector variant: 0.12
boost::signal Unfragmented variant: 0.89
boost::signal Fragmented variant: 2
$ ./a.out
vector variant: 0.12
boost::signal Unfragmented variant: 0.52
boost::signal Fragmented variant: 2.07
On what kind of hardware? On a 3.2GHz Pentium D running Linux with gcc 4.3.2 , I get:
$ g++ -Wall -O2 -I /usr/local/boost-1_39_0/include/boost-1_39/ sigbench.cpp -lboost_signals-gcc43-mt
$ ./a.out
vector variant: 0.12
boost::signal Unfragmented variant: 0.89
boost::signal Fragmented variant: 2
$ ./a.out
vector variant: 0.12
boost::signal Unfragmented variant: 0.52
boost::signal Fragmented variant: 2.07
Ah, that's weird. I'm on an Intel quad-core 2.4 GHz running Vista 32-bit, compiled with MSVC 2008. I'll rebuild with 1.39 tomorrow and do a run, with signals2 included.
Ah, that's weird. I'm on an Intel quad-core 2.4 GHz running Vista 32-bit, compiled with MSVC 2008. I'll rebuild with 1.39 tomorrow and do a run, with signals2 included.
Rebuilt with boost 1.39 and signals2; I'm getting results similar to yours now, so I probably missed some optimization flag when building boost. Here's the source with signals2 included:

#include <iostream>
#include "boost/signals.hpp"
#include <vector>
#include "boost/function.hpp"
#include "boost/timer.hpp"
#include "boost/signals2/signal.hpp"
#include <cstdlib>
#include <algorithm>

void foo( ) { }

int main()
{
    std::vector< boost::function< void ( void ) > > manualSignal;
    boost::signal< void ( void ) > boostSignalFragmented, boostSignalUnfragmented;
    boost::signals2::signal< void ( void ) > boostSignal2Fragmented, boostSignal2Unfragmented;
    typedef std::vector< boost::signals::connection > ConnectionVector;
    typedef std::vector< boost::signals2::connection > ConnectionVector2;
    ConnectionVector connections;
    ConnectionVector2 connections2;
    for( unsigned int i = 0; i < 10000; ++i )
    {
        manualSignal.push_back( &foo );
        boostSignal2Unfragmented.connect( &foo );
        boostSignalUnfragmented.connect( &foo );
    }
    for( unsigned int i = 0; i < 100000; ++i )
    {
        connections.push_back( boostSignalFragmented.connect( &foo ) );
        connections2.push_back( boostSignal2Fragmented.connect( &foo ) );
    }
    for( unsigned int i = 0; i < 90000; ++i )
    {
        {
            ConnectionVector::iterator index = connections.begin() + rand() % connections.size();
            (*index).disconnect();
            *index = *connections.rbegin();
            connections.erase( connections.begin() + connections.size() - 1 );
        }
        {
            ConnectionVector2::iterator index = connections2.begin() + rand() % connections2.size();
            (*index).disconnect();
            *index = *connections2.rbegin();
            connections2.erase( connections2.begin() + connections2.size() - 1 );
        }
    }
    {
        boost::timer tm;
        for( unsigned int i = 0; i < 1000; ++i )
        {
            for( unsigned int j = 0; j < 10000; ++j )
                manualSignal[ i ]( );
        }
        double elapsed = tm.elapsed();
        std::cout << "vector variant: " << elapsed << std::endl;
    }
    {
        boost::timer tm;
        for( unsigned int i = 0; i < 1000; ++i )
        {
            boostSignalUnfragmented( );
        }
        double elapsed = tm.elapsed();
        std::cout << "boost::signal Unfragmented variant: " << elapsed << std::endl;
    }
    {
        boost::timer tm;
        for( unsigned int i = 0; i < 1000; ++i )
        {
            boostSignalFragmented( );
        }
        double elapsed = tm.elapsed();
        std::cout << "boost::signal Fragmented variant: " << elapsed << std::endl;
    }
    {
        boost::timer tm;
        for( unsigned int i = 0; i < 1000; ++i )
        {
            boostSignal2Unfragmented( );
        }
        double elapsed = tm.elapsed();
        std::cout << "boost::signal2 Unfragmented variant: " << elapsed << std::endl;
    }
    {
        boost::timer tm;
        for( unsigned int i = 0; i < 1000; ++i )
        {
            boostSignal2Fragmented( );
        }
        double elapsed = tm.elapsed();
        std::cout << "boost::signal2 Fragmented variant: " << elapsed << std::endl;
    }
}

This yields:

vector: 0.038
boost::signal unfragmented: 0.936
boost::signal fragmented: 1.659
boost::signal2 unfragmented: 1.092
boost::signal2 fragmented: 9.793

What's surprising here is the long running time of the fragmented signals2 variant, which makes me believe there's a bug somewhere in it. Instead of connecting 100000 slots and disconnecting 90000 randomly, I connected 1000000 and disconnected 990000; the fragmented signals2 invocation performance then jumps up to about 92 seconds for calling a signal with 10000 slots.

I think a lot of the disparity between signal and the vector variant is actually due to the underlying container used; a std::map will simply never beat the vector, and in fact signal will most likely perform considerably worse as slots drift further apart and paging kicks in. I think the underlying container in signals is too much of a factor to keep private and should, imo, be a parameter. A lot of people seem to use signal for event systems, where fragmentation becomes very natural if there are a lot of actors entering / leaving the system.
I understand that the code for connections would have to have special implementations depending on the traits of the underlying container type, but keeping the same public interface would still be very much possible without too much effort, I think.
On Friday 24 July 2009, Sajjan Kalle wrote:
What's surprising here is the long running time of the fragmented signals2 variant, which makes me believe there's a bug somewhere in it. Instead of connecting 100000 slots and disconnecting 90000 randomly, I connected 1000000 and disconnected 990000; the fragmented signals2 invocation performance then jumps up to about 92 seconds for calling a signal with 10000 slots.
That's due to the way signals2 cleans up disconnected slots in its slot list (incrementally, during connection and invocation). It only does a full cleanup when it has to do a deep copy of the slot list due to concurrent access. If you run the signals2 fragmented block 10 times in a row, the run time will decrease as the disconnected slots are removed. However, I should be able to improve the situation by adding a bit of code to invocation so it checks for an excessive number of disconnected slots and forces a full cleanup if needed.
I think a lot of the disparities between signal and the vector variant actually is due to the underlying container used, a std::map will simply never beat the vector, and in fact signal will most likely perform considerably worse when slots go further and further apart and paging kicks in.
You should add a fragmented test using plain iteration over a std::list to your benchmark. Also, you could try using a dummy_mutex for the signals2 signals, like:

namespace bs2 = boost::signals2;
using namespace bs2::keywords;
bs2::signal_type< void ( ), mutex_type< bs2::dummy_mutex > >::type
    boostSignal2Fragmented, boostSignal2Unfragmented;
2009/7/24 Frank Mori Hess
You should add a fragmented test using plain iteration over a std::list to your benchmark. Also, you could try using a dummy_mutex for the signals2 signals, like:

namespace bs2 = boost::signals2;
using namespace bs2::keywords;
bs2::signal_type< void ( ), mutex_type< bs2::dummy_mutex > >::type
    boostSignal2Fragmented, boostSignal2Unfragmented;
With dummy_mutex the unfragmented version of signals2 is about twice as fast as the unfragmented signals version, finishing at 0.514. Is there any advantage of signals over signals2? Perhaps it should be deprecated. Here's the new version, slightly changed in a few areas.
#include <iostream>
#include "boost/signals.hpp"
#include <vector>
#include "boost/function.hpp"
#include "boost/timer.hpp"
#include "boost/signals2/signal.hpp"
#include "boost/signals2/dummy_mutex.hpp"
#include "boost/signals2/signal_type.hpp"
#include <cstdlib>
#include <algorithm>
void foo( )
{
}
int main()
{
std::vector< boost::function< void ( void ) > > manualSignal;
typedef std::list< boost::function< void ( void ) > >::iterator SigListIterator;
std::list< boost::function< void ( void ) > > manualSignalListUnfragmented, manualSignalListFragmented;
boost::signal< void ( void ) > boostSignalFragmented, boostSignalUnfragmented;
namespace bs2 = boost::signals2;
bs2::signal_type
On Friday 24 July 2009, Sajjan Kalle wrote:
I guess the fragmented signals2 version isn't all that interesting anymore
Changeset 55143 on the svn trunk: https://svn.boost.org/trac/boost/changeset/55143/trunk should improve the fragmented signals2 benchmark time. With the patch, your benchmark outputs for me:

$ ./a.out
vector variant: 0.13
boost::signal unfragmented variant: 0.51
boost::signal fragmented variant: 2.15
boost::signal2 unfragmented variant: 0.44
boost::signal2 Fragmented variant: 2.62
list unfragmented variant: 0.12
list fragmented variant: 1.01

On repeated runs with the list stuff removed (due to it taking so long), the boost::signal times show the most variability for whatever reason:

$ ./a.out
vector variant: 0.12
boost::signal unfragmented variant: 1.36
boost::signal fragmented variant: 3.04
boost::signal2 unfragmented variant: 0.45
boost::signal2 Fragmented variant: 2.6
$ ./a.out
vector variant: 0.12
boost::signal unfragmented variant: 0.81
boost::signal fragmented variant: 2.85
boost::signal2 unfragmented variant: 0.44
boost::signal2 Fragmented variant: 2.6
participants (4)
- Frank Mori Hess
- Igor R
- Ingo Maindorfer
- Sajjan Kalle