
I profiled speed_test.C with quantify to determine where the signals library was spending its time. On my SPARC/Solaris system with gcc 3.3.2 it was spending about 1/3 of its time in malloc, mostly as a result of the cache construction inside slot_call_iterator.

At home I tried replacing the shared_ptr in slot_call_iterator with an instance variable of the result_type. Since slot_call_iterator uses the shared_ptr to determine whether the cache is valid, I had to add a bool indicating whether the cached value is valid. These changes made the benchmark run about two to four times faster on my home system (a 650 MHz Duron with gcc 3.3.2 running Debian). I expect a larger improvement on my SPARC machine, because its malloc implementation seems to be slower.

There are several drawbacks to my modifications. The code is harder to maintain because of the added bool. The result_type used by slot_call_iterator must now have a default constructor. If the result_type is expensive to copy and slot_call_iterator is copied a lot, replacing the shared_ptr with an instance variable will actually make things slower. I don't know enough about the internals to weigh how important these issues are.

I believe the correct way to do things is to create a cache interface that encapsulates the behavior slot_call_iterator needs, and then to choose the appropriate cache implementation at compile time using the mpl.

I've attached both a diff of my changes to slot_call_iterator.hpp and the modified file. I'd be interested in knowing how it changes the performance on other platforms.
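The cache idea above can be sketched roughly as follows. This is only an illustration, not the attached patch or the actual Boost.Signals code: the names heap_cache, inline_cache and result_cache are hypothetical, std::shared_ptr and std::conditional stand in for boost::shared_ptr and mpl::if_, and the selection rule (default constructible and trivially copyable) is an assumed proxy for "cheap to copy".

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical heap-backed cache, mirroring the shared_ptr approach:
// copying the cache only copies a pointer, but every store() allocates.
template <typename T>
class heap_cache {
public:
    bool valid() const { return static_cast<bool>(ptr_); }
    void store(const T& value) { ptr_ = std::make_shared<T>(value); }
    const T& get() const { assert(valid()); return *ptr_; }
    void reset() { ptr_.reset(); }
private:
    std::shared_ptr<T> ptr_;
};

// Hypothetical in-place cache, mirroring the instance variable plus bool:
// no allocation, but T must be default constructible and is copied
// whenever the cache (and thus the iterator holding it) is copied.
template <typename T>
class inline_cache {
public:
    bool valid() const { return valid_; }
    void store(const T& value) { value_ = value; valid_ = true; }
    const T& get() const { assert(valid_); return value_; }
    void reset() { valid_ = false; }
private:
    T value_{};
    bool valid_ = false;
};

// Compile-time choice between the two implementations.
// Assumption: "default constructible and trivially copyable" is used here
// as a stand-in for "safe and cheap to keep in place".
template <typename T>
using result_cache = typename std::conditional<
    std::is_default_constructible<T>::value &&
        std::is_trivially_copyable<T>::value,
    inline_cache<T>,
    heap_cache<T> >::type;
```

With this sketch, result_cache<int> resolves to inline_cache<int>, while a type that is not trivially copyable falls back to heap_cache, so an iterator written against valid(), store(), get() and reset() would not need to know which one it holds.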
Current performance on my 650 MHz Duron:

===== 1000 Total Calls =====
Num Slots  Calls/Slot  Boost    Lite
---------  ----------  -------  -------
1          1000        0.0073   0.0002
10         100         0.0025   0.0001
50         20          0.0023   0.0001
100        10          0.0021   0.0001
250        4           0.0021   0.0001
500        2           0.0024   0.0001
1000       1           0.0026   0.0002

===== 10000 Total Calls =====
Num Slots  Calls/Slot  Boost    Lite
---------  ----------  -------  -------
1          10000       0.0680   0.0022
10         1000        0.0258   0.0007
50         200         0.0214   0.0006
100        100         0.0209   0.0006
250        40          0.0209   0.0006
500        20          0.0218   0.0008
1000       10          0.0246   0.0008
5000       2           0.0254   0.0025
10000      1           0.0257   0.0027

===== 100000 Total Calls =====
Num Slots  Calls/Slot  Boost    Lite
---------  ----------  -------  -------
1          100000      0.7370   0.0224
10         10000       0.2573   0.0090
50         2000        0.2284   0.0067
100        1000        0.2691   0.0072
250        400         0.2111   0.0069
500        200         0.2202   0.0094
1000       100         0.2635   0.0259
5000       20          0.2684   0.0318
10000      10          0.2749   0.0330
50000      2           0.2629   0.0266
100000     1           0.2638   0.0290

===== 1000000 Total Calls =====
Num Slots  Calls/Slot  Boost    Lite
---------  ----------  -------  -------
1          1000000     6.9143   0.2232
10         100000      2.5620   0.0752
50         20000       2.2086   0.0669
100        10000       2.1817   0.0724
250        4000        2.1839   0.0894
500        2000        2.1643   0.1066
1000       1000        2.6981   0.3238
5000       200         2.7720   0.3870
10000      100         2.7220   0.3980
50000      20          2.7763   0.3479
100000     10          2.8006   0.3774
500000     2           2.6703   0.2991

Performance of slot_call_iterator with an instance variable instead of a shared_ptr on my 650 MHz Duron:

===== 1000 Total Calls =====
Num Slots  Calls/Slot  Boost    Lite
---------  ----------  -------  -------
1          1000        0.0014   0.0002
10         100         0.0004   0.0001
50         20          0.0003   0.0001
100        10          0.0406   0.0001
250        4           0.0003   0.0001
500        2           0.0004   0.0001
1000       1           0.0007   0.0002

===== 10000 Total Calls =====
Num Slots  Calls/Slot  Boost    Lite
---------  ----------  -------  -------
1          10000       0.0145   0.0022
10         1000        0.0039   0.0007
50         200         0.0030   0.0006
100        100         0.0029   0.0006
250        40          0.0029   0.0006
500        20          0.0033   0.0008
1000       10          0.0066   0.0009
5000       2           0.0073   0.0025
10000      1           0.0076   0.0029

===== 100000 Total Calls =====
Num Slots  Calls/Slot  Boost    Lite
---------  ----------  -------  -------
1          100000      0.1446   0.0219
10         10000       0.0385   0.0075
50         2000        0.0302   0.0066
100        1000        0.0288   0.0065
250        400         0.0296   0.0068
500        200         0.0344   0.0083
1000       100         0.0949   0.0257
5000       20          0.0844   0.0318
10000      10          0.0819   0.0329
50000      2           0.0825   0.0294
100000     1           0.0741   0.0282

===== 1000000 Total Calls =====
Num Slots  Calls/Slot  Boost    Lite
---------  ----------  -------  -------
1          1000000     1.5345   0.2376
10         100000      0.6345   0.0758
50         20000       0.6562   0.2402
100        10000       0.2960   0.0686
250        4000        0.3477   0.0932
500        2000        0.5850   0.1311
1000       1000        1.7935   0.3861
5000       200         1.0274   0.4182
10000      100         1.1677   0.4073
50000      20          1.1002   0.7818
100000     10          1.5208   0.4197
500000     2           0.7921   0.2938

http://home.earthlink.net/~rzeh
Robert Zeh