... Thankfully, I've had encouraging responses on the developer group, and the code is now posted there.
Matthew,
Because the discussion of how to achieve competitive boost::shared_ptr performance moved to the developers list, it may have left readers of the users list with misimpressions about performance. Perhaps you could post a summary of your newer results with the suggestions from the developers list applied.
Sure thing. Everyone, this is the body of the post from the .devel group. I'm not reposting the zip. You can pick that up from the other group. Cheers Matthew "
Would it be possible for you to share the source of the benchmarks?
Certainly. What would you advise: would it be ok to just send to you via email, or do you want me to post the whole hideous heap here? There is a certain amount of swill to be had.
1. There's a lot of STLSoft stuff in there, for the timings, and whatnot. (I don't think that's swill, of course, but still there's a fair amount of stuff needed.)
2. There's a fair bit of very old, and *very* swillsome, Synesis code. If I let you have it, you must promise not to consider it representative of anything other than another lifetime, where both programmers and compilers were a lot less intelligent.
Well. I've been thinking about including the tests in Boost, in libs/smart_ptr/test, where currently shared_ptr_timing_test.cpp, shared_ptr_alloc_test.cpp and shared_ptr_mt_test.cpp reside.
Not clear. Do you mean my tests? That would be fine.
When benchmarking it is common courtesy to make the source publicly available so that the results can be reproduced by others.
Let me make it clear: this wasn't an exercise in Boost bashing. I was simply investigating whether there were significant performance differences between internal and external reference-counting, and I chose boost::shared_ptr because it's popular. Naturally I understand the need to share the source but, as an infrequent user of this ng, I did not want to be presumptuous and post when that might not be "the done thing". Furthermore, the results were initially so bad that I did not trust them, and saw no point in making myself look a fool and distracting others. Your advice on the quick_allocator and the new shared_count has pretty much done the trick, and shared_ptr is now roughly on a par with the rest. (See end of post for final results.)
If you have the time, it would probably be best if you could try the Boost tests first, to see if the results are consistent with your own. When there is a difference, we should probably use one of the Boost tests as a threading/timing framework and just replace the scenario being tested, to avoid the Synesis/STLSoft dependencies.
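For readers unfamiliar with the internal versus external reference-counting distinction mentioned in this thread, here is a minimal sketch. It is not taken from the benchmark source: `IntrusiveThing`, `IntrusivePtr` and `ExternalPtr` are hypothetical names, the counts are not thread-safe, and neither class is meant to mirror boost::shared_ptr or the SharedPtr used in the tests.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Internal (intrusive) counting: the count is a member of the managed object,
// so no separate allocation is needed for it.
struct IntrusiveThing {
    std::size_t refs = 0;
    int value = 0;
};

class IntrusivePtr {
public:
    explicit IntrusivePtr(IntrusiveThing* p = nullptr) : p_(p) { if (p_) ++p_->refs; }
    IntrusivePtr(const IntrusivePtr& o) : p_(o.p_) { if (p_) ++p_->refs; }
    IntrusivePtr& operator=(IntrusivePtr o) { std::swap(p_, o.p_); return *this; }
    ~IntrusivePtr() { if (p_ && --p_->refs == 0) delete p_; }
    IntrusiveThing* operator->() const { return p_; }
    std::size_t use_count() const { return p_ ? p_->refs : 0; }
private:
    IntrusiveThing* p_;
};

// External counting: the managed type knows nothing about the count, which
// lives in its own heap allocation shared by all copies of the pointer.
struct Thing { int value = 0; };

class ExternalPtr {
public:
    // Note: if the count allocation throws, p leaks; a real class would guard this.
    explicit ExternalPtr(Thing* p = nullptr)
        : p_(p), refs_(p ? new std::size_t(1) : nullptr) {}
    ExternalPtr(const ExternalPtr& o) : p_(o.p_), refs_(o.refs_) { if (refs_) ++*refs_; }
    ExternalPtr& operator=(ExternalPtr o) {
        std::swap(p_, o.p_);
        std::swap(refs_, o.refs_);
        return *this;
    }
    ~ExternalPtr() {
        if (refs_ && --*refs_ == 0) { delete p_; delete refs_; }
    }
    Thing* operator->() const { return p_; }
    std::size_t use_count() const { return refs_ ? *refs_ : 0; }
private:
    Thing* p_;
    std::size_t* refs_;
};
```

The external form costs an extra allocation per managed object, which is one reason a pooled or quick allocator for the count can matter in a benchmark like this one.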
That just comes down to time, and at the moment I have precious little of it. I've added conditional compilation to omit the Synesis stuff, but I use the WinSTL/UNIX performance_counter classes, and getting rid of them would mean plugging in a lot of custom code to measure elapsed and thread times.
Hey, maybe Boost would be interested in having the performance_counters? There is a UNIX one as well, and I'm sure it would be trivial for your other-OS experts to add the requisite ones for their platforms of choice.
I'm posting a zip with the source files, an Intel makefile (I use Borland make, but I'm pretty sure any other make will do), and a minimum isolated set of STLSoft files needed to build them. To build using the enhancements, define the make symbol USE_BOOST_DIMOV_MEASURES, which causes the new shared_count.hpp to be included and also defines BOOST_SP_USE_QUICK_ALLOCATOR for the compiler.
Included below are the final results, without the Synesis tests, run on my machine, with and without USE_BOOST_DIMOV_MEASURES, for 100,000 and 1,000,000 iterations. It's clear that the "measures" address the stark performance disparities in multi-threaded builds, and genuine multi-threaded processes, at a small cost in single-threaded builds. No doubt that could be handled with suitable context discrimination. I hope that's enough information.
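For anyone who wants to try something similar without the STLSoft dependencies, a rough sketch of the timing idea follows. It is not the posted benchmark: it uses std::chrono::steady_clock as a stand-in for the performance_counter classes (wall-clock time only, no thread times), std::shared_ptr as a stand-in for boost::shared_ptr, and my own guess at what "discarding" and "saving" pointers mean. `time_ms`, `discard_pointers` and `save_pointers` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <vector>

struct Thing { int value = 0; };

// Runs a callable once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
long long time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// Assumed "discarding" scenario: each pointer is destroyed at the end of its
// iteration. `new Thing` (rather than std::make_shared) keeps the reference
// count in a separate allocation, as an external count would be.
std::size_t discard_pointers(std::size_t iterations) {
    std::size_t checksum = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        std::shared_ptr<Thing> p(new Thing);
        p->value = static_cast<int>(i & 1);
        checksum += static_cast<std::size_t>(p->value);  // keeps the loop observable
    }
    return checksum;
}

// Assumed "saving" scenario: every pointer is kept alive in a container until
// the function returns.
std::size_t save_pointers(std::size_t iterations) {
    std::vector<std::shared_ptr<Thing>> saved;
    saved.reserve(iterations);
    for (std::size_t i = 0; i < iterations; ++i) {
        saved.push_back(std::shared_ptr<Thing>(new Thing));
    }
    return saved.size();
}
```

A call such as `time_ms([] { discard_pointers(100000); })` gives one timing; a fair comparison would repeat each scenario several times and build both with and without the allocator settings under test.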
Cheers
Matthew

Without Dimov Measures
======================

shared_ptr_test: Intel C/C++ - discarding pointers - single-threaded
100000 iterations
  Ext RC (boost::shared_ptr<Thing>):     139
  Ext RC (SharedPtr<Thing>):             127
  Ext RC (SharedPtr<Thing> + pool):       96
  Ext RC (SharedPtr<Thing> + pool2):      88

shared_ptr_test: Intel C/C++ - saving pointers - single-threaded
100000 iterations
  Ext RC (boost::shared_ptr<Thing>):     269
  Ext RC (SharedPtr<Thing>):             226
  Ext RC (SharedPtr<Thing> + pool):      228
  Ext RC (SharedPtr<Thing> + pool2):     225

shared_ptr_test: Intel C/C++ - discarding pointers - multi-threaded
100000 iterations
  Ext RC (boost::shared_ptr<Thing>):     410
  Ext RC (SharedPtr<Thing>):             245
  Ext RC (SharedPtr<Thing> + pool):      216
  Ext RC (SharedPtr<Thing> + pool2):     221

shared_ptr_test: Intel C/C++ - saving pointers - multi-threaded
100000 iterations
  Ext RC (boost::shared_ptr<Thing>):     713
  Ext RC (SharedPtr<Thing>):             506
  Ext RC (SharedPtr<Thing> + pool):      519
  Ext RC (SharedPtr<Thing> + pool2):     595

shared_ptr_thread_test: Intel C/C++
                                      elapsed   thread
  Ext RC (boost::shared_ptr<Thing>):     7441     2196
  Ext RC (SharedPtr<Thing>):              895      785
  Ext RC (SharedPtr<Thing> + pool):      1190      709
  Ext RC (SharedPtr<Thing> + pool2):     2480     1017

shared_ptr_test: Intel C/C++ - discarding pointers - single-threaded
1000000 iterations
  Ext RC (boost::shared_ptr<Thing>):    1427
  Ext RC (SharedPtr<Thing>):            1279
  Ext RC (SharedPtr<Thing> + pool):      947
  Ext RC (SharedPtr<Thing> + pool2):     884

shared_ptr_test: Intel C/C++ - saving pointers - single-threaded
1000000 iterations
  Ext RC (boost::shared_ptr<Thing>):    2545
  Ext RC (SharedPtr<Thing>):            2292
  Ext RC (SharedPtr<Thing> + pool):     2321
  Ext RC (SharedPtr<Thing> + pool2):    2252

shared_ptr_test: Intel C/C++ - discarding pointers - multi-threaded
1000000 iterations
  Ext RC (boost::shared_ptr<Thing>):    4159
  Ext RC (SharedPtr<Thing>):            2452
  Ext RC (SharedPtr<Thing> + pool):     2200
  Ext RC (SharedPtr<Thing> + pool2):    2239

shared_ptr_test: Intel C/C++ - saving pointers - multi-threaded
1000000 iterations
  Ext RC (boost::shared_ptr<Thing>):    7427
  Ext RC (SharedPtr<Thing>):            5107
  Ext RC (SharedPtr<Thing> + pool):     5243
  Ext RC (SharedPtr<Thing> + pool2):    5373

shared_ptr_thread_test: Intel C/C++
                                      elapsed   thread
  Ext RC (boost::shared_ptr<Thing>):   302511    21712
  Ext RC (SharedPtr<Thing>):            63595     7632
  Ext RC (SharedPtr<Thing> + pool):     46783     7942
  Ext RC (SharedPtr<Thing> + pool2):   158324    10130

With Dimov Measures
===================

shared_ptr_test: Intel C/C++ - discarding pointers - single-threaded (+ boost quick allocator)
100000 iterations
  Ext RC (boost::shared_ptr<Thing>):     191
  Ext RC (SharedPtr<Thing>):             128
  Ext RC (SharedPtr<Thing> + pool):       94
  Ext RC (SharedPtr<Thing> + pool2):      89

shared_ptr_test: Intel C/C++ - saving pointers - single-threaded (+ boost quick allocator)
100000 iterations
  Ext RC (boost::shared_ptr<Thing>):     276
  Ext RC (SharedPtr<Thing>):             206
  Ext RC (SharedPtr<Thing> + pool):      209
  Ext RC (SharedPtr<Thing> + pool2):     204

shared_ptr_test: Intel C/C++ - discarding pointers - multi-threaded (+ boost quick allocator)
100000 iterations
  Ext RC (boost::shared_ptr<Thing>):     236
  Ext RC (SharedPtr<Thing>):             244
  Ext RC (SharedPtr<Thing> + pool):      215
  Ext RC (SharedPtr<Thing> + pool2):     222

shared_ptr_test: Intel C/C++ - saving pointers - multi-threaded (+ boost quick allocator)
100000 iterations
  Ext RC (boost::shared_ptr<Thing>):     410
  Ext RC (SharedPtr<Thing>):             483
  Ext RC (SharedPtr<Thing> + pool):      498
  Ext RC (SharedPtr<Thing> + pool2):     573

shared_ptr_thread_test: Intel C/C++
                                      elapsed   thread
  Ext RC (boost::shared_ptr<Thing>):     2035      876
  Ext RC (SharedPtr<Thing>):             1197      800
  Ext RC (SharedPtr<Thing> + pool):      1133      800
  Ext RC (SharedPtr<Thing> + pool2):     4222     1066

shared_ptr_test: Intel C/C++ - discarding pointers - single-threaded (+ boost quick allocator)
1000000 iterations
  Ext RC (boost::shared_ptr<Thing>):    1924
  Ext RC (SharedPtr<Thing>):            1284
  Ext RC (SharedPtr<Thing> + pool):      930
  Ext RC (SharedPtr<Thing> + pool2):     863

shared_ptr_test: Intel C/C++ - saving pointers - single-threaded (+ boost quick allocator)
1000000 iterations
  Ext RC (boost::shared_ptr<Thing>):    2817
  Ext RC (SharedPtr<Thing>):            2112
  Ext RC (SharedPtr<Thing> + pool):     2161
  Ext RC (SharedPtr<Thing> + pool2):    2100

shared_ptr_test: Intel C/C++ - discarding pointers - multi-threaded (+ boost quick allocator)
1000000 iterations
  Ext RC (boost::shared_ptr<Thing>):    2394
  Ext RC (SharedPtr<Thing>):            2475
  Ext RC (SharedPtr<Thing> + pool):     2180
  Ext RC (SharedPtr<Thing> + pool2):    2249

shared_ptr_test: Intel C/C++ - saving pointers - multi-threaded (+ boost quick allocator)
1000000 iterations
  Ext RC (boost::shared_ptr<Thing>):    4181
  Ext RC (SharedPtr<Thing>):            4925
  Ext RC (SharedPtr<Thing> + pool):     5087
  Ext RC (SharedPtr<Thing> + pool2):    5205

shared_ptr_thread_test: Intel C/C++
                                      elapsed   thread
  Ext RC (boost::shared_ptr<Thing>):    48751     8990
  Ext RC (SharedPtr<Thing>):           170580     8038
  Ext RC (SharedPtr<Thing> + pool):     59856     7974
  Ext RC (SharedPtr<Thing> + pool2):   172761    10163
"