[container] boost::container::vector, hybrid_vector, and performance

After some of the varray discussion and interest in hybrid_vector I modified an existing synthetic benchmark from the C++Next site (cpp-next.com) to test varray, boost::container::vector, boost::container::vector<T, stack_allocator>, and std::vector. Essentially for each container it times constructing a container of containers of primitives, std::sort, std::rotate, and destruction. Note that the test calls reserve() when setting up each container so that each gets a fair shot and isn't unfairly bogged down by a simple allocation optimization. The short results summary: varray was generally at or near the top for performance due to its simplicity, and interestingly on almost all platforms std::vector significantly outperformed boost::container::vector, which edged out boost::container::vector<T, stack_allocator>. Of course these are synthetic tests of only a subset of these containers’ functionality using only primitives so it should not be construed to necessarily represent the performance of these containers for other uses. Nonetheless I thought it may be interesting for discussion. There is also an additional benchmark of Adam Wulkiewicz’s rtree using linux clang comparing varray, container::vector, and std::vector with similar results. One question this leads me to ask is if someone might be able to illustrate a significant advantage hybrid_vector could have over boost::container::vector<T, stack_allocator>? The test can be found at: https://svn.boost.org/svn/boost/sandbox/varray/example/bench_varray.cpp Results are below. Cheers! Andrew Hundt --------------------------------------------------------------------------------------------------------------------------- Apple LLVM version 4.2 (clang-425.0.24) (based on LLVM 3.2svn) Target: x86_64-apple-darwin12.2.0 Thread model: posix Processor 2.2 GHz Intel Core i7 Memory 16 GB 1333 MHz DDR3 Software OS X 10.8.2 (12C60) --------------------------------------------------------------------------------------------------------------------------- set stack size limit to: 17000000 bytes N = 1000 varray benchmark: construction took 10.576854s wall, 10.580000s user + 0.020000s system = 10.600000s CPU (100.2%) sort took 4.323139s wall, 4.290000s user + 0.010000s system = 4.300000s CPU (99.5%) rotate took 0.909033s wall, 0.910000s user + 0.000000s system = 0.910000s CPU (100.1%) destruction took 0.000725s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%) done Total time = 15.824697s wall, 15.780000s user + 0.040000s system = 15.820000s CPU (100.0%) boost::container::vector benchmark construction took 11.656541s wall, 11.180000s user + 0.320000s system = 11.500000s CPU (98.7%) sort took 5.213782s wall, 4.940000s user + 0.000000s system = 4.940000s CPU (94.7%) rotate took 0.003201s wall, 0.010000s user + 0.000000s system = 0.010000s CPU (312.4%) destruction took 2.482177s wall, 0.640000s user + 2.000000s system = 2.640000s CPU (106.4%) done Total time = 19.367074s wall, 16.770000s user + 2.330000s system = 19.100000s CPU (98.6%) boost::container::vector benchmark with stack_allocator construction took 11.698265s wall, 11.110000s user + 0.430000s system = 11.540000s CPU (98.6%) sort took 5.227070s wall, 5.180000s user + 0.000000s system = 5.180000s CPU (99.1%) rotate took 0.003778s wall, 0.010000s user + 0.000000s system = 0.010000s CPU (264.7%) destruction took 2.493007s wall, 0.510000s user + 1.900000s system = 2.410000s CPU (96.7%) done Total time = 19.433571s wall, 16.810000s user + 2.340000s system = 19.150000s CPU (98.5%) std::vector benchmark construction took 11.605458s wall, 11.400000s user + 0.300000s system = 11.700000s CPU (100.8%) sort took 2.256151s wall, 2.250000s user + 0.000000s system = 2.250000s CPU (99.7%) rotate took 0.003200s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%) destruction took 2.498035s wall, 0.370000s user + 2.040000s system = 2.410000s CPU (96.5%) done Total time = 16.374640s wall, 14.020000s user + 2.350000s system = 16.370000s CPU (100.0%) varray/boost::container::vector total time comparison: wall = 0.817093 user = 0.940966 system = 0.0171674 (user+system) = 0.828272 varray/(boost::container::vector + stack_allocator) total time comparison: wall = 0.814297 user = 0.938727 system = 0.017094 (user+system) = 0.82611 varray/std::vector total time comparison: wall = 0.966415 user = 1.12553 system = 0.0170213 (user+system) = 0.966402 --------------------------------------------------------------------------------------------------------------------------- Windows VS2012 Core2Duo P7350 2GHz, 3GB Ram --------------------------------------------------------------------------------------------------------------------------- N = 1000 varray benchmark: construction took 18.901735s wall, 18.798121s user + 0.000000s system = 18.798121s CPU (99.5%) sort took 3.093771s wall, 3.260421s user + 0.000000s system = 3.260421s CPU (105.4%) rotate took 0.336938s wall, 0.280802s user + 0.000000s system = 0.280802s CPU (83.3%) destruction took 0.000270s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%) done Total time = 22.339054s wall, 22.339343s user + 0.000000s system = 22.339343s CPU (100.0%) boost::container::vector benchmark construction took 18.757410s wall, 18.595319s user + 0.202801s system = 18.798121s CPU (100.2%) sort took 0.120896s wall, 0.140401s user + 0.000000s system = 0.140401s CPU (116.1%) rotate took 0.004482s wall, 0.015600s user + 0.000000s system = 0.015600s CPU (348.0%) destruction took 0.282280s wall, 0.046800s user + 0.171601s system = 0.218401s CPU (77.4%) done Total time = 19.167624s wall, 18.798121s user + 0.374402s system = 19.172523s CPU (100.0%) boost::container::vector benchmark with stack_allocator construction took 18.811076s wall, 18.361318s user + 0.312002s system = 18.673320s CPU (99.3%) sort took 0.134590s wall, 0.218401s user + 0.000000s system = 0.218401s CPU (162.3%) rotate took 0.004111s wall, 0.015600s user + 0.000000s system = 0.015600s CPU (379.5%) destruction took 0.279619s wall, 0.031200s user + 0.280802s system = 0.312002s CPU (111.6%) done Total time = 19.231975s wall, 18.626519s user + 0.592804s system = 19.219323s CPU (99.9%) std::vector benchmark construction took 19.587929s wall, 19.359724s user + 0.234002s system = 19.593726s CPU (100.0%) sort took 0.112920s wall, 0.124801s user + 0.000000s system = 0.124801s CPU (110.5%) rotate took 0.002486s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%) destruction took 0.262937s wall, 0.031200s user + 0.218401s system = 0.249602s CPU (94.9%) done Total time = 19.968799s wall, 19.515725s user + 0.452403s system = 19.968128s CPU (100.0%) varray/boost::container::vector total time comparison: wall = 1.16546 user = 1.18838 system = 0 (user+system) = 1.16517 varray/(boost::container::vector + stack_allocator) total time comparison: wall = 1.16156 user = 1.19933 system = 0 (user+system) = 1.16234 varray/std::vector total time comparison: wall = 1.1187 user = 1.14468 system = 0 (user+system) = 1.11875 --------------------------------------------------------------------------------------------------------------------------- Windows mingw_4_7 -O2 Core2Duo P7350 2GHz, 3GB Ram --------------------------------------------------------------------------------------------------------------------------- N = 1000 varray benchmark: construction took 13.752772s wall, 13.447286s user + 0.000000s system = 13.447286s CPU (97.8%) sort took 2.207612s wall, 2.449216s user + 0.000000s system = 2.449216s CPU (110.9%) rotate took 0.381684s wall, 0.452403s user + 0.000000s system = 0.452403s CPU (118.5%) destruction took 0.000265s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%) done Total time = 16.348433s wall, 16.348905s user + 0.000000s system = 16.348905s CPU (100.0%) boost::container::vector benchmark construction took 16.736949s wall, 15.631300s user + 1.060807s system = 16.692107s CPU (99.7%) sort took 5.170353s wall, 5.226034s user + 0.000000s system = 5.226034s CPU (101.1%) rotate took 0.001871s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%) destruction took 0.537667s wall, 0.124801s user + 0.374402s system = 0.499203s CPU (92.8%) done Total time = 22.453639s wall, 20.982135s user + 1.435209s system = 22.417344s CPU (99.8%) boost::container::vector benchmark with stack_allocator construction took 16.227005s wall, 15.412899s user + 0.624004s system = 16.036903s CPU (98.8%) sort took 4.376050s wall, 4.477229s user + 0.000000s system = 4.477229s CPU (102.3%) rotate took 0.001871s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%) destruction took 0.523920s wall, 0.109201s user + 0.514803s system = 0.624004s CPU (119.1%) done Total time = 21.134513s wall, 19.999328s user + 1.138807s system = 21.138135s CPU (100.0%) std::vector benchmark construction took 15.182662s wall, 14.149291s user + 0.904806s system = 15.054096s CPU (99.2%) sort took 1.408413s wall, 1.528810s user + 0.000000s system = 1.528810s CPU (108.5%) rotate took 0.001668s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%) destruction took 0.451160s wall, 0.046800s user + 0.421203s system = 0.468003s CPU (103.7%) done Total time = 17.049415s wall, 15.724901s user + 1.326008s system = 17.050909s CPU (100.0%) varray/boost::container::vector total time comparison: wall = 0.728097 user = 0.779182 system = 0 (user+system) = 0.729297 varray/(boost::container::vector + stack_allocator) total time comparison: wall = 0.773542 user = 0.817473 system = 0 (user+system) = 0.773432 varray/std::vector total time comparison: wall = 0.958885 user = 1.03968 system = 0 (user+system) = 0.958829 --------------------------------------------------------------------------------------------------------------------------- Linux clang_3_0_6 -O2 Core2Duo P7350 2GHz, 3GB Ram --------------------------------------------------------------------------------------------------------------------------- set stack size limit to: 9000000 bytes N = 1000 varray benchmark: construction took 27.455347s wall, 27.360000s user + 0.050000s system = 27.410000s CPU (99.8%) sort took 5.683607s wall, 5.750000s user + 0.010000s system = 5.760000s CPU (101.3%) rotate took 1.651716s wall, 1.560000s user + 0.000000s system = 1.560000s CPU (94.4%) destruction took 0.002050s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%) done Total time = 34.810039s wall, 34.700000s user + 0.060000s system = 34.760000s CPU (99.9%) boost::container::vector benchmark construction took 39.073566s wall, 33.070000s user + 5.960000s system = 39.030000s CPU (99.9%) sort took 13.405286s wall, 13.440000s user + 0.020000s system = 13.460000s CPU (100.4%) rotate took 0.011300s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%) destruction took 1.005573s wall, 0.810000s user + 0.150000s system = 0.960000s CPU (95.5%) done Total time = 53.509749s wall, 47.330000s user + 6.130000s system = 53.460000s CPU (99.9%) boost::container::vector benchmark with stack_allocator construction took 39.457245s wall, 33.660000s user + 5.790000s system = 39.450000s CPU (100.0%) sort took 13.454378s wall, 13.190000s user + 0.050000s system = 13.240000s CPU (98.4%) rotate took 0.013398s wall, 0.040000s user + 0.000000s system = 0.040000s CPU (298.5%) destruction took 1.007952s wall, 0.860000s user + 0.300000s system = 1.160000s CPU (115.1%) done Total time = 53.947026s wall, 47.750000s user + 6.140000s system = 53.890000s CPU (99.9%) std::vector benchmark construction took 31.662683s wall, 25.550000s user + 6.240000s system = 31.790000s CPU (100.4%) sort took 3.102917s wall, 2.870000s user + 0.080000s system = 2.950000s CPU (95.1%) rotate took 0.010094s wall, 0.020000s user + 0.000000s system = 0.020000s CPU (198.1%) destruction took 1.002616s wall, 0.770000s user + 0.210000s system = 0.980000s CPU (97.7%) done Total time = 35.793019s wall, 29.220000s user + 6.530000s system = 35.750000s CPU (99.9%) varray/boost::container::vector total time comparison: wall = 0.650536 user = 0.73315 system = 0.00978793 (user+system) = 0.650206 varray/(boost::container::vector + stack_allocator) total time comparison: wall = 0.645263 user = 0.726702 system = 0.00977199 (user+system) = 0.645018 varray/std::vector total time comparison: wall = 0.972537 user = 1.18754 system = 0.00918836 (user+system) = 0.972308 --------------------------------------------------------------------------------------------------------------------------- Linux gcc_4_7_2 -O2 Core2Duo P7350 2GHz, 3GB Ram --------------------------------------------------------------------------------------------------------------------------- set stack size limit to: 9000000 bytes N = 1000 varray benchmark: construction took 27.677216s wall, 27.560000s user + 0.050000s system = 27.610000s CPU (99.8%) sort took 5.934591s wall, 6.080000s user + 0.010000s system = 6.090000s CPU (102.6%) rotate took 1.720024s wall, 1.590000s user + 0.000000s system = 1.590000s CPU (92.4%) destruction took 0.002077s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%) done Total time = 35.350653s wall, 35.240000s user + 0.060000s system = 35.300000s CPU (99.9%) boost::container::vector benchmark construction took 38.196097s wall, 32.530000s user + 5.830000s system = 38.360000s CPU (100.4%) sort took 12.107587s wall, 12.070000s user + 0.010000s system = 12.080000s CPU (99.8%) rotate took 0.011146s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%) destruction took 1.054511s wall, 0.680000s user + 0.190000s system = 0.870000s CPU (82.5%) done Total time = 51.383338s wall, 45.290000s user + 6.030000s system = 51.320000s CPU (99.9%) boost::container::vector benchmark with stack_allocator construction took 37.421275s wall, 31.590000s user + 5.840000s system = 37.430000s CPU (100.0%) sort took 12.009866s wall, 11.930000s user + 0.020000s system = 11.950000s CPU (99.5%) rotate took 0.013459s wall, 0.010000s user + 0.000000s system = 0.010000s CPU (74.3%) destruction took 1.053435s wall, 0.770000s user + 0.300000s system = 1.070000s CPU (101.6%) done Total time = 50.511806s wall, 44.300000s user + 6.170000s system = 50.470000s CPU (99.9%) std::vector benchmark construction took 31.849724s wall, 25.920000s user + 5.760000s system = 31.680000s CPU (99.5%) sort took 3.244223s wall, 3.240000s user + 0.110000s system = 3.350000s CPU (103.3%) rotate took 0.010454s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%) destruction took 1.054326s wall, 0.780000s user + 0.280000s system = 1.060000s CPU (100.5%) done Total time = 36.173664s wall, 29.950000s user + 6.150000s system = 36.100000s CPU (99.8%) varray/boost::container::vector total time comparison: wall = 0.687979 user = 0.778097 system = 0.00995025 (user+system) = 0.687841 varray/(boost::container::vector + stack_allocator) total time comparison: wall = 0.699849 user = 0.795485 system = 0.00972447 (user+system) = 0.699425 varray/std::vector total time comparison: wall = 0.977248 user = 1.17663 system = 0.0097561 (user+system) = 0.977839 --------------------------------------------------------------------------------------------------------------------------- Linux clang_3_0_6 -O2 Core2Duo P7350 2GHz, 3GB Ram Additional benchmark of Adam Wulkiewicz’s rtree --------------------------------------------------------------------------------------------------------------------------- ins[s] qB[s] qiwo[s] qn[s] qnP5[s] qnB[s] rem[s] mem[MB] - container::vector 10.8 14.4 6.9 3.1 4.8 6.0 5.7 184.1 - std::vector 10.9 13.5 6.9 3.4 5.2 5.9 5.4 274,8 - varray 10.2 12.6 6.3 2.8 4.5 5.8 4.9 207,1 So in this case varray is slightly faster from both and consumes less memory than std::vector.

El 11/02/2013 18:12, Andrew Hundt escribió:
The short results summary: varray was generally at or near the top for performance due to its simplicity, and interestingly on almost all platforms std::vector significantly outperformed boost::container::vector, which edged out boost::container::vector<T, stack_allocator>.
Thanks for the report. Which boost version have you used? I remember adding some optimizations in Boost 1.53's container::vector. Anyway I'm interested in improving boost::container performance, since I haven't done any serious effort on that direction. Until know portability between C++03 and C++0x was the main goal. Best, Ion

The OS X + clang version was on 1.52, Adam will need to chime in for the others. Cheers! Andrew Hundt On Mon, Feb 11, 2013 at 4:41 PM, Ion Gaztañaga <igaztanaga@gmail.com> wrote:
El 11/02/2013 18:12, Andrew Hundt escribió:
The short results summary:
varray was generally at or near the top for performance due to its simplicity, and interestingly on almost all platforms std::vector significantly outperformed boost::container::vector, which edged out boost::container::vector<T, stack_allocator>.
Thanks for the report. Which boost version have you used? I remember adding some optimizations in Boost 1.53's container::vector. Anyway I'm interested in improving boost::container performance, since I haven't done any serious effort on that direction. Until know portability between C++03 and C++0x was the main goal.
Best,
Ion
______________________________**_________________ Unsubscribe & other changes: http://lists.boost.org/** mailman/listinfo.cgi/boost<http://lists.boost.org/mailman/listinfo.cgi/boost>

El 12/02/2013 16:15, Andrew Hundt escribió:
The OS X + clang version was on 1.52, Adam will need to chime in for the others.
Cheers! Andrew Hundt
Thanks. Thanks to your benchmark I'll try to do some tweaks on trunk to see if boost::container::vector performance can be slightly improved. Best, Ion

El 12/02/2013 17:15, igaztanaga@gmail.com escribió:
El 12/02/2013 16:15, Andrew Hundt escribió:
The OS X + clang version was on 1.52, Adam will need to chime in for the others.
Cheers! Andrew Hundt
Adam, Andrew, I've committed changeset #82846, adding to trunk some performance improvements in vector's constructors. It should improve a bit boost::container::vector results. I saw that varray has some optimizations for trivial types (dispatching to memcpy) that boost::container misses. Maybe a fairer comparison would be to use a simple class instead of std::size_t as the basic type. Something like "copyable_int" in boost/libs/container/movable_int.hpp: class copyable_int { public: copyable_int() : m_int(0) {} explicit copyable_int(int a) : m_int(a) {} copyable_int(const copyable_int& mmi) : m_int(mmi.m_int) {} copyable_int & operator= (int i) { this->m_int = i; return *this; } //... private: int m_int; }; Best, Ion

Ion Gaztañaga wrote:
El 12/02/2013 17:15, igaztanaga@gmail.com escribió:
El 12/02/2013 16:15, Andrew Hundt escribió:
The OS X + clang version was on 1.52, Adam will need to chime in for the others.
Hi, I was testing using Boost 1.52.
Adam, Andrew, I've committed changeset #82846, adding to trunk some performance improvements in vector's constructors. It should improve a bit boost::container::vector results.
Ok, I'll try to find some time to test it.
I saw that varray has some optimizations for trivial types (dispatching to memcpy) that boost::container misses. Maybe a fairer comparison would be to use a simple class instead of std::size_t as the basic type. Something like "copyable_int" in boost/libs/container/movable_int.hpp:
In my test non-POD types were stored in containers, i.e. Box and std::pair<Box, ptr> where Box is box<point<double, 2, cartesian> >. Regards, Adam

On Wed, Feb 13, 2013 at 10:14 AM, Adam Wulkiewicz <adam.wulkiewicz@gmail.com
wrote:
I saw that varray has some optimizations for trivial types (dispatching
to memcpy) that boost::container misses. Maybe a fairer comparison would be to use a simple class instead of std::size_t as the basic type. Something like "copyable_int" in boost/libs/container/movable_**int.hpp:
In my test non-POD types were stored in containers, i.e. Box and std::pair<Box, ptr> where Box is box<point<double, 2, cartesian> >.
For non-POD types locality of reference has quite significant effect on results of performance tests. Access to values through pointers is more costly against POD values. Non-POD types hardly can be useful for reliable performance measurements even for containers with small number of elements. The size of a single element std::pair<Box,ptr> is another factor that should have an effect on performance. Regards, Vadim Stadnik

Vadim Stadnik wrote:
On Wed, Feb 13, 2013 at 10:14 AM, Adam Wulkiewicz <adam.wulkiewicz@gmail.com
wrote:
I saw that varray has some optimizations for trivial types (dispatching
to memcpy) that boost::container misses. Maybe a fairer comparison would be to use a simple class instead of std::size_t as the basic type. Something like "copyable_int" in boost/libs/container/movable_**int.hpp:
In my test non-POD types were stored in containers, i.e. Box and std::pair<Box, ptr> where Box is box<point<double, 2, cartesian> >.
For non-POD types locality of reference has quite significant effect on results of performance tests. Access to values through pointers is more costly against POD values. Non-POD types hardly can be useful for reliable performance measurements even for containers with small number of elements. The size of a single element std::pair<Box,ptr> is another factor that should have an effect on performance.
You're right. And this is what makes containers fast - locality and size for better caching, smaller methods for inlining. Because of that varray will probably always be faster than any vector. We were more concerned about the container::vector performance in relation to std::vector. Regards, Adam

On Wed, Feb 13, 2013 at 9:36 AM, Adam Wulkiewicz <adam.wulkiewicz@gmail.com>wrote:
You're right. And this is what makes containers fast - locality and size for better caching, smaller methods for inlining. Because of that varray will probably always be faster than any vector. We were more concerned about the container::vector performance in relation to std::vector.
Yes, one of the main points was the performance gap between std::vector and container::vector. The varray benchmark just happened to be the way I noticed the gap. The other point was regarding the hypothetical hybrid_vector that was discussed in previous varray/static_vector threads, asking if a stack allocator with a normal allocation fallback is perhaps a reasonable substitute that would have comparable performance. If so it seems like hybrid_vector wouldn't perform that well possibly due to branching for the stack vs allocator cases and thus wouldn't make as much sense to implement. However, I wasn't certain if there may be some critical difference I am missing that someone could shed light on. Cheers! Andrew Hundt

On Wed, Feb 13, 2013 at 1:38 AM, Ion Gaztañaga <igaztanaga@gmail.com> wrote:
I saw that varray has some optimizations for trivial types (dispatching to memcpy) that boost::container misses.
Shouldn't a decent STL implementation do that automatically? AFAIR, STLPort's std::copy would resort to memcpy when possible. Don't other implementations do the same thing?

El 13/02/2013 9:25, Andrey Semashev escribió:
On Wed, Feb 13, 2013 at 1:38 AM, Ion Gaztañaga <igaztanaga@gmail.com> wrote:
I saw that varray has some optimizations for trivial types (dispatching to memcpy) that boost::container misses.
Shouldn't a decent STL implementation do that automatically? AFAIR, STLPort's std::copy would resort to memcpy when possible. Don't other implementations do the same thing?
The problem is that Boost.Container constructs objects using allocator_traits, and the STL has no algorithms for that. Boost.Container should improve these internal algorithms with some type_traits-based dispatching (trivial types, non-throwing operations, etc.). Best, Ion
participants (6)
-
Adam Wulkiewicz
-
Andrew Hundt
-
Andrey Semashev
-
igaztanaga@gmail.com
-
Ion Gaztañaga
-
Vadim Stadnik