Re: [boost] Proposal: Monotonic Containers

The space has already been created before the allocator is requested to provide for new storage.
Otherwise, we could never use a vector<char> on a machine that requires 16-byte alignment.
This is not correct. Allocators allocate memory as a single chunk. So if you use a vector<char> with a size of 10 characters, it allocates a single chunk of at least 10 bytes, and its beginning is aligned to 8 or 16 bytes. Now suppose, for example, that you allocate 3 characters occupying 0x100 to 0x102 and then try to allocate a single integer: you need to return a pointer with the value 0x104, 0x108 or 0x110, i.e. aligned to 4, 8 or 16 according to the architecture's requirements. If you returned 0x103, it could cause an illegal operation on some architectures such as ARM, cost hundreds of cycles on architectures such as IA64, and cause faults when executing atomic operations, such as those in shared_ptr. You **must** return aligned pointers from allocators.
Artyom

Hi Artyom,
You **must** return aligned pointers from allocators.
boost::monotonic does not allocate.
Regards,
Christian.

On Jun 10, 2009, at 4:54 AM, Christian Schladetsch wrote:
Hi Artyom,
You **must** return aligned pointers from allocators.
boost::monotonic does not allocate.
I wonder what that 'allocator' of yours is for, then? ;-)

Of course it allocates. And you fail to align the start address, which makes certain use cases crash on certain platforms. On Intel, it "just" gives the user worse performance when misaligned. This is not aligned (pun intended...) with your overall goal of giving the developer a high-performance tool.

But do not take my word for it: I ran a test with your container (the attached sample program) and got around 33.3 nanoseconds on average per increment of an aligned long, compared to 35.6 nanoseconds for a misaligned long, i.e., a difference of some 8%. Actually, I got a bit more with other operations.

/David

```cpp
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <iostream>
#include <boost/timer.hpp>
#include <boost/monotonic/vector.h>
#include <boost/monotonic/list.h>
#include <boost/monotonic/map.h>
#include <boost/monotonic/set.h>

template<typename C>
struct Foo
{
    long ord;
    C c;
};

template<typename C>
void test_loop()
{
    boost::monotonic::inline_storage<100000> storage;
    boost::monotonic::vector<Foo<C> > vec(storage);
    const int LOOP_COUNT = 100000000;
    const int ELEM_COUNT = 1000;
    Foo<C> orig = { 'A', 65 };
    vec.assign(ELEM_COUNT, orig);

    boost::timer timer;
    // Repeatedly increment every element except the first and the last.
    for (int i = 0; i < LOOP_COUNT; ++i)
        ++vec[1 + i % (ELEM_COUNT - 2)].ord;
    double elapsed = timer.elapsed(); // seconds

    // 1e9 * seconds / iterations gives nanoseconds per iteration.
    std::cout << "Incrementing ord = " << 1000000000*elapsed/LOOP_COUNT
              << " ns per iteration" << std::endl;
}

int main(int argc, const char* argv[])
{
    test_loop<char>();
    test_loop<long>();
}
```

On Jun 10, 2009, at 1:49 PM, David Bergman wrote:
In fact, the attached version is a very lenient test, since many of its iterations do get properly aligned data... In other samples I used prime-number distributions. Nevertheless, this sample produced a difference of between 8% and 10% between the two cases. I can provide you with more information about the performance hit of using misaligned data if you want.
/David

On Jun 10, 2009, at 1:58 PM, David Bergman wrote:
When packing the structures, e.g. with __attribute__((__packed__)) in GCC, I got an almost 20% speed penalty on my Intel-based OS X laptop when using your non-aligning allocator.
/David
participants (3)
- Artyom
- Christian Schladetsch
- David Bergman