On 10/28/2016 9:09 AM, Larry Evans wrote:
On 10/26/2016 10:00 AM, Michael Marcin wrote: [snip]
The SSE emitter test used an aligned_allocator to guarantee 16-byte alignment for the std::vector data.

    template< typename T >
    using sse_vector = vector< T, aligned_allocator<T, 16> >;

I assume that vector<T>'s data is allocated with new, and new, IIUC, guarantees maximum alignment; hence, boost::alignment::is_aligned(std::vector<T>::data(),16) should always be true. What am I missing?
Yes, operator new isn't going to guarantee 16-byte alignment on any platform I'm aware of.

From n4606:

§ 3.7.4.1/2
The pointer returned shall be suitably aligned so that it can be converted to a pointer to any suitable complete object type (18.6.2.1)

§ 18.6.2.1/1
Effects: The allocation functions (3.7.4.1) called by a new-expression (5.3.4) to allocate size bytes of storage. The second form is called for a type with new-extended alignment, and allocates storage with the specified alignment. The first form is called otherwise, and allocates storage suitably aligned to represent any object of that size provided the object’s type does not have new-extended alignment.

§ 3.11/3
An extended alignment is represented by an alignment greater than alignof(std::max_align_t). It is implementation-defined whether any extended alignments are supported and the contexts in which they are supported (7.6.2). A type having an extended alignment requirement is an over-aligned type. [ Note: every over-aligned type is or contains a class type to which extended alignment applies (possibly through a non-static data member). -- end note ] A new-extended alignment is represented by an alignment greater than __STDCPP_DEFAULT_NEW_ALIGNMENT__ (16.8).

§ 16.8/2
__STDCPP_DEFAULT_NEW_ALIGNMENT__
An integer literal of type std::size_t whose value is the alignment guaranteed by a call to operator new(std::size_t) or operator new[](std::size_t). [ Note: Larger alignments will be passed to operator new(std::size_t, std::align_val_t), etc. (5.3.4). -- end note ]

This has changed somewhat since C++11 (ISO 14882), but the gist of it is the same in the older standard:

§ 3.7.4.1/2
The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type with a fundamental alignment requirement (3.11)

§ 3.11/2
A fundamental alignment is represented by an alignment less than or equal to the greatest alignment supported by the implementation in all contexts, which is equal to alignof(std::max_align_t) (18.2). The alignment required for a type might be different when it is used as the type of a complete object and when it is used as the type of a subobject. [ Example: struct B { long double d; }; struct D : virtual B { char c; }

Basically, new guarantees only that the returned pointer is aligned to alignof(max_align_t). On VS2015, max_align_t is double and alignof(max_align_t) == 8, which is not enough to guarantee 16-byte alignment.

std::aligned_alloc was added for C++17, which solves this problem; Boost has boost::alignment::aligned_alloc, which solves it pre-C++17.
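To make the point above concrete, here is a minimal sketch of an allocator that asks for 16-byte alignment explicitly instead of relying on plain operator new. It assumes a C++17 compiler (for the std::align_val_t overloads); the names aligned_allocator_sketch, sse_vector_sketch, is_aligned_to and default_new_alignment are made up for illustration and are not the Boost.Align API or the allocator used in the benchmark.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Alignment that plain operator new(std::size_t) guarantees. Uses the C++17
// macro when the implementation defines it, otherwise falls back to
// alignof(std::max_align_t), which is the C++11 guarantee.
constexpr std::size_t default_new_alignment() {
#ifdef __STDCPP_DEFAULT_NEW_ALIGNMENT__
    return __STDCPP_DEFAULT_NEW_ALIGNMENT__;
#else
    return alignof(std::max_align_t);
#endif
}

// Hypothetical minimal allocator: requests storage aligned to Align bytes
// through the C++17 aligned forms of operator new / operator delete.
template <typename T, std::size_t Align>
struct aligned_allocator_sketch {
    static_assert((Align & (Align - 1)) == 0, "Align must be a power of two");

    using value_type = T;

    // Explicit rebind is needed because of the non-type template parameter.
    template <typename U>
    struct rebind { using other = aligned_allocator_sketch<U, Align>; };

    aligned_allocator_sketch() = default;
    template <typename U>
    aligned_allocator_sketch(const aligned_allocator_sketch<U, Align>&) noexcept {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(
            ::operator new(n * sizeof(T), std::align_val_t(Align)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t(Align));
    }
};

// The allocator is stateless, so any two instances compare equal.
template <typename T, typename U, std::size_t A>
bool operator==(const aligned_allocator_sketch<T, A>&,
                const aligned_allocator_sketch<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const aligned_allocator_sketch<T, A>&,
                const aligned_allocator_sketch<U, A>&) noexcept { return false; }

// True when the address p is a multiple of alignment.
inline bool is_aligned_to(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Stand-in for the sse_vector alias discussed in this thread.
template <typename T>
using sse_vector_sketch = std::vector<T, aligned_allocator_sketch<T, 16>>;
```

With this, is_aligned_to(v.data(), 16) holds for a non-empty sse_vector_sketch<float> v regardless of what default_new_alignment() reports, whereas a plain std::vector<float> only gets whatever the default operator new happens to provide.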
BTW, see the new push:
https://github.com/cppljevans/soa/commit/ea28ff814c8c1ea879fa5ac2453c12fc382...
I've *not tested* it in:
https://github.com/cppljevans/soa/blob/master/soa_compare.benchmark.cpp
but I think it should work. IOW, instead of:
    sse_vector<float> position_x;
    sse_vector<float> position_y;
    sse_vector<float> position_z;
    sse_vector<float> velocity_x;
    sse_vector<float> velocity_y;
    sse_vector<float> velocity_z;
    sse_vector<float> acceleration_x;
    sse_vector<float> acceleration_y;
    sse_vector<float> acceleration_z;
    vector<float2> size;
    vector<float4> color;
    sse_vector<float> energy;
    vector<char> alive;
I think you could use:
    soa_block
    < type_align ,// position_x;
      type_align ,// position_y;
      type_align ,// position_z;
      type_align ,// velocity_x;
      type_align ,// velocity_y;
      type_align ,// velocity_z;
      type_align ,// acceleration_x;
      type_align ,// acceleration_y;
      type_align ,// acceleration_z;
      float2_t,// size;
      float4_t,// color;
      type_align ,// energy;
      char// alive;
    > particles;

where type_align is found here:
https://github.com/cppljevans/soa/blob/master/vec_offsets.hpp#L14
That's quite similar to what I have in one of my potential implementations.
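For readers following along, the general idea of packing several arrays into one block with per-array alignment can be sketched roughly as below. This is not the soa_block or vec_offsets code from the repository above, nor my own implementation; field_layout, align_up and block_offsets are hypothetical names used only to show the offset arithmetic.

```cpp
#include <array>
#include <cstddef>

// Hypothetical description of one array in the block: the size of one
// element and the alignment required for the array's first element.
struct field_layout {
    std::size_t elem_size;
    std::size_t alignment;  // assumed to be a power of two
};

// Round offset up to the next multiple of alignment (a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Byte offset of each array when N arrays of `count` elements each are
// packed one after another in a single buffer. Entry N is the total size.
template <std::size_t N>
std::array<std::size_t, N + 1>
block_offsets(const std::array<field_layout, N>& fields, std::size_t count) {
    std::array<std::size_t, N + 1> offsets{};
    std::size_t cursor = 0;
    for (std::size_t i = 0; i < N; ++i) {
        cursor = align_up(cursor, fields[i].alignment);  // pad if needed
        offsets[i] = cursor;
        cursor += fields[i].elem_size * count;
    }
    offsets[N] = cursor;
    return offsets;
}
```

For three arrays of 3 elements each described as {4, 16}, {1, 1}, {4, 16}, this gives offsets 0, 12 and 16 with a total of 28 bytes. The offsets are only meaningful if the buffer itself starts at an address aligned to the largest alignment in the list, so the block still has to come from an aligned allocation.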