
On 10/26/2016 12:58 AM, Larry Evans wrote:
On 10/25/2016 11:07 PM, Michael Marcin wrote:
On 10/25/2016 8:23 PM, Larry Evans wrote:
At the very least support for the basic SSE 16 byte alignment of subarrays is crucial.
My best idea so far is some magic wrapper type that gets special treatment. Like: using data_t = soa_block< float3, soa_align<float,16>, bool >;
Something like:
template<typename T, std::size_t Alignment> struct alignas(Alignment) soa_align { T data; };
Have you tried that yet? If not, I might try.
The issue is you don't want to overalign all elements of the array, just the first element.
But aligning the first soa_align<T,A> is all that's needed because sizeof(soa_align<T,A>)%A == 0, hence, all subsequent elements would be aligned. At least that's my understanding. Am I missing something?
Perhaps I'm misunderstanding. Using your struct above:

    std::array< soa_align<float, 16>, 4 > data;
    std::cout << "align array: " << alignof(decltype(data)) << '\n'
              << "size element: " << sizeof( data[0] ) << '\n'
              << "size array: " << sizeof( data ) << '\n'
              << "offset[1]: " << (char*)&(data[1]) - (char*)data.data() << '\n';

which prints:

    align array: 16
    size element: 16
    size array: 64
    offset[1]: 16

For data to work with SSE instructions this needs to report:

    align array: 16
    size element: 4
    size array: 16
    offset[1]: 4

i.e. 4 floats have to be contiguous in memory, and the *first* float has to be aligned to 16 bytes.
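To make the difference concrete, here is a minimal sketch that contrasts the per-element soa_align wrapper with over-aligning the sub-array as a whole. The names aligned_subarray, stride_bytes and is_aligned are hypothetical helpers invented for this illustration, and the static_asserts assume a 4-byte float.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Per-element over-alignment, as in the soa_align sketch quoted above:
// every element is padded out to Alignment bytes.
template<typename T, std::size_t Alignment>
struct alignas(Alignment) soa_align { T data; };

// Hypothetical alternative: over-align the array as a whole, so only the
// first element is aligned and the elements stay contiguous.
template<typename T, std::size_t N, std::size_t Alignment>
struct aligned_subarray {
    alignas(Alignment) T data[N];
};

// Assumption: float is 4 bytes, as on the usual SSE targets.
static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");
static_assert(sizeof(soa_align<float, 16>) == 16, "each element is padded to 16 bytes");
static_assert(sizeof(soa_align<float, 16>[4]) == 64, "4 padded elements take 64 bytes");
static_assert(sizeof(aligned_subarray<float, 4, 16>) == 16, "4 contiguous floats take 16 bytes");
static_assert(alignof(aligned_subarray<float, 4, 16>) == 16, "the array itself is 16-byte aligned");

// Distance in bytes between consecutive elements of an array.
template<typename T>
std::ptrdiff_t stride_bytes(const T* p) {
    return reinterpret_cast<const char*>(p + 1) - reinterpret_cast<const char*>(p);
}

// True if p is a multiple of a.
inline bool is_aligned(const void* p, std::size_t a) {
    return reinterpret_cast<std::uintptr_t>(p) % a == 0;
}
```

With the array-level alignas, the stride stays at sizeof(float) while the first element still lands on a 16-byte boundary, which is the layout an aligned SSE load expects.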
I have a working solution (using Peter Dimov's mp11 library as I'm not well-versed in post cpp03 metaprogramming).
I'm just trying to play around with implementation ideas at the moment.
Basically it'd be nice to store only a single pointer and still have cheap constant-time member sub-array access.
But with alignment concerns all I've managed so far are two implementations.
1. 1 pointer with linear-time member array access
2. n pointers with constant-time member array access
I feel like there should exist an implementation that trades a bit of dynamic allocation size for a single pointer and constant-time member array access.
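One way that trade-off might look is to put the offset table at the front of the allocation itself, so the handle holds a single pointer and each lookup is one load plus one add. The following is only a sketch of that idea, not code from this thread: field_desc, single_ptr_block and the 64-byte block_align bound are all assumptions made for illustration, and it relies on C++17 aligned operator new.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical description of one member sub-array: element size and the
// alignment required for the first element of that sub-array.
struct field_desc { std::size_t elem_size; std::size_t align; };

inline std::size_t round_up(std::size_t n, std::size_t a) { return (n + a - 1) / a * a; }

class single_ptr_block {
    // Assumed upper bound on any field alignment; the whole block uses it.
    static constexpr std::size_t block_align = 64;

    unsigned char* base_;  // the only data member: [offset header][sub-arrays...]

    std::size_t* header() const { return reinterpret_cast<std::size_t*>(base_); }

    // Walks the fields once; writes offsets to out if non-null and
    // returns the total number of bytes needed.
    static std::size_t layout(const field_desc* f, std::size_t n,
                              std::size_t capacity, std::size_t* out) {
        std::size_t cursor = n * sizeof(std::size_t);  // space for the header
        for (std::size_t i = 0; i < n; ++i) {
            assert(f[i].align <= block_align);
            cursor = round_up(cursor, f[i].align);     // align first element only
            if (out) out[i] = cursor;
            cursor += f[i].elem_size * capacity;       // elements stay contiguous
        }
        return cursor;
    }

public:
    single_ptr_block(const field_desc* fields, std::size_t nfields, std::size_t capacity) {
        const std::size_t total = layout(fields, nfields, capacity, nullptr);
        base_ = static_cast<unsigned char*>(
            ::operator new(total, std::align_val_t(block_align)));
        layout(fields, nfields, capacity, header());   // fill in the offset header
    }
    ~single_ptr_block() { ::operator delete(base_, std::align_val_t(block_align)); }

    single_ptr_block(const single_ptr_block&) = delete;
    single_ptr_block& operator=(const single_ptr_block&) = delete;

    // Constant time: one load from the header plus one pointer add.
    void* field(std::size_t i) const { return base_ + header()[i]; }
};
```

The cost is nfields * sizeof(std::size_t) extra bytes per allocation plus any alignment padding, and every access goes through the header, so it pays an extra dependent load compared with keeping n pointers in the handle.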
I intended soa_block to fill that need (after all the tasks shown in the **TODO** comments were done). If you see some flaw in the code, of course, I'd love to hear about it. **TODO**
IIRC it had implemented roughly the #2 strategy, storing a pointer + an array of n+1 offsets to access n members in constant time.
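For reference, the "pointer + n+1 offsets" layout can be sketched roughly as below. This is not the actual soa_block code; member_desc and make_offsets are hypothetical names, and the member sizes in the usage note are made-up stand-ins for float3, a 16-byte-aligned float and bool.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical description of one member: element size and the alignment
// wanted for the start of its sub-array.
struct member_desc { std::size_t elem_size; std::size_t align; };

// Computes N+1 byte offsets for N member sub-arrays of `capacity` elements.
// offsets[i] is where sub-array i starts; offsets[N] is the total size.
template<std::size_t N>
std::array<std::size_t, N + 1> make_offsets(const std::array<member_desc, N>& members,
                                            std::size_t capacity) {
    std::array<std::size_t, N + 1> offsets{};
    std::size_t cursor = 0;
    for (std::size_t i = 0; i < N; ++i) {
        const std::size_t a = members[i].align;
        cursor = (cursor + a - 1) / a * a;   // round up to the member's alignment
        offsets[i] = cursor;
        cursor += members[i].elem_size * capacity;
    }
    offsets[N] = cursor;                     // one-past-the-end of the last sub-array
    return offsets;
}

// Constant-time lookup: base pointer plus a stored offset.
template<std::size_t N>
unsigned char* member_ptr(unsigned char* base,
                          const std::array<std::size_t, N + 1>& offsets,
                          std::size_t i) {
    return base + offsets[i];
}
```

For example, with members of (size, align) = (12, 4), (4, 16), (1, 1) and a capacity of 5, the offsets come out as 0, 64, 84, 89: the second sub-array is bumped from 60 to 64 to reach a 16-byte boundary, and the last entry gives the allocation size. The base pointer itself must of course be allocated with at least the largest member alignment for those offsets to yield aligned addresses.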