Re: [boost] interest in structure of arrays container?

21 Oct 2016


On 10/21/2016 01:07 AM, Michael Marcin wrote:
...
On 10/21/2016 12:48 AM, Michael Marcin wrote:
...
On 10/20/2016 10:02 PM, Larry Evans wrote:
...
The modification added soa_emitter_block_t which uses soa_block.
Unfortunately, this soa_emitter_block_t takes about twice as long as
your soa_emitter_static_t.
I've no idea why.  Any guesses?
2x is quite an abstraction penalty.
I can only assume your compiler is failing to optimize away some part of
the abstraction.
OOPS.  Yeah, I forgot about run-time optimization compiler flags :(
...
...
FWIW, on VS2015 I'm not seeing nearly as much of a difference.
particle_count=1,000,000
AoS        in 6.34667 seconds
SoA        in 4.26384 seconds
SoA flat   in 4.16572 seconds
SoA Static in 5.4037 seconds
SoA block  in 5.5588 seconds
I'm still trying to work out how to fit overaligned subarrays into your
framework.
The issue is that many SIMD instructions require more than just
alignof(T) alignment.
Subarrays of float/double/int/short/char or carefully crafted UDTs might
need to be aligned to as much as 64 bytes in the worst case.
On the MIC architecture, vector load/store operations
must be called on 64-byte aligned memory addresses.
On the Xeon architecture with AVX/AVX2 instruction sets
(Sandy Bridge, Ivy Bridge or Haswell), alignment does not matter.
In earlier architectures (Nehalem, Westmere) alignment did matter,
but a 16-byte alignment was necessary.
https://software.intel.com/en-us/forums/intel-many-integrated-core/topic/507...
At the very least, support for the basic SSE 16-byte alignment of
subarrays is crucial.
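To make the requirement concrete, here is a minimal sketch (C++14) of how the
starting offset of each subarray could be computed when all subarrays share one
allocation. The names align_up, subarray_desc and subarray_offset are
hypothetical and not part of soa_block. The sketch also assumes the block
itself starts at an address aligned to the largest requested alignment.

```cpp
#include <cstddef>

// Round `offset` up to the next multiple of `alignment`.
// Assumes `alignment` is a non-zero power of two.
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical description of one subarray: the size of one element and
// the alignment requested for the first element of the subarray.
struct subarray_desc
{
    std::size_t elem_size;
    std::size_t alignment;
};

// Byte offset at which subarray `index` starts inside a block that holds
// `count` elements per subarray. Each subarray is placed after the previous
// one, with padding inserted so that its start meets its own alignment.
template <std::size_t N>
constexpr std::size_t subarray_offset(const subarray_desc (&descs)[N],
                                      std::size_t index,
                                      std::size_t count)
{
    std::size_t offset = 0;
    for (std::size_t i = 0; i < index; ++i)
    {
        offset = align_up(offset, descs[i].alignment);
        offset += descs[i].elem_size * count;
    }
    return align_up(offset, descs[index].alignment);
}
```

For a 12-byte float3 with 4-byte alignment, followed by a float subarray
requested at 16-byte alignment, and 5 elements per subarray, the float subarray
would start at byte offset 64 rather than 60.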
My best idea so far is some magic wrapper type that gets special
treatment, like:
using data_t = soa_block< float3, soa_align<float,16>, bool >;
This may also open the door for other magic types, like:
using data_t = soa_block< float3, soa_align<float,16>, soa_bit >;
That seems reasonable to me.
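One way the wrapper could be recognized is with a small traits class that the
container consults for each template argument. This is only a sketch of the
idea. soa_align is the name proposed above, while soa_traits and the float3
definition are assumptions made for illustration, and nothing here reflects how
soa_block is actually implemented.

```cpp
#include <cstddef>
#include <type_traits>

// Tag type from the proposal: "a subarray of T whose first element is
// aligned to at least Align bytes". It is never stored as an element.
template <typename T, std::size_t Align>
struct soa_align;

// Hypothetical trait. Primary template: an ordinary element type is stored
// as itself and uses its natural alignment.
template <typename T>
struct soa_traits
{
    using value_type = T;
    static constexpr std::size_t alignment = alignof(T);
};

// Specialization for the tag: unwrap the element type and use the
// requested alignment for the start of the subarray.
template <typename T, std::size_t Align>
struct soa_traits<soa_align<T, Align>>
{
    static_assert((Align & (Align - 1)) == 0, "Align must be a power of two");
    static_assert(Align >= alignof(T), "Align must not weaken alignof(T)");

    using value_type = T;
    static constexpr std::size_t alignment = Align;
};

// Stand-in for the float3 used in the thread (assumed layout).
struct float3 { float x, y, z; };
```

With this, soa_traits<soa_align<float,16>>::value_type is float and its
alignment is 16, while soa_traits<float3> and soa_traits<bool> fall back to
alignof. A further magic type such as soa_bit would presumably get its own
specialization.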