On 10/31/2016 03:09 AM, Michael Marcin wrote:
On 10/30/2016 7:45 AM, Larry Evans wrote:
Would you post that somewhere? I'd be curious about how it differs.
My code isn't very complete but since everyone else is sharing I'll post what I've got.
FWIW, I had to go back pretty far, and then still make changes, to get a version of your soa_compare.benchmark.cpp that compiled on Windows with VS2015.
particle_count=1,000,000 minimum duration=2.85542
comparative performance table:
method      rel_duration
________    ______________
Aos         3.08266
SoA         1.51109
Flat        1.50358
StdArray    1.91317
Block       1.60447
SSE         1.35994
SSE_opt     1.18056
SSE_goon    1
Thanks Michael. I found it interesting.
However, I was still getting the 'double free' error message; hence,
I tried valgrind. It showed a problem in the alive update loop.
When the code was changed to:
uint64_t *block_ptr = alive.data();
auto e_ptr = energy.data();
for ( size_t i = 0; i < n; ) {
#define REVISED_CODE
#ifdef REVISED_CODE
  auto e_i = e_ptr + i;
#endif
  uint64_t block = 0;
  do {
#ifndef REVISED_CODE
    //this code causes valgrind to show errors.
    auto e_i = e_ptr + i;
#endif
    _mm_store_ps( e_i, _mm_sub_ps( _mm_load_ps( e_i ), t ));
    block |=
      uint64_t
      ( _mm_movemask_ps( _mm_cmple_ps( _mm_load_ps( e_i ),
                                       zero )))
      << (i % bits_per_uint64_t)
      ;
    i += 4;
  } while ( i % bits_per_uint64_t != 0 );
  *block_ptr++ = block;
}
valgrind reported no errors; however, when !defined(REVISED_CODE),
valgrind reported:
valgrind --tool=memcheck
/tmp/build/clangxx3_8_pkg/clang/struct_of_arrays/work/soa_compare.benchmark.optim0.exe
==7937== Memcheck, a memory error detector
==7937== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==7937== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==7937== Command:
/tmp/build/clangxx3_8_pkg/clang/struct_of_arrays/work/soa_compare.benchmark.optim0.exe
==7937==
COMPILE_OPTIM=0
particle_count=1,000
frames=1,000
{run_test=SSEopt_vec
==7937== Invalid read of size 16
==7937== at 0x403D6B: emitter_t<(method_enum)6>::update()
(soa_compare.benchmark.cpp:962)
==7937== by 0x403094: run_result_t run_test
(unsigned long, unsigned long) (soa_compare.benchmark
My soa_compare.benchmark.cpp:962 line is:

  _mm_store_ps( e_i, _mm_sub_ps( _mm_load_ps( e_i ), t ));

-regards,
Larry
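
For comparison, here is a minimal scalar sketch of what the alive update loop above appears to compute: subtract t from each energy value and pack one "energy <= 0" bit per particle into 64-bit blocks. The names update_alive_scalar and round_up_to_block are hypothetical and not from soa_compare.benchmark.cpp, and the block size of 64 assumes bits_per_uint64_t is 64. The second helper only shows the index at which the do-while stops. With particle_count=1,000 that index is 1,024, so if the energy buffer holds exactly n floats (an assumption; any padding is not shown in this post), the last block would touch elements past the end, which may be related to the invalid read that valgrind reports.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of soa_compare.benchmark.cpp.
// Subtracts t from every energy value and returns one bit per particle
// (1 means energy <= 0 after the update), packed 64 per uint64_t block.
// Every index is checked against n, so nothing past the end is touched.
std::vector<std::uint64_t> update_alive_scalar(std::vector<float>& energy, float t)
{
    constexpr std::size_t bits_per_block = 64;  // assumed value of bits_per_uint64_t
    const std::size_t n = energy.size();
    std::vector<std::uint64_t> blocks((n + bits_per_block - 1) / bits_per_block, 0);
    for (std::size_t i = 0; i < n; ++i) {
        energy[i] -= t;
        if (energy[i] <= 0.0f) {
            blocks[i / bits_per_block] |= std::uint64_t(1) << (i % bits_per_block);
        }
    }
    return blocks;
}

// Hypothetical helper: the first multiple of bits_per_block that is >= n,
// i.e. the index at which a "do { i += 4; } while (i % 64 != 0)" loop
// starting from a block boundary would stop.
std::size_t round_up_to_block(std::size_t n, std::size_t bits_per_block = 64)
{
    return (n + bits_per_block - 1) / bits_per_block * bits_per_block;
}
```

Running the SSE loop and this scalar version on the same input and comparing the resulting blocks (and the energy values) would be one way to check whether a given revision of the loop still updates every particle.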