On 10/31/2016 12:00 PM, Michael Marcin wrote:
On 10/31/2016 9:14 AM, Larry Evans wrote:
However, I was still getting the 'double free' error message, so I tried valgrind. It showed a problem in the alive update loop. When the code was changed to:
    uint64_t *block_ptr = alive.data();
    auto e_ptr = energy.data();
    for ( size_t i = 0; i < n; ) {
    #define REVISED_CODE
    #ifdef REVISED_CODE
        auto e_i = e_ptr + i;
    #endif
        uint64_t block = 0;
        do {
    #ifndef REVISED_CODE
            //this code causes valgrind to show errors.
            auto e_i = e_ptr + i;
    #endif
            _mm_store_ps( e_i, _mm_sub_ps( _mm_load_ps( e_i ), t ));
            block |= uint64_t ( _mm_movemask_ps( _mm_cmple_ps( _mm_load_ps( e_i ), zero ))) << (i % bits_per_uint64_t) ;
            i += 4;
        } while ( i % bits_per_uint64_t != 0 );
        *block_ptr++ = block;
    }
valgrind reported no errors; however, when !defined(REVISED_CODE), valgrind reported:
valgrind --tool=memcheck /tmp/build/clangxx3_8_pkg/clang/struct_of_arrays/work/soa_compare.benchmark.optim0.exe
==7937== Memcheck, a memory error detector
==7937== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==7937== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==7937== Command: /tmp/build/clangxx3_8_pkg/clang/struct_of_arrays/work/soa_compare.benchmark.optim0.exe
==7937== COMPILE_OPTIM=0 particle_count=1,000
particle_count=1,000 is not a multiple of 64, and the optimized energy/alive loop processes 64 particles at a time. I haven't bothered to analyze what the code will do in this case, but memory corruption is likely.
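The index arithmetic of the quoted loop can be replayed without touching any memory, which may help show where an overrun could come from. This is only a sketch: elements_touched is a hypothetical helper, not part of soa_compare.benchmark.cpp, and it assumes bits_per_uint64_t is 64 and that e_i is recomputed from i on every inner iteration (the !defined(REVISED_CODE) variant).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: replays only the index updates of the quoted loop
// and returns one past the highest element index it would access.
std::size_t elements_touched(std::size_t n)
{
    const std::size_t bits_per_uint64_t = 64; // assumed block width
    std::size_t i = 0;
    while (i < n) {
        // The inner loop only stops at a block boundary, never at n.
        do {
            i += 4; // one 4-float SSE load/store per step
        } while (i % bits_per_uint64_t != 0);
    }
    assert(i >= n); // the loop always covers at least n elements
    return i;
}
```

Under those assumptions, elements_touched(1000) is 1024, that is, 24 floats beyond an array that holds exactly 1000, while elements_touched(1024) is 1024.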
I see. Still, the calls to the _mm_* functions in the previous loop are all made with i % 4 == 0 (because the increment is i += 4) here:
https://github.com/cppljevans/soa/blob/master/soa_compare.benchmark.cpp#L934
Shouldn't the same apply to the call in the alive loop here:
https://github.com/cppljevans/soa/blob/master/soa_compare.benchmark.cpp#L956
Putting the e_i assignment outside the alive loop, here:
https://github.com/cppljevans/soa/blob/master/soa_compare.benchmark.cpp#L953
assures that.
The code to handle a tail (if particle_count % 64 != 0) isn't difficult to add, but it is explicitly left out. One of the things you'll often do in a system such as this is fit the data to optimize the algorithm. In the case of a particle system, plus or minus 0 to 63 particles is generally unnoticeable.
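For comparison, one possible shape of a loop that tolerates a partial last block is sketched below. It is a plain scalar version, not the SSE code from the benchmark, and update_energy_scalar is a hypothetical name. It assumes the same convention as the quoted loop: a bit is set when the energy is <= 0 after subtracting t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar sketch: subtract t from every energy value and set
// bit (i % 64) of block (i / 64) when the result is <= 0. It works for any
// particle count, including counts that are not a multiple of 64.
void update_energy_scalar(std::vector<float>& energy, float t,
                          std::vector<std::uint64_t>& alive)
{
    const std::size_t bits = 64;
    // The mask needs one block per started group of 64 particles.
    assert(alive.size() * bits >= energy.size());
    for (std::uint64_t& block : alive)
        block = 0;
    for (std::size_t i = 0; i < energy.size(); ++i) {
        energy[i] -= t;
        if (energy[i] <= 0.0f)
            alive[i / bits] |= std::uint64_t{1} << (i % bits);
    }
}
```

A vectorized variant would presumably run the SSE loop over the full blocks and fall back to something like this for the remaining 0 to 63 particles.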
You can address the problem however you like, but the simplest solution would be to change your small particle count to 16 * 64 = 1024.
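If the count comes from somewhere other than a hard-coded constant, it could be rounded up to the next multiple of 64 before the arrays are allocated. A minimal sketch, where particles_per_block and round_up_to_block are hypothetical names that do not appear in the benchmark:

```cpp
#include <cassert>
#include <cstddef>

// Particles covered by one uint64_t block of the alive mask (assumed 64).
constexpr std::size_t particles_per_block = 64;

// Round n up to the next multiple of particles_per_block.
constexpr std::size_t round_up_to_block(std::size_t n)
{
    return (n + particles_per_block - 1) / particles_per_block
           * particles_per_block;
}

// 1000 rounds up to 16 * 64 = 1024, the value suggested above.
static_assert(round_up_to_block(1000) == 16 * 64, "1000 rounds up to 1024");
```

Counts that are already a multiple of 64 are left unchanged by this helper.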
OK. I've done that.