
Joel Falcou wrote:
On 20/12/2012 18:34, Peter Dimov wrote:
What is the recommended Boost.SIMD way to write a function like
void add_n( float const * s, float const * s2, float * d, size_t n ); // d[i] = s[i] + s2[i]
where none of s, s2, d are guaranteed to be aligned?
You should align them ;)
More seriously, you can write a for loop using pack and unaligned_load/store:

    void add_n( float const * s, float const * s2, float * d, size_t n )
    {
      size_t c  = pack<float>::static_size; // floats per SIMD register
      size_t vn = n / c * c;                // elements covered by the vector loop
      size_t sn = n % c;                    // leftover elements for the scalar tail
      for(size_t i = 0; i < vn; i += c, d += c, s += c, s2 += c)
        unaligned_store( unaligned_load< pack<float> >(s) + unaligned_load< pack<float> >(s2), d );
      for(size_t i = 0; i < sn; ++i, ++d, ++s, ++s2)
        *d = *s + *s2;
    }

...

Note that on any pre-Nehalem CPU, the unaligned loads will be horrendously slow.
Yes, and the right thing to do is to first check whether s and s2 are equally misaligned and, if so, run a scalar prefix loop that aligns them; if not, check whether s2 and d are equally misaligned and align those; and finally, if neither is true, align s. (Although I'm not quite certain whether unaligned stores aren't costlier, in which case the order changes a bit.) Then proceed with the rest of your code above. This is tedious boilerplate, so I wondered whether you had already provided a solution. simd::transform seems the logical place to put it.
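The prefix-loop strategy described in the previous paragraph can be sketched as follows. This is a hedged illustration, not Boost.SIMD code: it swaps in raw SSE intrinsics from <xmmintrin.h> in place of pack<float>, so it is x86-only and assumes 16-byte registers holding 4 floats. The helper names misalign, peel_count, load4, store4 and add_body are hypothetical. It picks which pointer to align in the order given above (s and s2 if equally misaligned, else s2 and d, else s alone). It then runs at most three scalar iterations to reach the 16-byte boundary, and calls a loop instantiation that uses aligned loads/stores only for the pointers known to be aligned at that point.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics (x86 only); stands in for Boost.SIMD's pack<float>

namespace {

constexpr std::size_t kAlign = 16;  // bytes per SSE register (assumption: SSE, not AVX)
constexpr std::size_t kLanes = 4;   // floats per SSE register

// Byte offset of p past the previous 16-byte boundary (hypothetical helper).
std::uintptr_t misalign(const float* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kAlign;
}

// Scalar iterations needed before p reaches a 16-byte boundary (0..3 for a valid float*).
std::size_t peel_count(const float* p) {
    std::uintptr_t m = misalign(p);
    return m == 0 ? 0 : (kAlign - m) / sizeof(float);
}

template <bool Aligned>
__m128 load4(const float* p) {
    if constexpr (Aligned) return _mm_load_ps(p);   // requires 16-byte alignment
    else return _mm_loadu_ps(p);
}

template <bool Aligned>
void store4(float* p, __m128 v) {
    if constexpr (Aligned) _mm_store_ps(p, v);      // requires 16-byte alignment
    else _mm_storeu_ps(p, v);
}

// Vector loop plus scalar tail; the flags say which pointers are known to be aligned.
template <bool AS, bool AS2, bool AD>
void add_body(const float* s, const float* s2, float* d, std::size_t n) {
    std::size_t i = 0;
    for (; i + kLanes <= n; i += kLanes)
        store4<AD>(d + i, _mm_add_ps(load4<AS>(s + i), load4<AS2>(s2 + i)));
    for (; i < n; ++i) d[i] = s[i] + s2[i];
}

}  // namespace

// d[i] = s[i] + s2[i] for arbitrarily aligned s, s2, d.
void add_n(const float* s, const float* s2, float* d, std::size_t n) {
    int mode;             // 0: align s and s2, 1: align s2 and d, 2: align s only
    const float* target;  // the pointer the scalar prologue brings onto a boundary
    if (misalign(s) == misalign(s2))      { mode = 0; target = s;  }
    else if (misalign(s2) == misalign(d)) { mode = 1; target = s2; }
    else                                  { mode = 2; target = s;  }

    std::size_t p = std::min(peel_count(target), n);
    for (std::size_t i = 0; i < p; ++i) d[i] = s[i] + s2[i];  // scalar prologue
    s += p; s2 += p; d += p; n -= p;

    switch (mode) {
        case 0:  add_body<true,  true,  false>(s, s2, d, n); break;
        case 1:  add_body<false, true,  true >(s, s2, d, n); break;
        default: add_body<true,  false, false>(s, s2, d, n); break;
    }
}
```

The aligned/unaligned choice is made once, outside the hot loop, by picking a template instantiation. If unaligned stores turn out to be costlier than unaligned loads, the first two checks are the place to reorder.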