
11 Oct 2008, 4:12 p.m.
Patrick Mihelich wrote:
> Let's also add a partially unrolled runtime-access version that operates on 4 elements at a time (ideally the compiler would take care of this).
Isn't GCC supposed to implement loop vectorization? Did you add SIMD instructions there or not?
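For reference, a "partially unrolled, 4 elements at a time" loop could look roughly like the sketch below. This is a minimal illustration only, not the code from the benchmark being discussed: the function name `add_unrolled4`, the use of plain `float` arrays, and the element-wise addition operation are all hypothetical choices. Note that manual unrolling by itself does not emit SIMD instructions; GCC's auto-vectorizer (`-ftree-vectorize`, enabled by `-O3`) may or may not turn either the simple loop or this one into SSE code, depending on the compiler version and whether it can prove the pointers do not alias.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a[i] += b[i] for i in [0, n), processing
// 4 elements per iteration (manual partial unrolling).
void add_unrolled4(float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
    // Main loop: four independent additions per iteration.
    for (; i + 4 <= n; i += 4) {
        a[i]     += b[i];
        a[i + 1] += b[i + 1];
        a[i + 2] += b[i + 2];
        a[i + 3] += b[i + 3];
    }
    // Scalar tail for sizes that are not a multiple of 4.
    for (; i < n; ++i) {
        a[i] += b[i];
    }
}
```

The tail loop is what makes this a *runtime*-size version: the size is only known at run time, so the remainder after the last full group of 4 has to be handled separately.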