
25 Dec 2012, 3:28 p.m.
Mathias Gaunard wrote:
> On 25/12/12 15:43, Peter Dimov wrote:
>> Mathias Gaunard wrote:
>>> The shifted iterator and the shifted load make it possible to use aligned loads if you statically know the misalignment of the memory.
>> Does this have any performance advantage over just using an unaligned load? I'd expect the microcode to do whatever the shifted load does, but I haven't measured it.
> The shifted load statically knows the alignment, unaligned loads do not. Note that the switch is outside of the loop, not for each load.
Ah, you're probably talking AltiVec, which probably doesn't have an unaligned load instruction. I was thinking SSE, which does.
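
For reference, a rough portable sketch of the idea being discussed, not the library's actual code. All names here (pack, aligned_load, shifted_load, sum_shifted, sum) are made up for illustration, and a std::array of 4 floats stands in for a real SIMD register. The misalignment is a template parameter, so each loop body is compiled for one fixed shift, and the runtime switch on the pointer's misalignment happens once, outside the loop. Note that a shifted load touches the whole aligned block on either side of the requested elements, so the buffer must extend to the enclosing 16-byte boundaries.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 4-float "register"; a stand-in for a real SIMD vector type.
using pack = std::array<float, 4>;

// Stand-in for an aligned vector load; p is assumed to be 16-byte aligned.
inline pack aligned_load(const float* p) {
    pack r;
    std::memcpy(r.data(), p, sizeof r);
    return r;
}

// Shifted load: Shift is the misalignment of p in elements (0..3),
// known at compile time. Reads the aligned blocks that straddle p
// and splices the wanted elements together.
template <int Shift>
pack shifted_load(const float* p) {
    static_assert(Shift >= 0 && Shift < 4, "Shift must be in [0, 4)");
    const float* base = p - Shift;      // aligned address at or below p
    pack lo = aligned_load(base);
    if (Shift == 0) return lo;          // already aligned, one load suffices
    pack hi = aligned_load(base + 4);
    pack r;
    for (int i = 0; i < 4; ++i)
        r[i] = (i + Shift < 4) ? lo[i + Shift] : hi[i + Shift - 4];
    return r;
}

// Loop body specialised for one fixed misalignment.
// Sums the first n elements, n assumed to be a multiple of 4.
template <int Shift>
float sum_shifted(const float* p, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i + 4 <= n; i += 4) {
        pack v = shifted_load<Shift>(p + i);
        s += v[0] + v[1] + v[2] + v[3];
    }
    return s;
}

// The switch on the runtime misalignment is done once, outside the loop.
// p is assumed to be at least float-aligned.
inline float sum(const float* p, std::size_t n) {
    std::size_t mis =
        (reinterpret_cast<std::uintptr_t>(p) % 16) / sizeof(float);
    switch (mis) {
        case 0:  return sum_shifted<0>(p, n);
        case 1:  return sum_shifted<1>(p, n);
        case 2:  return sum_shifted<2>(p, n);
        default: return sum_shifted<3>(p, n);
    }
}
```

Whether this beats a plain unaligned load on a target that has one is exactly the open question above; the sketch only shows where the static knowledge of the misalignment enters.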