
On 25/12/2012 15:43, Peter Dimov wrote:
Mathias Gaunard wrote:
The shifted iterator and the shifted load let you use aligned loads when you statically know the misalignment of the memory.
Does this have any performance advantage over just using an unaligned load? I'd expect the microcode to do whatever the shifted load does, but I haven't measured it.
A shifted load is a pair of aligned loads plus bit shuffling. The technique stems from way back on Altivec. Experiments on 1D filtering comparing the two show some benefit over unaligned loads on pre-Nehalem CPUs. It is usually better in this kind of kernel because registers can be reused across inner-loop iterations, saving loads and thus reducing the total number of loads for a given filter run. See Lacassagne et al. [1] for a description of the algorithm. Incidentally, such register-saving techniques are already built into the shifted_iterator implementation.
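
To make the register reuse concrete, here is a minimal scalar model of the idea. It is a sketch only, not the shifted_iterator or shifted load implementation: the names `vec`, `aligned_load`, `shift_combine` and `filter3` are hypothetical, a `std::array` stands in for a SIMD register, and the loop in `shift_combine` stands in for the shuffle instruction (something like `vec_perm` on Altivec or `_mm_alignr_epi8` on SSSE3). The point it illustrates is that a 3-tap filter needs only one new aligned load per output vector, because the previous and current registers are carried over from one iteration to the next.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane "register"; a real version would use a SIMD type.
constexpr std::size_t W = 4;
using vec = std::array<float, W>;

// Stand-in for an aligned vector load (p is assumed to sit on a W-element boundary).
inline vec aligned_load(const float* p) {
    vec v{};
    for (std::size_t k = 0; k < W; ++k) v[k] = p[k];
    return v;
}

// Stand-in for the shuffle step: lanes [Shift, Shift + W) of the pair lo:hi.
template <std::size_t Shift>
inline vec shift_combine(const vec& lo, const vec& hi) {
    static_assert(Shift < W, "shift must be smaller than the vector width");
    vec r{};
    for (std::size_t k = 0; k < W; ++k) {
        const std::size_t s = k + Shift;
        r[k] = (s < W) ? lo[s] : hi[s - W];
    }
    return r;
}

// 3-tap sum: out[i] = in[i-1] + in[i] + in[i+1] for i in [W, n - W).
// Border elements are left untouched.
void filter3(const float* in, float* out, std::size_t n) {
    assert(n % W == 0 && n >= 2 * W);
    vec prev = aligned_load(in);
    vec cur  = aligned_load(in + W);
    for (std::size_t i = W; i + 2 * W <= n; i += W) {
        const vec next  = aligned_load(in + i + W);         // the only load per iteration
        const vec left  = shift_combine<W - 1>(prev, cur);  // in[i-1 .. i+W-2]
        const vec right = shift_combine<1>(cur, next);      // in[i+1 .. i+W]
        for (std::size_t k = 0; k < W; ++k)
            out[i + k] = left[k] + cur[k] + right[k];
        prev = cur;  // reuse registers instead of reloading
        cur  = next;
    }
}
```

Written with unaligned loads, the same loop would issue three loads per output vector (at `i-1`, `i` and `i+1`); here it issues one aligned load and two shuffles. Whether that trade pays off depends on the CPU, which is why the benefit mentioned above is specific to pre-Nehalem hardware.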