
Sorry for the noise: the loop nest generator is basically a recursive function that gets its dimension information from a fusion sequence holding the set of dimensions. The storage order is applied as a permutation on this sequence, and the permuted sequence is passed to the generator. The shuffled position is then passed through the whole expression tree, and when we reach a terminal, the permutation gets applied again to access the proper memory location. The shuffling of the position is done at compile time using MPL on the fusion::at_c parameter and incurs no runtime overhead. That's the basic theory. Now, above 2D, most array representations and memory access schemes put heavy pressure on registers and may limit performance. So we treat an nD array as a flattened 2D one and aggregate the outer dimensions after shuffling.
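
To make the two steps above concrete, here is a minimal sketch in plain standard C++ (C++14 or later). It is not the actual generator: it uses std::array and a parameter pack in place of the fusion sequence and the MPL machinery, and every name in it (storage_order, shuffle, flatten_extents, flatten_position, linear_offset) is hypothetical. It also assumes that the permutation lists, fastest-varying slot first, which logical dimension is stored in each slot, and that the innermost dimension is the one kept separate when flattening.

```cpp
#include <array>
#include <cstddef>

// Hypothetical storage-order tag: Perm... names, for each storage slot
// (fastest-varying first), the logical dimension stored in that slot.
template <std::size_t... Perm>
struct storage_order {};

// Reorder a logical tuple (extents or a position) into storage order.
// The indices are template parameters, so the permutation itself is
// resolved at compile time.
template <std::size_t N, std::size_t... Perm>
constexpr std::array<std::size_t, N>
shuffle(const std::array<std::size_t, N>& logical, storage_order<Perm...>)
{
    static_assert(sizeof...(Perm) == N, "permutation must cover every dimension");
    return {{ logical[Perm]... }};
}

// Collapse shuffled extents to 2D: keep the innermost extent and
// multiply all outer extents together.
template <std::size_t N>
constexpr std::array<std::size_t, 2>
flatten_extents(const std::array<std::size_t, N>& s)
{
    std::size_t outer = 1;
    for (std::size_t i = 1; i < N; ++i) outer *= s[i];
    return {{ s[0], outer }};
}

// Collapse a shuffled position to (inner, outer), where outer is the
// linearized index over all outer dimensions.
template <std::size_t N>
constexpr std::array<std::size_t, 2>
flatten_position(const std::array<std::size_t, N>& p,
                 const std::array<std::size_t, N>& s)
{
    std::size_t outer = 0;
    for (std::size_t i = N; i-- > 1; ) outer = outer * s[i] + p[i];
    return {{ p[0], outer }};
}

// Shuffle, flatten, then address the array as if it were 2D.
template <std::size_t N, std::size_t... Perm>
constexpr std::size_t
linear_offset(const std::array<std::size_t, N>& extents,
              const std::array<std::size_t, N>& position,
              storage_order<Perm...> order)
{
    const auto s     = shuffle(extents, order);
    const auto p     = shuffle(position, order);
    const auto shape = flatten_extents(s);
    const auto idx   = flatten_position(p, s);
    return idx[0] + shape[0] * idx[1];
}
```

With logical extents {4, 3, 2} and position {2, 1, 0}, linear_offset yields 6 for storage_order<0, 1, 2> (first dimension fastest) and 14 for storage_order<2, 1, 0> (last dimension fastest), which match the usual column-major and row-major offsets. After the shuffle, only two indices are live in the inner loops, which is the point of the 2D flattening.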