
Try doing the pointer access with the simple "res += array[ti][tj];" instruction; the results in my earlier post came from this. The "res = uni()" line is not the core of the computation, just the array initialisation :) The greater the computation, the smaller the overhead will be IMHO. For a computation like "res += array[i][j]*cos(array[i][j])" versus its "while(begin++)" equivalent, the overhead is never greater than 1%.
In a program where I use many images and arrays, the real gain came from switching a few functions to handwritten assembler code: 6 times faster than an already good C++ algorithm. All the rest of the application is standard, elegant and "slow" code, but it doesn't impact the overall performance.
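To make the comparison concrete, here is a minimal sketch of the two loop styles being timed: indexed access through a table of row pointers versus a single pointer walk over contiguous storage, both doing the heavier "res += v*cos(v)" work. The function names (sum_indexed, sum_pointer, time_ms) and the row-pointer layout are assumptions for illustration; the original benchmark code is not shown in the thread.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>

// Indexed form: res += array[i][j] * cos(array[i][j]) over a rows x cols grid.
// Assumption: `array` is a C-style table of row pointers into one buffer.
double sum_indexed(const double* const* array, std::size_t rows, std::size_t cols) {
    double res = 0.0;
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            res += array[i][j] * std::cos(array[i][j]);
    return res;
}

// Pointer-walk form: the same computation as one pass over contiguous storage,
// visiting elements in the same row-major order as sum_indexed.
double sum_pointer(const double* begin, const double* end) {
    assert(begin <= end);
    double res = 0.0;
    while (begin != end) {
        const double v = *begin++;
        res += v * std::cos(v);
    }
    return res;
}

// Runs f once and returns the elapsed wall-clock time in milliseconds.
template <class F>
double time_ms(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

To compare them, wrap each call in a lambda passed to time_ms and store the returned sum somewhere observable, so the compiler cannot discard the loop. Because cos() dominates the cost per element, the extra address arithmetic of the indexed form becomes a small fraction of the total, which is the effect described above.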
I perfectly agree, but I think that in most cases going down to that low level is not needed, even in image processing. I would rather take a few minutes to SIMDify the code, if possible, than rewrite it in inline assembly. Anyway, I'm now writing this small multi_array with indexing to see how it fares.
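For reference, one possible shape for such a small multi_array is sketched below: contiguous row-major storage with both a(i, j) and a[i][j] indexing, plus raw begin()/end() pointers so the pointer-walk loop can be timed against the indexed one. The class name Array2D and its interface are hypothetical; this is neither Boost.MultiArray's API nor the design the poster actually chose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal 2D array: contiguous row-major storage with
// indexed access. Names and layout are illustrative only.
template <class T>
class Array2D {
public:
    Array2D(std::size_t rows, std::size_t cols, const T& init = T())
        : rows_(rows), cols_(cols), data_(rows * cols, init) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // a(i, j): a single multiply-add computes the flat offset.
    T& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }
    const T& operator()(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }

    // a[i][j]: operator[] returns a pointer to the start of row i,
    // so the column index j is not bounds-checked on this path.
    T* operator[](std::size_t i) {
        assert(i < rows_);
        return data_.data() + i * cols_;
    }
    const T* operator[](std::size_t i) const {
        assert(i < rows_);
        return data_.data() + i * cols_;
    }

    // Raw pointers over all elements, for the pointer-walk style of loop.
    T* begin() { return data_.data(); }
    T* end() { return data_.data() + data_.size(); }
    const T* begin() const { return data_.data(); }
    const T* end() const { return data_.data() + data_.size(); }

private:
    std::size_t rows_, cols_;
    std::vector<T> data_;
};
```

Since begin() and end() return plain pointers, a range-based for loop over an Array2D compiles to the same kind of pointer walk as the "while(begin++)" version, while operator() keeps the indexed form available for comparison. Contiguous storage also keeps the data friendly to later SIMD work.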