I guess it's some optimisation from way back yonder (something modern compilers do routinely, even on a Monday morning!)... but most probably irrelevant nowadays...
degski
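For a power-of-two divisor, the rewrite in question is replacing `% 16` with `& 15`. Here is a minimal sketch of why that is a straight substitution only for unsigned operands. The helper names are hypothetical and chosen for illustration. For `unsigned` values the two expressions give the same number. For `int`, C++ truncates division toward zero, so `-1 % 16` is `-1` while `-1 & 15` is `15`, and only the test against zero agrees. The masking lines assume two's complement, which C++20 guarantees and mainstream compilers have always used.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, for illustration only.
// For unsigned x, x % 16 and x & 15 are the same value.
constexpr std::uint32_t mod16_unsigned(std::uint32_t x) { return x % 16u; }
constexpr std::uint32_t mask15_unsigned(std::uint32_t x) { return x & 15u; }

// For signed x, % truncates toward zero, so a negative x gives a negative remainder...
constexpr int mod16_signed(int x) { return x % 16; }
// ...while masking always yields 0..15 (assumes two's complement, guaranteed since C++20).
constexpr int mask15_signed(int x) { return x & 15; }

// Checks over [lo, hi] that the "multiple of 16" tests agree even where the values differ.
// hi must be below INT_MAX so the loop counter cannot overflow.
void verify_range(int lo, int hi) {
    for (int x = lo; x <= hi; ++x) {
        const auto u = static_cast<std::uint32_t>(x);  // modular conversion, well-defined
        assert(mod16_unsigned(u) == mask15_unsigned(u));
        assert((mod16_signed(x) == 0) == (mask15_signed(x) == 0));
    }
}
```

Calling `verify_range(-1000, 1000)` exercises both properties. The signed case is why a compiler cannot blindly turn `i % 16` into `i & 15` when the value itself is used. It can still do so when the result is only compared with zero.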
I might be pessimistic, but I never trust the compiler and generally check what it outputs. In this case, FWIW, on MSVC2015 the bit-twiddling version generates faster code than the mod version: it takes roughly 25% less time (153 ns vs. 204 ns per benchmark iteration). I didn't test gcc or clang.

Using Google Benchmark:

Code:

    #include <benchmark/benchmark.h>

    // Note: range_x()/range_y() and ArgPair() are the older Google Benchmark API;
    // current releases spell them state.range(0), state.range(1) and ->Args({1, 1}).

    // Modulo version.
    static void AlignedMod(benchmark::State& state) {
        while (state.KeepRunning()) {
            for (int i = state.range_x(); i < 128; i += state.range_y()) {
                bool aligned = (i % 16) == 0;
                benchmark::DoNotOptimize(aligned);  // keep the result from being optimised away
            }
        }
    }
    BENCHMARK(AlignedMod)->ArgPair(1, 1);

    // Bit-twiddling version.
    static void AlignedAnd(benchmark::State& state) {
        while (state.KeepRunning()) {
            for (int i = state.range_x(); i < 128; i += state.range_y()) {
                bool aligned = ((i - 1) & 15) == 0;
                benchmark::DoNotOptimize(aligned);
            }
        }
    }
    BENCHMARK(AlignedAnd)->ArgPair(1, 1);

    BENCHMARK_MAIN();

Generated code of the inner loop:

Mod version:

    mov eax,ebx
    and eax,8000000Fh
    jge AlignedMod+50h
    dec eax
    or eax,0FFFFFFF0h
    inc eax
    test eax,eax
    lea rcx,[aligned]
    sete byte ptr [aligned]
    call 07FF73B84A180h
    add ebx,dword ptr [rdi+1Ch]
    cmp ebx,80h
    jl AlignedMod+40h

And version:

    lea eax,[rbx-1]
    test al,0Fh
    lea rcx,[aligned]
    sete byte ptr [aligned]
    call 07FF73B84A180h
    add ebx,dword ptr [rdi+1Ch]
    cmp ebx,80h
    jl AlignedAnd+40h

Result:

    Benchmark          Time      CPU    Iterations
    ----------------------------------------------
    AlignedMod/1/1   204 ns   203 ns       4072727
    AlignedAnd/1/1   153 ns   154 ns       4977778

-- chris
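The extra instructions in the mod listing are the sign fixup for a signed remainder. `and eax,8000000Fh` keeps the sign bit plus the low four bits. When that result is negative (the `jge` is not taken), `dec` / `or eax,0FFFFFFF0h` / `inc` turn it into the negative remainder C++ requires.

Below is a line-by-line transcription into C++. It is a sketch that works on `uint32_t` so the bit operations stay well-defined. `signed_mod16_fixup` and `check_against_builtin` are hypothetical names, not anything MSVC or Google Benchmark provides.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ transcription of the MSVC sequence for i % 16 on a signed int:
//   mov eax,ebx / and eax,8000000Fh / jge skip / dec eax / or eax,0FFFFFFF0h / inc eax
std::int32_t signed_mod16_fixup(std::int32_t x) {
    std::uint32_t r = static_cast<std::uint32_t>(x) & 0x8000000Fu;  // sign bit + low 4 bits
    if (r & 0x80000000u) {  // x was negative: the jge is not taken
        r -= 1u;            // dec eax
        r |= 0xFFFFFFF0u;   // or eax,0FFFFFFF0h: set every bit above the low nibble
        r += 1u;            // inc eax (wraps to 0 when the low nibble was 0)
    }
    // Converting back assumes two's complement (guaranteed from C++20 on).
    return static_cast<std::int32_t>(r);
}

// Compares the transcription with the built-in % over [lo, hi] (requires hi < INT32_MAX).
void check_against_builtin(std::int32_t lo, std::int32_t hi) {
    for (std::int32_t x = lo; x <= hi; ++x) {
        assert(signed_mod16_fixup(x) == x % 16);
        // The benchmark only tests for zero, which depends on the low nibble alone.
        assert((signed_mod16_fixup(x) == 0) == ((x & 15) == 0));
    }
}
```

Because `aligned` only compares the remainder with zero, the fixup cannot change the outcome. `(i & 15) == 0` is the exact equivalent, and declaring `i` unsigned should let the compiler emit the mask for `% 16` directly.

Also note that `((i - 1) & 15) == 0` in AlignedAnd is true when `i` is one more than a multiple of 16, not when `i` is a multiple of 16. The two benchmarks therefore evaluate different predicates. A like-for-like mask, `(i & 15) == 0`, would presumably not need the `lea eax,[rbx-1]` either.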