On Sun, Apr 20, 2014 at 7:17 AM, Thijs van den Berg
...
// loop version
threefry4x64_08_64:  10.2318 nsec/loop =  19.13350 CPU cycles
threefry4x64_13_64:  14.3048 nsec/loop =  26.75000 CPU cycles
threefry4x64_20_64:  22.6186 nsec/loop =  42.29680 CPU cycles
threefry4x64_99_64: 100.0110 nsec/loop = 187.02200 CPU cycles

// 40x manual unrolled version
threefry4x64_08_64:   3.7386 nsec/loop =   6.99118 CPU cycles
threefry4x64_13_64:   5.1223 nsec/loop =   9.57870 CPU cycles
threefry4x64_20_64:   7.3078 nsec/loop =  13.66560 CPU cycles
threefry4x64_99_64:  29.3599 nsec/loop =  54.90300 CPU cycles
You may want to ask others to run the tests on a variety of CPUs and compilers. Optimization decisions based on measurements from a single platform can be quite misleading.
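To make such cross-platform runs easy to compare, something along these lines could be handed to other testers. It is only a minimal sketch, not the benchmark that produced the numbers above: mix_round, mix_loop, mix_unrolled4 and ns_per_call are hypothetical names, and mix_round is a stand-in rotate/add step, not the real threefry round function. It times a looped and a 4x manually unrolled variant with std::chrono::steady_clock and reports average nanoseconds per call.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for one round of a counter-based generator.
// This is NOT the actual threefry round function.
inline std::uint64_t mix_round(std::uint64_t x, unsigned r) {
    assert(r > 0 && r < 64);                 // keep both shifts well defined
    x += 0x9E3779B97F4A7C15ULL;
    return (x << r) | (x >> (64 - r));       // rotate left by r bits
}

// Loop version: one round per loop iteration.
inline std::uint64_t mix_loop(std::uint64_t x, unsigned rounds) {
    for (unsigned i = 0; i < rounds; ++i)
        x ^= mix_round(x, 1 + i % 63);
    return x;
}

// Manually unrolled by 4, with a remainder loop for leftover rounds.
// Performs exactly the same sequence of operations as mix_loop.
inline std::uint64_t mix_unrolled4(std::uint64_t x, unsigned rounds) {
    unsigned i = 0;
    for (; i + 4 <= rounds; i += 4) {
        x ^= mix_round(x, 1 + (i    ) % 63);
        x ^= mix_round(x, 1 + (i + 1) % 63);
        x ^= mix_round(x, 1 + (i + 2) % 63);
        x ^= mix_round(x, 1 + (i + 3) % 63);
    }
    for (; i < rounds; ++i)
        x ^= mix_round(x, 1 + i % 63);
    return x;
}

// Average wall-clock nanoseconds per call of f(input, rounds).
// Each input depends on the previous result, and the final value is
// written to `sink`, so the compiler cannot discard the work.
template <class F>
double ns_per_call(F f, unsigned rounds, std::uint64_t calls,
                   std::uint64_t& sink) {
    const auto t0 = std::chrono::steady_clock::now();
    std::uint64_t acc = 0;
    for (std::uint64_t n = 0; n < calls; ++n)
        acc += f(acc + n, rounds);
    const auto t1 = std::chrono::steady_clock::now();
    sink = acc;
    return std::chrono::duration<double, std::nano>(t1 - t0).count()
           / static_cast<double>(calls);
}
```

Calling ns_per_call(mix_loop, 20, calls, sink) and ns_per_call(mix_unrolled4, 20, calls, sink) from main() and printing both figures, together with the compiler, flags and CPU model, gives numbers that can be lined up across platforms. Note that an optimizing compiler may unroll mix_loop on its own, which is one more reason the ratio can differ from machine to machine.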
Remember Knuth's famous quote: "*premature optimization is the root of all evil*" :-)

--Beman