
Martin Schulz <Martin.Schulz <at> synopsys.com> writes:
That gives:

f3(x,y,z) took 1.515 seconds to run 1e+009 iterations with double = 2.64026e+009 flops
f3(x,y,z) took 4.656 seconds to run 1e+009 iterations with quantity<double> = 8.59107e+008 flops

2.6 GFlops? That is ok for a single thread. But the zero-overhead appears to be a factor of 3 now!

Really? I get

f(x,y,z) took 1.953 seconds to run 1e+009 iterations with double = 2.04813e+009 flops
f(x,y,z) took 1.906 seconds to run 1e+009 iterations with quantity<double> = 2.09864e+009 flops

I am compiling with /Ox. The innermost loop is identical for the double and quantity versions:

    sub     eax, 1
    fadd    ST(0), ST(1)
    fadd    ST(0), ST(1)
    fadd    ST(0), ST(1)
    fadd    ST(0), ST(1)
    fadd    ST(0), ST(1)
    fadd    ST(0), ST(1)
    fadd    ST(0), ST(1)
    fadd    ST(0), ST(1)
    jne     SHORT $LN17@f

Obviously the compiler is cheating.

In Christ,
Steven Watanabe
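
For readers who want to repeat the comparison, here is a minimal sketch of a timing loop that is harder for the optimizer to collapse. It is not the benchmark from this thread: `Wrapped`, `raw`, `kernel` and `time_loop` are hypothetical names, `Wrapped` is only a stand-in for a unit-carrying type and not Boost.Units' `quantity<>`, and the four-operation kernel is an assumption. Each iteration feeds its result into the next, so the arithmetic cannot simply be hoisted out of the loop.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for a unit-carrying wrapper; NOT Boost.Units' quantity<>.
struct Wrapped {
    double v;
};
inline Wrapped operator+(Wrapped a, Wrapped b) { return Wrapped{a.v + b.v}; }
inline Wrapped operator*(Wrapped a, Wrapped b) { return Wrapped{a.v * b.v}; }

// Extract the underlying double from either representation.
inline double raw(double d) { return d; }
inline double raw(Wrapped w) { return w.v; }

// Hypothetical kernel: two additions and two multiplications per call.
template <class T>
T kernel(T x, T y, T z) {
    return (x + y) * z + x * y;
}

// Runs the kernel `iterations` times, stores the final value in `result`,
// and returns the elapsed wall-clock time in seconds.
template <class T>
double time_loop(std::size_t iterations, T x, T y, T z, double& result) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    T acc = x;
    for (std::size_t i = 0; i < iterations; ++i) {
        // Loop-carried dependency: each step needs the previous result.
        acc = kernel(acc, y, z);
    }
    const auto stop = std::chrono::steady_clock::now();
    // Handing the value back to the caller keeps the loop from being dead code.
    result = raw(acc);
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_loop(n, 1.0, 0.25, 0.5, r)` and `time_loop(n, Wrapped{1.0}, Wrapped{0.25}, Wrapped{0.5}, r)` with the same `n` gives two timings to compare. With y = 0.25 and z = 0.5 the accumulator converges towards 0.5, so it stays finite however many iterations are run. Inputs should ideally come from run-time data, since an optimizer may still fold a loop whose inputs are all compile-time constants.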