On 16/11/2017 05:41, Peter Dimov wrote:
> It's all slightly misleading anyway, because for the small matrix case the copy/move constructors don't actually have side effects and therefore get optimized out; they are only relevant in the case of something like std::string where copy/move aren't defaulted.
That was my point; your test code doesn't test copy elision because your constructors have side effects, so can't be elided.

Of course, if you remove the counting side effects in that code then the compiler just inlines everything to a single mov constant 10 :)

FWIW, in VC14.0 if you force it to use external parameters it can't inline away then this code:

    __declspec(noinline) T calc(int a, int b, int c, int d)
    {
        return T(a) + T(b) + T(c) + T(d);
    }

Turns into this with the rvalue-ref-return operators:

x86:
    ; _a$ = edx
    00003  03 55 08     add  edx, DWORD PTR _b$[ebp]
    00006  03 55 0c     add  edx, DWORD PTR _c$[ebp]
    00009  8b 45 10     mov  eax, DWORD PTR _d$[ebp]
    0000c  03 c2        add  eax, edx
    0000e  89 01        mov  DWORD PTR [ecx], eax
    00010  8b c1        mov  eax, ecx

x64:
    ; _a$ = edx
    ; _b$ = r8d
    ; _c$ = r9d
    00000  41 03 d0     add  edx, r8d
    00003  48 8b c1     mov  rax, rcx
    00006  41 03 d1     add  edx, r9d
    00009  03 54 24 28  add  edx, DWORD PTR d$[rsp]
    0000d  89 11        mov  DWORD PTR [rcx], edx

Whereas with the value-return operators:

x86:
    ; _a$ = edx
    00003  8b 45 08     mov  eax, DWORD PTR _b$[ebp]
    00006  03 c2        add  eax, edx
    00008  03 45 0c     add  eax, DWORD PTR _c$[ebp]
    0000b  03 45 10     add  eax, DWORD PTR _d$[ebp]
    0000e  89 01        mov  DWORD PTR [ecx], eax
    00010  8b c1        mov  eax, ecx

x64:
    ; _a$ = edx
    ; _b$ = r8d
    ; _c$ = r9d
    00000  42 8d 04 02  lea  eax, DWORD PTR [rdx+r8]
    00004  41 03 c1     add  eax, r9d
    00007  03 44 24 28  add  eax, DWORD PTR d$[rsp]
    0000b  89 01        mov  DWORD PTR [rcx], eax
    0000d  48 8b c1     mov  rax, rcx

The rvalue versions are very slightly more efficient, it looks like, although they're pretty similar (and it's even sneaky enough to turn one of the adds into an lea in the last one). Though, of course, counting assembly ops means little with modern CPUs, so take that with a grain of salt.

And again granted something bigger than an int or with non-trivial copy constructors will get different results, but overall it looks like I was wrong with my initial supposition.
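
For anyone who wants to reproduce the comparison, here is a minimal sketch of the two operator styles. The definition of T isn't shown in this message, so everything below is an assumption: ValT and RefT are hypothetical int wrappers, calc_value and calc_rref are hypothetical names for the two builds of calc, and the MSVC-specific __declspec(noinline) is left off so it compiles elsewhere too.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for T built with value-returning operators.
struct ValT {
    int v;
    explicit ValT(int x) : v(x) {}
    ValT& operator+=(const ValT& o) { v += o.v; return *this; }
};

// lvalue + anything: copy the left operand, then accumulate into the copy.
inline ValT operator+(const ValT& a, const ValT& b) { ValT r(a); r += b; return r; }
// rvalue + anything: reuse the temporary, but return a new object by value.
inline ValT operator+(ValT&& a, const ValT& b) { a += b; return std::move(a); }

// Hypothetical stand-in for T built with rvalue-ref-returning operators.
struct RefT {
    int v;
    explicit RefT(int x) : v(x) {}
    RefT& operator+=(const RefT& o) { v += o.v; return *this; }
};

inline RefT operator+(const RefT& a, const RefT& b) { RefT r(a); r += b; return r; }
// rvalue + anything: reuse the temporary and hand back a reference to it.
// No new object is created here; the caller must consume the result before
// the temporary dies at the end of the full expression.
inline RefT&& operator+(RefT&& a, const RefT& b) { a += b; return std::move(a); }

// Same shape as calc() in the post, once per operator style.
inline ValT calc_value(int a, int b, int c, int d) {
    return ValT(a) + ValT(b) + ValT(c) + ValT(d);
}

inline RefT calc_rref(int a, int b, int c, int d) {
    // Safe: the returned RefT is move-constructed while RefT(a) is still alive.
    return RefT(a) + RefT(b) + RefT(c) + RefT(d);
}

// Small sanity check that both styles compute the same sum.
inline void check_both() {
    assert(calc_value(1, 2, 3, 4).v == 10);
    assert(calc_rref(1, 2, 3, 4).v == 10);
}
```

Both versions give the same answer; the difference is only in how many temporaries the compiler has to materialise or optimise away. Note that the rvalue-ref-return form dangles if the result is bound to a reference, e.g. `auto&& r = RefT(1) + RefT(2);`, since the temporary is gone by the next statement.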