
Hi Vinnie, I did not know that pasting an image would send my message straight to moderator approval because of its size. Apologies; here is my message with the image removed.

On Fri, Dec 6, 2024 at 7:48 PM Ivan Matek <libbooze@gmail.com> wrote:
On Fri, Dec 6, 2024 at 7:19 PM Vinnie Falco <vinnie.falco@gmail.com> wrote:
How?
Maybe we are not talking about the same situation, but this is what I meant:
godbolt link <https://godbolt.org/z/K8xEjEoMK>
If you look at the generated assembly

    int f_span_static<3ul>(std::__1::span<int, 3ul>):
        mov eax,DWORD PTR [rdi+0x4]
        add eax,DWORD PTR [rdi]
        add eax,DWORD PTR [rdi+0x8]
        ret
        nop DWORD PTR [rax+0x0]
    int f_span_static<4ul>(std::__1::span<int, 4ul>):
        movdqu xmm0,XMMWORD PTR [rdi]
        pshufd xmm1,xmm0,0xee
        paddd xmm1,xmm0
        pshufd xmm0,xmm1,0x55
        paddd xmm0,xmm1
        movd eax,xmm0
        ret

you will see that the compiler generates a "specialized" function for each different size of span with a non-dynamic extent. Here you can see that it implemented the summation of 3 and of 4 integers in different ways: three scalar loads and adds for 3 elements, and one SSE load followed by shuffles and adds for 4 elements.
This is great for performance and makes checking easier since, as Peter explained, the compiler knows more, but it creates larger binaries (generally speaking; I know compilers are smart and can inline, for 2 instantiations it does not really matter, etc.). A function taking a dynamic span or a runtime-specified n will probably be slower because it uses an ordinary loop, but there is only one copy of it in the resulting assembly.
There is a real-life example of this in fmt. Note: the link gives a certificate error and I did not manage to find another one: https://vitaut.net/posts/2020/reducing-library-size/