Re: [boost] [hash2][review] Early review (due to holidays)

6 Dec 2024

Hi Vinnie,

I did not know that pasting an image would get my message sent straight to
moderator approval because of its size. Apologies; here is my message with
the image removed.

On Fri, Dec 6, 2024 at 7:48 PM Ivan Matek <libbooze@gmail.com> wrote:
...
On Fri, Dec 6, 2024 at 7:19 PM Vinnie Falco <vinnie.falco@gmail.com>
wrote:
...
...
How?
Maybe we are not talking about the same situation, but this is what I meant:
godbolt <https://godbolt.org/z/K8xEjEoMK> link.
If you look at

int f_span_static<3ul>(std::__1::span<int, 3ul>):
mov eax,DWORD PTR [rdi+0x4]
add eax,DWORD PTR [rdi]
add eax,DWORD PTR [rdi+0x8]
ret
nop DWORD PTR [rax+0x0]
int f_span_static<4ul>(std::__1::span<int, 4ul>):
movdqu xmm0,XMMWORD PTR [rdi]
pshufd xmm1,xmm0,0xee
paddd xmm1,xmm0
pshufd xmm0,xmm1,0x55
paddd xmm0,xmm1
movd eax,xmm0
ret
...
you will see that the compiler generates "specialized" functions for each
different size of span with a non-dynamic extent. Here you can see how it
implemented summation for 3 and 4 integers in different ways.
This is great for performance and makes checking easier since, as Peter
explained, the compiler knows more, but it creates larger binaries (generally
speaking; I know compilers are smart, can inline, for 2 instantiations it
does not really matter, etc.).
A function taking a dynamic span or a runtime-specified n will probably be
slower because it uses a normal loop, but there is only one copy of it in
the resulting assembly.
There is a real-life example of this in fmt. Note that the link gives a
certificate error, and I did not manage to find another one:
https://vitaut.net/posts/2020/reducing-library-size/