
On Tue, Jan 21, 2025 at 5:41 PM Peter Dimov
Ivan Matek wrote:
On Tue, Jan 21, 2025 at 4:20 PM Peter Dimov <pdimov@gmail.com> wrote:
Only decimal128_fast doesn't fit in two registers.
> Also, not all functions take one argument; in the PDF I explicitly used copysign as an example.
x86-64 uses up to 6 registers for parameter passing (RDI, RSI, RDX, RCX, R8, R9), which means that up to three 128-bit trivially copyable types can be passed in registers when pass by value is used.
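To make the register arithmetic concrete, here is a minimal sketch. It assumes a non-Windows x86-64 target, and `dec128_like` and `copysign_like` are hypothetical stand-ins, not the library's actual types or functions. Each 16-byte trivially copyable argument can occupy two of the six registers listed above, so a two-argument function such as a copysign-like one needs four of them.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a 128-bit decimal type (NOT the library's layout).
struct dec128_like {
    std::uint64_t lo;
    std::uint64_t hi;
};

static_assert(sizeof(dec128_like) == 16, "two 64-bit words");
static_assert(std::is_trivially_copyable<dec128_like>::value,
              "trivially copyable, so eligible for passing in registers");

// Two by-value arguments of 16 bytes each: on non-Windows x86-64 these can
// use four of the six argument registers (assumption: System V style ABI).
inline dec128_like copysign_like(dec128_like mag, dec128_like sgn) {
    const std::uint64_t sign_bit = std::uint64_t{1} << 63;
    // Keep the magnitude bits of mag, take the top bit from sgn.
    mag.hi = (mag.hi & ~sign_bit) | (sgn.hi & sign_bit);
    return mag;
}
```

Whether a compiler really keeps both arguments in registers is best checked in the generated assembly, for example on godbolt.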
To be honest, I got a bit confused by all this. I should not have written a reply before trying this out on godbolt; my apologies for the noise.
But here are two things I believe are not correct in your response. Passing up to three 128-bit trivial types in registers is not a property of x86-64 but of the Linux ABI; the Windows ABI is different.
Yes, that's why I said "non-Windows x86-64". :-)
Sorry, I will try to have a larger context *window* when responding in the future. :)
On my machine decimal64_fast is not 128 bits, because

    typedef unsigned long int uint_fast16_t;
    static_assert(sizeof(decimal64_fast) == 24);

If you do not believe me, here is godbolt: https://godbolt.org/z/xPzYTP6cM
That's true, and it's actually not passed in registers for this reason.
https://godbolt.org/z/qs5z6Th9e
Interesting. Looks like this is caused by the use of uint_fast16_t, which is actually uint64_t. Maybe not the best choice.
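A rough sketch of how a wide uint_fast16_t can push a struct past 16 bytes. The two layouts below are hypothetical and chosen only for illustration; they are not claimed to match decimal64_fast. `fits_in_two_registers` is likewise a made-up helper that only checks size and trivial copyability, which is a necessary condition rather than the full ABI classification.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical layout using the "fast" exponent type.
struct fast_like {
    std::uint64_t significand;
    std::uint_fast16_t exponent;  // 8 bytes where this is unsigned long
    bool sign;
};

// Same fields, but with an exact-width exponent.
struct compact_like {
    std::uint64_t significand;
    std::uint16_t exponent;
    bool sign;
};

// Rough check only: a type larger than 16 bytes cannot travel in two
// 64-bit registers, whatever else the ABI says about it.
template <class T>
constexpr bool fits_in_two_registers() {
    return std::is_trivially_copyable<T>::value &&
           sizeof(T) <= 2 * sizeof(std::uint64_t);
}
```

Where uint_fast16_t is unsigned long with 8 bytes, `sizeof(fast_like)` would be expected to come out as 24 (8 + 8 + 1, padded to a multiple of 8), while `compact_like` stays within 16.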
I actually thought about writing in the review about whether using the std:: fast types makes sense, but I was already tired :).
From what I know, they offer no benefit on most modern architectures. More importantly, even if some type really is the "fastest integer of at least 16 bits", there is no guarantee that the type std:: picked is that type. E.g. I can imagine that for the case we are discussing here the fastest type is uint16_t or uint32_t, but library implementers picked uint64_t 20 years ago and now it is baked in forever (hello ABI, my old friend). There is also the fact that the fastest type may not be the fastest across all the compute benchmarks you can imagine. I presume this does not matter on modern CPUs, but maybe on some old CPU one type is fastest for add and another for mul...
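For anyone who wants to see what their own standard library picked, a small sketch follows. `width_in_bits`, `fast16_bits` and `exact16_bits` are names invented here; the standard only promises that uint_fast16_t has at least 16 bits, so the actual width is a platform choice.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Storage width of T in bits on the current platform.
template <class T>
constexpr std::size_t width_in_bits() {
    return sizeof(T) * CHAR_BIT;
}

constexpr std::size_t fast16_bits  = width_in_bits<std::uint_fast16_t>();
constexpr std::size_t exact16_bits = width_in_bits<std::uint16_t>();

// The only portable guarantee: at least 16 bits. Anything wider is whatever
// the library implementers decided, and it is then fixed by the ABI.
static_assert(fast16_bits >= 16, "uint_fast16_t must hold 16 bits");
```

Printing `fast16_bits` on a few targets shows how much the choice varies; it says nothing about which type is actually fastest for a given operation.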
https://godbolt.org/z/8fczrzbEY is better.
As I wrote above, I have no faith in the fastness of the _fast_t types in std::, so I agree.