
21 Jan
2025
6:37 p.m.
Matt Borland wrote:
Here's a data point on macOS ARM64: ... So we'd need to investigate all the different ABIs and switch the type on platform.
ARM64 also passes values of up to 128 bits in registers (x0-x7, so up to four 128-bit arguments): https://godbolt.org/z/f6nGvx3zM vs https://godbolt.org/z/ds74Excq5

This may make the synthetic benchmarks slightly slower, but I don't think it's going to be faster overall in real code, because the +50% size overhead will inevitably translate into more cache pressure.
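To make the size argument concrete, here is a minimal sketch. The names `packed128` and `padded192` are hypothetical stand-ins, not the library's actual types, and I'm assuming the "+50%" layout means three 64-bit words instead of two. Under AAPCS64, a composite of 16 bytes or less can travel in general-purpose registers, while a larger one is copied to memory and passed by address.

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration only, not the real library types.

// 16 bytes: small enough to be passed in two general-purpose registers
// under AAPCS64 (e.g. x0 and x1 for the first argument).
struct packed128
{
    std::uint64_t hi;
    std::uint64_t lo;
};

// 24 bytes (assumed "+50%" layout): larger than 16 bytes, so under AAPCS64
// the caller copies it to memory and passes the address of the copy instead.
struct padded192
{
    std::uint64_t hi;
    std::uint64_t lo;
    std::uint64_t extra;
};

static_assert(sizeof(packed128) == 16, "expected two 64-bit words");
static_assert(sizeof(padded192) == 24, "expected three 64-bit words");

// By-value parameters, so the calling convention decides how the
// argument is delivered. Compare the generated code for the two overloads.
std::uint64_t sum(packed128 v) { return v.hi + v.lo; }
std::uint64_t sum(padded192 v) { return v.hi + v.lo + v.extra; }

// Number of elements of type T that fit in one 64-byte cache line
// (64 bytes is an assumption; some ARM64 parts use larger lines).
template <typename T>
constexpr std::size_t per_cache_line() { return 64 / sizeof(T); }
```

With a 64-byte line, four `packed128` values fit per line versus two whole `padded192` values, which is where the cache pressure in arrays would come from.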