
On 12/10/24 21:15, Peter Dimov via Boost wrote:
Ruben Perez wrote:
Is it expected to reach similar levels of performance in the future using Hash2? Or is this not in scope for the library?
I don't see why it wouldn't be in scope; performance optimizations are never ruled out.
Taking advantage of SHA-NI when it's statically known to be available (under GCC/Clang -march=native, for instance) should be relatively easy to add.
Requiring SHA instructions unconditionally sets a rather high bar on the CPU requirements.
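
To make the "statically known" case concrete, here is a minimal sketch of compile-time backend selection. It assumes GCC or Clang, which predefine the `__SHA__` macro when the target is known to have the SHA extensions (for example with -msha, or with -march=native on a CPU that has them). The names `sha256_backend`, `static_backend` and `backend_name` are hypothetical and are not part of Hash2.

```cpp
// Hypothetical names for illustration only; none of this is Hash2 API.
enum class sha256_backend { generic, sha_ni };

// GCC and Clang predefine __SHA__ when the SHA extensions are enabled for
// the target (e.g. -msha, or -march=native on a CPU that supports them).
#if defined(__SHA__)
constexpr sha256_backend static_backend = sha256_backend::sha_ni;
#else
constexpr sha256_backend static_backend = sha256_backend::generic;
#endif

// Human-readable name of a backend, usable in constant expressions.
constexpr const char* backend_name(sha256_backend b) noexcept
{
    return b == sha256_backend::sha_ni ? "sha-ni" : "generic";
}
```

A binary built this way with the SHA path selected would fault on a CPU without the extensions, which is why this only suits builds where the target CPU is known in advance.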
Dynamic dispatch is a bit more convoluted because of constexpr but should be possible.
SIMD intrinsics (at least the x86 ones) are not constexpr-friendly, as they require reinterpret_cast to access memory. Only AVX-512 added load/store intrinsics that take void pointers. GCC's vector extensions may be better in this respect, but I'm not sure whether the SHA intrinsics will work with them. I suspect the constexpr version will always have to use the generic code, while the runtime version may use CPU detection and vectorized code.
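
A rough sketch of that split, under several assumptions: GCC 9+ or Clang 9+ for `__builtin_is_constant_evaluated()` (the C++20 spelling is `std::is_constant_evaluated()`), and `<cpuid.h>` for `__get_cpuid_count`. The SHA extensions are reported in CPUID leaf 7, subleaf 0, EBX bit 29. All function names here are hypothetical. The "accelerated" function is only a stand-in that calls the generic code; a real one would use the SHA intrinsics. The kernel is a toy (the SHA-256 Σ0 function), not a full hash.

```cpp
#include <cstdint>
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#  include <cpuid.h>
#  define SKETCH_HAS_CPUID 1
#endif

// Runtime detection: CPUID.(EAX=7,ECX=0):EBX bit 29 indicates SHA extensions.
// Returns false on non-x86 targets or compilers without <cpuid.h>.
inline bool cpu_has_sha_ni() noexcept
{
#if defined(SKETCH_HAS_CPUID)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return ((ebx >> 29) & 1u) != 0;
#endif
    return false;
}

// Generic, constexpr-friendly code: plain integer operations only.
constexpr std::uint32_t rotr(std::uint32_t x, int n) noexcept
{
    return (x >> n) | (x << (32 - n));
}

constexpr std::uint32_t sigma0_generic(std::uint32_t x) noexcept
{
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22);
}

// Stand-in for a vectorized kernel (assumption: real code would use the
// SHA intrinsics here). Deliberately not constexpr.
inline std::uint32_t sigma0_accelerated(std::uint32_t x) noexcept
{
    return sigma0_generic(x);
}

// Runtime path: detect the CPU once, then pick a kernel.
inline std::uint32_t sigma0_runtime(std::uint32_t x) noexcept
{
    static const bool has_sha = cpu_has_sha_ni();
    return has_sha ? sigma0_accelerated(x) : sigma0_generic(x);
}

// Entry point: constant evaluation always takes the generic code;
// ordinary runtime calls go through CPU detection.
constexpr std::uint32_t sigma0(std::uint32_t x) noexcept
{
    return __builtin_is_constant_evaluated() ? sigma0_generic(x)
                                             : sigma0_runtime(x);
}

// The constexpr path works without touching any intrinsic.
static_assert(sigma0(1u) == 0x40080400u, "generic path is usable at compile time");
```

The detection result is cached in a separate non-constexpr function because a static local variable is not allowed inside a constexpr function before C++23.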