
On Tuesday, January 21st, 2025 at 12:56 PM, Ivan Matek via Boost <boost@lists.boost.org> wrote:
On Tue, Jan 21, 2025 at 6:06 PM Peter Dimov pdimov@gmail.com wrote:
Basically, _fast types are almost never fast. Let's hope this curse doesn't afflict Decimal _fast types as well. :-)
Thank you for confirming this. Now I regret not also writing that in my
review. :)
To recap the discussion wrt pass by reference, points we agree on (wrt 64-bit x86):
- if we pass by value on Windows we are out of luck for most decimal types, since there is an ABI limitation for types greater than 8 bytes
- use of uint_fast16_t in the implementation pushed decimal64_fast over the Linux ABI limit (16 bytes); the std:: _fast types are not fast, and it would be nice to change this even if it will not help on Windows
points we disagree on:
- pass by reference for large types (and, if necessary, mutate in place) is still my preferred API. If we do want value-returning functions, then for large types I would prefer to pass args by const reference.
Not trying to change your mind, just recapping the discussion above; doing all the sizeof and ABI math is tricky.
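
To make the sizeof math a bit more concrete, here is a minimal C++ sketch. The two structs are hypothetical layouts invented for illustration, not Boost.Decimal's actual members, and the 16-byte threshold is simply the x86-64 Linux figure quoted in the recap above. The width of std::uint_fast16_t is implementation-defined, so the printed sizes will differ between platforms.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical layout using the _fast typedefs (NOT the real Boost.Decimal members).
struct fast_layout {
    std::uint_fast64_t significand;
    std::uint_fast16_t exponent;  // width is implementation-defined
    bool sign;
};

// The same hypothetical layout using exact-width typedefs.
struct fixed_layout {
    std::uint64_t significand;
    std::uint16_t exponent;
    bool sign;
};

// Assumed threshold taken from the discussion: 16 bytes on x86-64 Linux.
constexpr std::size_t register_limit = 16;

// True if an object of the given size stays within the assumed limit.
constexpr bool fits_in_registers(std::size_t size) {
    return size <= register_limit;
}

// Print the sizes so they can be compared across platforms.
void print_layout_sizes() {
    std::printf("uint_fast16_t: %zu bytes\n", sizeof(std::uint_fast16_t));
    std::printf("fast_layout:   %zu bytes (within limit: %d)\n",
                sizeof(fast_layout),
                static_cast<int>(fits_in_registers(sizeof(fast_layout))));
    std::printf("fixed_layout:  %zu bytes (within limit: %d)\n",
                sizeof(fixed_layout),
                static_cast<int>(fits_in_registers(sizeof(fixed_layout))));
}
```

Calling print_layout_sizes() from main on each target of interest shows directly whether the _fast typedefs change the struct size there.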
Here's a data point on macOS ARM64:

Benchmarks on Current Develop:

===== Comparisons =====
comparisons<dec32_fast >: 555534 us (s=29999985)
comparisons<dec64_fast >: 680204 us (s=29999985)
comparisons<dec128_fast>: 598125 us (s=29999985)

===== Addition =====
Addition<dec32_fast >: 1112121 us
Addition<dec64_fast >: 1282197 us
Addition<dec128_fast>: 6967490 us

===== Subtraction =====
Subtraction<dec32_fast >: 930937 us
Subtraction<dec64_fast >: 1127780 us
Subtraction<dec128_fast>: 3645142 us

===== Multiplication =====
Multiplication<dec32_fast >: 776308 us
Multiplication<dec64_fast >: 1145653 us
Multiplication<dec128_fast>: 17949983 us

===== Division =====
Division<dec32_fast >: 904825 us
Division<dec64_fast >: 1659034 us
Division<dec128_fast>: 1648759 us

Every uint_fastXX_t or int_fastXX_t replaced by uintXX_t or intXX_t:

===== Comparisons =====
comparisons<dec32_fast >: 586511 us
comparisons<dec64_fast >: 709211 us
comparisons<dec128_fast>: 657634 us

===== Addition =====
Addition<dec32_fast >: 1121786 us
Addition<dec64_fast >: 1289409 us
Addition<dec128_fast>: 7020519 us

===== Subtraction =====
Subtraction<dec32_fast >: 963632 us
Subtraction<dec64_fast >: 1160965 us
Subtraction<dec128_fast>: 3695792 us

===== Multiplication =====
Multiplication<dec32_fast >: 800159 us
Multiplication<dec64_fast >: 1179959 us
Multiplication<dec128_fast>: 18459046 us

===== Division =====
Division<dec32_fast >: 928643 us
Division<dec64_fast >: 1683552 us
Division<dec128_fast>: 1684860 us

So we'd need to investigate all the different ABIs and switch the type on platform. I'll note that my primary development is on ARM Mac. Another win for Intel's complete vertical integration.

Matt
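
For what it's worth, switching the type per platform could look roughly like the C++ sketch below. Everything in it is an assumption made for illustration: the alias name exponent_type, the struct decimal64_sketch, and in particular which branch each platform takes. The x86-64 Linux branch follows the 16-byte concern raised earlier in the thread; the right choice for any other target would have to come from measurements on that target.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical alias: pick the exponent type per platform.
// The branch conditions are assumptions, not measured results.
#if defined(__linux__) && defined(__x86_64__)
// Keep the exact-width type so the struct does not grow past 16 bytes.
using exponent_type = std::uint16_t;
#else
// Elsewhere, defer to whatever the platform considers "fast".
using exponent_type = std::uint_fast16_t;
#endif

static_assert(std::is_unsigned<exponent_type>::value,
              "exponent_type must be unsigned");
static_assert(sizeof(exponent_type) * CHAR_BIT >= 16,
              "exponent_type must hold at least 16 bits");

// Hypothetical layout using the alias (NOT the real Boost.Decimal type).
struct decimal64_sketch {
    std::uint64_t significand;
    exponent_type exponent;
    bool sign;
};

// Size of the sketch type on the current platform.
constexpr std::size_t sketch_size() {
    return sizeof(decimal64_sketch);
}
```

A static_assert on sketch_size() per target would then catch an accidental size increase at compile time rather than in a benchmark.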