
General Comments
Counting Digits - It's easy to say a binary search tree is a naive implementation. A CLZ-based implementation, taken from blog posts by Daniel Lemire and Junekey Jeon, can be found on branch `better_count_dig`. I found it actually benchmarked worse on a number of platforms. It's worth emphasizing that for simple problems, simple code will often run faster than a theoretically better but more complex algorithm:
the classic example is that a linear search will generally beat a binary search when the number of items is small. There should probably be a comment in the code indicating that the alternative has been tried and was found to be slower, though.
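For readers who haven't seen the two approaches side by side, here is a minimal sketch for 32-bit values. The function names are hypothetical, and neither function is the library's actual code nor the code on `better_count_dig`. The second is only the general bit-width-plus-table idea, with a portable fallback standing in for the CLZ instruction.

```cpp
#include <cstdint>

// Hypothetical illustration only; not the library's implementation.

// Simple approach: a plain chain of comparisons against powers of ten.
inline int count_digits_simple(std::uint32_t x) noexcept
{
    if (x < 10u) return 1;
    if (x < 100u) return 2;
    if (x < 1000u) return 3;
    if (x < 10000u) return 4;
    if (x < 100000u) return 5;
    if (x < 1000000u) return 6;
    if (x < 10000000u) return 7;
    if (x < 100000000u) return 8;
    if (x < 1000000000u) return 9;
    return 10;
}

// Number of bits needed to represent x (0 for x == 0).
// Uses the GCC/Clang builtin when available, otherwise a portable loop.
inline int bit_width_u32(std::uint32_t x) noexcept
{
#if defined(__GNUC__) || defined(__clang__)
    return x == 0u ? 0 : 32 - __builtin_clz(x);
#else
    int n = 0;
    while (x != 0u) { ++n; x >>= 1; }
    return n;
#endif
}

// CLZ-style approach: estimate the digit count from the bit width,
// then correct the estimate with a single table lookup.
inline int count_digits_clz(std::uint32_t x) noexcept
{
    static const std::uint32_t pow10[] = {
        1u, 10u, 100u, 1000u, 10000u, 100000u,
        1000000u, 10000000u, 100000000u, 1000000000u
    };
    if (x == 0u) return 1;
    // 1233 / 4096 approximates log10(2), so t is a lower estimate of digits - 1.
    const int t = (bit_width_u32(x) * 1233) >> 12;
    return t + (x >= pow10[t] ? 1 : 0);
}
```

The second version has fewer branches on paper, but it adds a multiply, a shift, and a table load. Which one wins depends on the platform and the input distribution, so it needs measuring rather than assuming.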
"Only interesting design question is if functions producing new value should be void returning and modify inplace argument passed by reference(as for example std::ranges::sort does) or they should be returning a value." - I don't see any advantage to this as it would be a serious departure from expectations and norms.
Taking decimal types by reference instead of by value - fundamentally the decimalXX types are std::uint32_t, std::uint64_t, and struct { uint64_t hi; uint64_t lo; }. I don't think you'll see any performance improvement with those. Maybe for the fast types? Those are still reasonably small structs. We can try a few.
It may be worth the experiment, but I'd be surprised if adding a level of indirection beats passing arguments by value in a couple of registers. There may possibly be some benefit for functions accepting many arguments, but I'm assuming there aren't too many of those? John.
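To make the two calling styles concrete, here is a small sketch. The struct and function names are hypothetical stand-ins for the storage described above, not the library's real types. Whether a small struct actually travels in registers depends on the ABI: the System V x86-64 convention passes a 16-byte struct of two integers in two registers, while the Microsoft x64 convention passes structs larger than 8 bytes through a pointer to a caller-made copy.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins; not the library's actual types.
struct dec64_like  { std::uint64_t bits; };
struct dec128_like { std::uint64_t hi; std::uint64_t lo; };

// Small, trivially copyable types are the ones that can be passed in registers.
static_assert(std::is_trivially_copyable<dec64_like>::value, "expected trivially copyable");
static_assert(std::is_trivially_copyable<dec128_like>::value, "expected trivially copyable");

// By value: the callee works on its own copies of the operands.
inline bool equal_by_value(dec128_like a, dec128_like b) noexcept
{
    return a.hi == b.hi && a.lo == b.lo;
}

// By const reference: the callee receives addresses and, unless the call
// is inlined, has to load the operands through them.
inline bool equal_by_ref(const dec128_like& a, const dec128_like& b) noexcept
{
    return a.hi == b.hi && a.lo == b.lo;
}
```

Once the calls are inlined the two forms usually compile to the same code, so any experiment should benchmark functions that are not inlined.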