
I briefly glanced at your pow-N routines. Perhaps I'm wrong, but I believe you are not doing binary splitting of the exponent here.
My mistake. I apologize. You are doing binary splitting of the exponent. But it looks like you still have a redundant multiplication step that could be removed with a recursive call. In particular, I believe your routine would need 14 multiplications to compute x^255, whereas only 7 would be needed with a recursive call. Basically, one needs to ask if the overhead of recursive call exceeds that of a multiply. I'll leave this one for the potential reviewers. If you look into our preliminary work on Boost.Multiprecision, you will find the recursive pow-n algorithm in the implementation of pow_imp() in pow.hpp. It's a bit tricky with the eval_... functions, but the recursive binary splitting is clearly visible in the code. It can be viewed here. https://svn.boost.org/svn/boost/sandbox/big_number/boost/multiprecision/deta... Best regards, Chris.