
In an effort to eliminate heap allocations, I first tried replacing my using mpfr_float = boost::multiprecision::number<boost::multiprecision::mpfr_float_backend<0,boost::multiprecision::allocate_dynamic>, boost::multiprecision::et_off>; with allocate_stack, just for these named temporary variables.
Shouldn't you be using et_on based on your comments above? In any case you've found a bug, using mpfr_float_backend<0,boost::multiprecision::allocate_stack> creates a one-bit float, which I suspect is not what you wanted ;) I'll fix that so that it causes a compiler error.
However, the documentation <http://www.boost.org/doc/libs/1_60_0/libs/multiprecision/doc/html/boost_multiprecision/tut/floats/mpfr_float.html> states that allocate_stack only works in fixed precision, so the 0 digits indicating variable precision should cause problems? I expected 0 digits to fail to compile, but compile it does. The conversion from the type with allocate_stack to allocate_dynamic is not provided by the = operator, so I can write re_cache.convert_to<mpfr_float_dynamic>(). But using variable precision with stack allocation appears to cause all sorts of problems. Everything depending on complex multiplication in my program breaks, indicated by massive failures in my test suite. Thus, I currently regard allocate_stack as a non-solution.
My question fundamentally regards how to deal with re_cache and the*= operator for complex multiplication. Anyone have ideas on how to get rid of constant heap de/allocation of re_cache without inducing a ton more arithmetic? Using a static for it is not a solution, both because I anticipate multithreading in this application in the future, and the fact that precision will vary though the run, so I'd have to check the precision of re_cache on every evaluation. The C version of the program I am re-implementing used OpenMP for threads, and used a thread-id-indexed global for re_cache. That solution then forces OpenMP onto anyone wanting to use the complex class in multiple threads. Hence, I view this as a non-solution, too.
Again, my initial thought was 'heap allocation is the problem here, so I'll use the stack allocated backend'. But variable precision doesn't appear to work with allocate_stack. And statics are no good.
Any thoughts?
Yes, but none you may like. * You could use Boost.Thread for thread-local statics. * In C++11 you can use thread_local storage, see http://en.cppreference.com/w/cpp/language/storage_duration * For pre-C++11 you could use __thread or __declspec(thread) in a non-portable compiler-specific way. * If this was C99 you could use a variable-length-array and initialize a temporary mpfr_t yourself. * Ditto, but using alloca (this does work in some C++ implementations but not all). * You could use a temporary buffer big enough for the largest precision you ever use, and use that to initialize an mpfr_t yourself. Of course this may run you out of stack space ;) And finally... you may not see as much speedup as you expect unless the precision is low - for any significant number of digits the cost of the arithmetic typically far outweighs everything else. HTH, John.