Re: [Boost-users] mpfr_float stack allocation

10 May 2016

...
In an effort to eliminate heap allocations, I first tried replacing my

    using mpfr_float = boost::multiprecision::number<boost::multiprecision::mpfr_float_backend<0, boost::multiprecision::allocate_dynamic>, boost::multiprecision::et_off>;

with allocate_stack, just for these named temporary variables.
Shouldn't you be using et_on based on your comments above?

In any case you've found a bug, using 
mpfr_float_backend<0,boost::multiprecision::allocate_stack> creates a 
one-bit float, which I suspect is not what you wanted ;)  I'll fix that 
so that it causes a compiler error.
...
However, the documentation 
<http://www.boost.org/doc/libs/1_60_0/libs/multiprecision/doc/html/boost_multiprecision/tut/floats/mpfr_float.html> 
states that allocate_stack only works in fixed precision, so the 0 
digits indicating variable precision should cause problems?  I 
expected 0 digits to fail to compile, but compile it does.  The 
conversion from the type with allocate_stack to allocate_dynamic is 
not provided by the = operator, so I can write 
re_cache.convert_to<mpfr_float_dynamic>(). But using variable 
precision with stack allocation appears to cause all sorts of 
problems.  Everything depending on complex multiplication in my 
program breaks, indicated by massive failures in my test suite.  Thus, 
I currently regard allocate_stack as a non-solution.
My question fundamentally regards how to deal with re_cache and the *= 
operator for complex multiplication.  Anyone have ideas on how to get 
rid of constant heap de/allocation of re_cache without inducing a ton 
more arithmetic?  Using a static for it is not a solution, both 
because I anticipate multithreading in this application in the future, 
and the fact that precision will vary through the run, so I'd have to 
check the precision of re_cache on every evaluation.  The C version of 
the program I am re-implementing used OpenMP for threads, and used a 
thread-id-indexed global for re_cache.  That solution then forces 
OpenMP onto anyone wanting to use the complex class in multiple 
threads.  Hence, I view this as a non-solution, too.
Again, my initial thought was 'heap allocation is the problem here, so 
I'll use the stack allocated backend'.  But variable precision doesn't 
appear to work with allocate_stack.  And statics are no good.
Any thoughts?
Yes, but none you may like.

* You could use Boost.Thread for thread-local statics.
* In C++11 you can use thread_local storage, see 
http://en.cppreference.com/w/cpp/language/storage_duration
* For pre-C++11 you could use __thread or __declspec(thread) in a 
non-portable compiler-specific way.
* If this was C99 you could use a variable-length-array and initialize a 
temporary mpfr_t yourself.
* Ditto, but using alloca (this does work in some C++ implementations 
but not all).
* You could use a temporary buffer big enough for the largest precision 
you ever use, and use that to initialize an mpfr_t yourself.  Of course 
this may run you out of stack space ;)
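
As a rough illustration of the C++11 thread_local option above, here is a minimal sketch. Scratch is a hypothetical stand-in for a variable-precision number such as mpfr_float (it is not a Boost type), and ensure_precision and the re_cache accessor are names made up for this example. The point is only that each thread gets its own scratch object, and that its storage is touched only when the requested precision changes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a variable-precision number such as mpfr_float.
struct Scratch {
    std::vector<unsigned char> storage;  // stands in for the heap-allocated mantissa
    unsigned precision = 0;              // current precision, in digits
    std::size_t allocations = 0;         // how many times storage was (re)sized

    // Resize the storage only if the requested precision differs.
    void ensure_precision(unsigned digits) {
        assert(digits > 0);
        if (digits != precision) {
            storage.assign(digits, 0);
            precision = digits;
            ++allocations;
        }
    }
};

// Returns the calling thread's scratch object. The object is created once
// per thread, so no OpenMP-style thread-id indexing is needed, and the
// precision check costs one integer comparison per call.
inline Scratch& re_cache(unsigned digits) {
    thread_local Scratch cache;
    cache.ensure_precision(digits);
    return cache;
}
```

With a real mpfr_float the comparison would presumably use the number's own precision() member rather than a hand-rolled field. This still pays for the precision check on every evaluation, which the original question hoped to avoid, but it replaces an allocate/free pair per multiplication with a reallocation only when the precision actually changes.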

And finally...  you may not see as much speedup as you expect unless the 
precision is low - for any significant number of digits the cost of the 
arithmetic typically far outweighs everything else.

HTH, John.

John Maddock