mpfr_float stack allocation

Hi Boost Users,
I use Boost.Multiprecision's wrappers for MPFR, particularly the
boost::multiprecision::mpfr_float. I've had good success with this library
and its types so far, enjoying performance gains because of expression
templates, and easy access to mpfr and gmp for floats, ints, and
rationals. However, in my complex number type, I am struggling with named
real temporaries in arithmetic operators. I am looking for help with
multiplication, and I can generalize to all other operations needing a real
temporary later. Here's my current code for multiplication, implemented in
terms of *=.
/**
 Complex multiplication. Uses a single temporary variable:
 1 temporary, 4 multiplications.
*/
complex& operator*=(const complex & rhs)
{
    mpfr_float re_cache(real_*rhs.real_ - imag_*rhs.imag_); // cache the real part of the result
    imag_ = real_*rhs.imag_ + imag_*rhs.real_;
    real_ = re_cache;
    return *this;
}
I am using the naive evaluation formula, for better or worse. But the
particular method of evaluation doesn't matter to me right now; it's the
temporary variable re_cache, used to cache the real part, that is killing me.
Profiling using Allinea, gprof, and Callgrind reveals that a significant
percentage of time in the program is consumed creating and destroying
re_cache, as well as other temporaries scattered throughout my code.
In an effort to eliminate heap allocations, I first tried replacing my using
mpfr_float
= boost::multiprecision::number

Shouldn't you be using et_on based on your comments above? In any case you've found a bug: using mpfr_float_backend<0,boost::multiprecision::allocate_stack> creates a one-bit float, which I suspect is not what you wanted ;) I'll fix that so that it causes a compiler error.
Yes, but none you may like.

* You could use Boost.Thread for thread-local statics.
* In C++11 you can use thread_local storage, see http://en.cppreference.com/w/cpp/language/storage_duration
* For pre-C++11 you could use __thread or __declspec(thread) in a non-portable compiler-specific way.
* If this was C99 you could use a variable-length-array and initialize a temporary mpfr_t yourself.
* Ditto, but using alloca (this does work in some C++ implementations but not all).
* You could use a temporary buffer big enough for the largest precision you ever use, and use that to initialize an mpfr_t yourself. Of course this may run you out of stack space ;)

And finally... you may not see as much speedup as you expect unless the precision is low - for any significant number of digits the cost of the arithmetic typically far outweighs everything else.

HTH, John.
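For what it's worth, the C++11 thread_local suggestion could look roughly like the sketch below. It is only an illustration under stated assumptions: Real is a hypothetical stand-in for mpfr_float (a double plus a construction counter, so the saving can be observed without MPFR installed), and the accessor names real(), imag() and constructed() are made up for the example. With the real mpfr_float you would keep the expression-template arithmetic and only change how re_cache is declared; how a reused scratch variable interacts with changing precision is not covered here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an expensive-to-construct real type such as
// mpfr_float. It counts constructions so the effect is measurable.
struct Real {
    double v;
    // Construction counter (function-local static keeps this C++11-friendly).
    static std::size_t& constructed() { static std::size_t n = 0; return n; }
    explicit Real(double x = 0.0) : v(x) { ++constructed(); }
    Real(const Real& other) : v(other.v) { ++constructed(); }
    Real& operator=(const Real&) = default;  // assignment reuses existing storage
};

class complex {
    Real real_, imag_;
public:
    complex(double re, double im) : real_(re), imag_(im) {}
    double real() const { return real_.v; }
    double imag() const { return imag_.v; }

    // Naive formula: 4 multiplications, with the real part cached in a
    // scratch variable so imag_ can still read the old real_.
    complex& operator*=(const complex& rhs) {
        // One scratch object per thread: constructed on the first call made
        // by that thread, then reused by every later call on the same thread.
        thread_local Real re_cache;
        re_cache.v = real_.v * rhs.real_.v - imag_.v * rhs.imag_.v;
        imag_.v    = real_.v * rhs.imag_.v + imag_.v * rhs.real_.v;
        real_ = re_cache;  // copy-assignment, no new Real is constructed
        return *this;
    }
};

// Small self-check: (1 + 2i)(3 + 4i) = -5 + 10i.
inline void check_product() {
    complex a(1.0, 2.0), b(3.0, 4.0);
    a *= b;
    assert(a.real() == -5.0 && a.imag() == 10.0);
}
```

The trade-off is that the scratch object lives until the thread exits, and every operator that needs a temporary either shares one or declares its own.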
participants (2): Daniel Brake, John Maddock