[math][special_functions] Why is cbrt(x) slower than pow(x, 1/3)?

Hi!
For a (cell) simulation code I have to calculate lots of cube roots in every
timestep. So I tried to improve on my first guess, namely pow(x, 1.0/3.0), by
using boost::math::cbrt(). To my astonishment, this is much slower than the
original code. I wrote a small test program to check this claim; compiling
with g++ (Ubuntu 4.4.1-4ubuntu9) 4.4.1 (-O3) on an Intel Core i7 CPU I get the
following timings (averaging the time over 10 trials):
average time to compute 5000000 roots with pow(): 0.603 s.
average time to compute 5000000 roots with boost::cbrt(): 1.087 s.
average time to compute 5000000 roots with improved boost::cbrt(): 1.015 s.
average time to compute 5000000 roots with exp(1/3*log()): 0.541 s.
My "improved" version allows giving the boost::math::cbrt() method a first
guess, instead of it just taking the value of the number whose root is to be
calculated.
Actually, the function I called 5000000 times in this instance does a bit
more than just calculate the root; I want to calculate the radii of two
spheres which overlap such that their combined volume is equal to the volume
of a "mother sphere":
/**
* @brief calculates the radius of a daughter cell during symmetric division,
* assuming the volume of the mother cell is conserved.
*
* @param MotherRadius radius of mother cell before division
* @param Overlap of the two daughter cells, i.e. radius of daughter -
* distance from center of daughter to middle-plane of
* both cells
*
* @return for positive overlap returns positive radius, for negative overlap
* negative radius
**/
double daughterRadiusfOver( double MotherRadius, double Overlap) {
  // r = 1/2 (2 sqrt(R^6-h^3 R^3)-h^3+2 R^3)^(1/3) + h^2/(2 (2 sqrt(R^6-h^3 R^3)-h^3+2 R^3)^(1/3))
  double O3 = Overlap*Overlap*Overlap;
  double M3 = MotherRadius*MotherRadius*MotherRadius;
  double step1 = pow(-O3 + 2*sqrt(-O3*M3 + M3*M3) + 2*M3, 1.0/3.0); // <-- CUBIC ROOT HERE! Change to:
  // double step1 = boost::math::cbrt(-O3 + 2*sqrt(-O3*M3 + M3*M3) + 2*M3);
  if ( Overlap < 0 ){
    cerr << "Warning in daughterRadiusfOver: Negative overlap given!" << endl;
  }
  // per the formula in the comment above:
  return 0.5*step1 + Overlap*Overlap/(2.0*step1);
}

Are you getting the same results (the actual computed values) from the
different tests?
As far as I can see from the documentation, the Boost function math::cbrt
allows you to define the precision used.
In the general case, the precision used can really impact performance.
Maybe look into what precision the other methods are using, if it's not the
same.
Kind regards, and please post any findings.
On Wed, Jan 13, 2010 at 11:36 AM, Tim Odenthal wrote: [snip - original post quoted in full]
Why is boost::math::cbrt() no improvement for me - or am I using it the wrong way? For which cases might it be an improvement, or why is it in the library?
It's funny, but I found that std::exp(1.0/3.0*std::log(x)) is so far the fastest way...
Thanks in advance
Tim
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

I'll investigate that - I admit I haven't profiled that function up till now - and I suspect you're on the right lines in suspecting that the current implementation may be slow in finding the initial starting guess. Some of the std:: functions may also be faster if they're implemented as intrinsics. BTW, have you tried ::cbrt in math.h? Cheers, John.

FYI SVN Trunk now has an updated algorithm that is very competitive - within 1-2% of ::cbrt (gcc-4.4.1 on Ubuntu Linux):
Testing cbrt 1.025e-07
Testing cbrt-c99 1.001e-07
Testing cbrt-pow 1.611e-07
Unfortunately MSVC performance compares less well (even though it's much better than before, and it does at least outperform the cephes lib):
Testing cbrt 1.970e-007
Testing cbrt-cephes 2.676e-007
Testing cbrt-pow 1.072e-007
This reflects the poor performance of std::frexp on that compiler... annoying, that :-(
Regards, John.
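For reference, the classic recipe such an algorithm follows - range-reduce with frexp, take a cheap first guess, then polish with Newton's method - can be sketched like this (an illustration of the general technique, not Boost's actual code):

```cpp
#include <cmath>

// Illustrative cube root: range-reduce with frexp, make a crude guess,
// then polish with the Newton iteration  y <- (2y + t/y^2) / 3.
double my_cbrt(double x) {
    if (x == 0.0) return x;
    double sign = (x < 0.0) ? -1.0 : 1.0;
    double ax = std::fabs(x);

    int e;
    double m = std::frexp(ax, &e);               // ax = m * 2^e, m in [0.5, 1)
    int q = (e >= 0) ? e / 3 : -((-e + 2) / 3);  // q = floor(e / 3)
    int r = e - 3 * q;                           // remainder in {0, 1, 2}
    double t = std::ldexp(m, r);                 // t in [0.5, 4); cbrt(ax) = cbrt(t) * 2^q

    double y = 0.75 + 0.22 * t;                  // crude linear first guess for cbrt(t)
    for (int i = 0; i < 5; ++i)                  // quadratic convergence: 5 steps suffice
        y = (2.0 * y + t / (y * y)) / 3.0;

    return sign * std::ldexp(y, q);              // undo the range reduction
}
```

If std::frexp itself is slow (as on MSVC here), the range-reduction step dominates, which is why the quality of that call matters so much.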

On Sun, Jan 17, 2010 at 10:36 AM, John Maddock wrote: [snip]
As I recall, MSVC saves and restores the floating-point rounding mode on most floating-point calls like that, which causes the speed hit; you can work around it by writing your own version in assembly or by using SSE, as I recall.

Nod... we even have most of the tools required in Johan Rade's floating point utils, but there are more important fish to fry for now... John.

John Maddock wrote: [snip]
I may have such an implementation in NT2 that we could share.
participants (5)
- Joel Falcou
- John Maddock
- OvermindDL1
- Rune Lund Olesen
- Tim Odenthal