[multiprecision] Some Performance gains

2 Aug 2012

      Following the review of the multiprecision lib, I've been experimenting with 
some of the issues raised, currently focusing on performance improvements.

Anyhow, I now have some results for improved *front end* performance, I 
haven't tried to seriously optimize the cpp_int backend yet (which was one 
of the criticisms), but for more on that see below.

So... by way of measuring front end performance - the "abstraction penalty" 
if you like I've used a special backend type "arithmetic_backend<A>" where A 
can be any builtin arithmetic type.  For tests I've used Boost.Math's 
special function test data for floating point types, and Phil Endecott's 
Delaunay code for integer performance.

Win32 VC10 first:

Delaunay test:
int64_t                                                                  : 
9.45241
mp_number<arithmetic_backend<int64_t>, false> : 9.66815

Bessel Functions Test:
Time for Bessel Functions - double = 0.015859 seconds
Time for Bessel Functions - arithmetic_backend<double> - no expression 
templates = 0.0225565 seconds
Time for Bessel Functions - arithmetic_backend<double> = 0.0376449 seconds

Non Central T:
Time for Non-central T - double = 0.456739 seconds
Time for Non-central T - arithmetic_backend<double> - no expression 
templates = 0.459957 seconds
Time for Non-central T - arithmetic_backend<double> = 1.08256 seconds

Ubutunu Linux x64, GCC-4.6.1:

Delaunay Test:
int64_t 2.3969
mp_number<arithmetic_backend<int64_t>, false> 2.38334

Bessel Functions Test:
Time for Bessel Functions - double = 0.0224617 seconds
Time for Bessel Functions - arithmetic_backend<double> - no expression 
templates = 0.020384 seconds
Time for Bessel Functions - arithmetic_backend<double> = 0.0220476 seconds

Non central T:
Time for Non-central T - double = 0.622985 seconds
Time for Non-central T - arithmetic_backend<double> - no expression 
templates = 0.554527 seconds
Time for Non-central T - arithmetic_backend<double> = 0.800129 seconds

So to all intents and purposes there's now no abstraction penalty at all, 
except when using expression templates (which should be reserved for 
heavyweight backends anyway).

Notice how the mp_number code is sometimes slightly faster, and sometimes 
slightly slower than the native type - I've observed that completely trivial 
changes to the test/measuring code can change this around - I suspect, but 
can't prove that the abstraction layer changes the decisions made by the 
compiler's optimizer resulting in somewhat capricious differences between 
results.

With regard to the cpp_int backend, I still have work to do, however things 
do actually look pretty good on Linux x64 for the Delaunay test:

Running calculations for: int64_t, int128_t (this is Phil's 128-bit 
special-case code)
Number of flips = 23892000
Total execution time = 3.28337

Running calculations for: int64_t, mp_int128_t
Number of flips = 23892000
Total execution time = 3.05434

Running calculations for: int64_t, mp_int128_t (ET)
Number of flips = 23892000
Time per calculation = 3.02904e-08

Running calculations for: int64_t, cpp_int
Number of flips = 23892000
Total execution time = 5.41737

On Win32, I'm still a touch behind Phil's code, however, I haven't written a 
dedicated NxN->2N multiplication routine (yet):

Running calculations for: int64_t, int128_t
Number of flips = 23892000
Total execution time = 14.2189

Running calculations for: int64_t, mp_int128_t
Number of flips = 23892000
Total execution time = 24.5114

Running calculations for: int64_t, mp_int128_t (ET)
Number of flips = 23892000
Total execution time = 19.741

Running calculations for: int64_t, cpp_int
Number of flips = 23892000
Total execution time = 33.5

All the new code is in the sandbox, but I haven't documented any of the 
changes yet...

Regards, John.

John Maddock

Christopher Kormanyos

John Maddock

Robert Ramey

Christopher Kormanyos

tags

participants (3)