Looking for some "real world" extended precision integer arithmetic tests

Folks, I'm continuing to add to the multiprecision arithmetic library in the sandbox (under "big_number"), and I've just added a Boost-licensed fixed-precision integer type which, on Win32 at least, looks to be very competitive with GMP for reasonable numbers of bits (up to about 1024); see http://svn.boost.org/svn/boost/sandbox/big_number/libs/multiprecision/doc/ht...

However, as we all know, there are lies, damn lies, and performance stats ;-) Plus the test results I have above ignore the effect of memory allocations (needed by libtommath and GMP, but not the fixed_int code - which in theory should make it faster still). So I'd be really interested to put the type through its paces with some real-world code that really thrashes an extended-precision integer type. Any suggestions? Anything in Boost?

Thanks in advance, John.

On Jan 24, 2012, at 10:38 AM, John Maddock <boost.regex@virgin.net> wrote:
Folks,
I'm continuing to add to the multiprecision arithmetic library in the sandbox (under "big_number"), and I've just added a Boost licensed fixed-precision integer type which on Win32 at least looks to be very competitive with GMP for reasonable numbers of bits (up to about 1024), see http://svn.boost.org/svn/boost/sandbox/big_number/libs/multiprecision/doc/ht...
However, as we all know, there are lies, damn lies, and performance stats ;-) Plus the test results I have above ignore the effect of memory allocations (needed by libtommath and GMP, but not the fixed_int code - which in theory should make it faster still). So I'd be really interested to put the type through its paces with some real-world code that really thrashes an extended-precision integer type. Any suggestions? Anything in Boost?
The first thing I could think of is primality testing! I'm not sure if it's doable with little effort, but perhaps it *is*. Currently the fastest deterministic algorithm (in big-O terms) is described in the paper below, which also describes an implementation of the algorithm. They ran the following performance test:

"Testing was conducted on the linux platform using kernel version 2.6.8 on a Pentium 4-M processor running at 1.8 GHz. The C++ compiler was GNU g++ version 3.3.5. The kernel, system libraries, and AKS implementation were all compiled with i686 optimizations enabled. Version 2.1.3 of LiDIA was used. NTL version 5.2 was used. Both LiDIA and NTL were compiled with support for GMP large integers."

http://teal.gmu.edu/courses/ECE746/project/F06_Project_resources/Salembier_S...

Cheers, Thijs

I'm continuing to add to the multiprecision arithmetic library in the sandbox (under "big_number"), and I've just added a Boost licensed fixed-precision integer type which on Win32 at least looks to be very competitive with GMP for reasonable numbers of bits (up to about 1024), see http://svn.boost.org/svn/boost/sandbox/big_number/libs/multiprecision/doc/ht...
However, as we all know, there are lies, damn lies, and performance stats ;-) Plus the test results I have above ignore the effect of memory allocations (needed by libtommath and GMP, but not the fixed_int code - which in theory should make it faster still). So I'd be really interested to put the type through its paces with some real-world code that really thrashes an extended-precision integer type. Any suggestions? Anything in Boost?
The first thing I could think of is primality testing! I'm not sure if it's doable with little effort, but perhaps it *is*. Currently the fastest deterministic algorithm (in big-O terms) is described in the paper below, which also describes an implementation of the algorithm. They ran the following performance test:
"Testing was conducted on the linux platform using kernel version 2.6.8 on a Pentium 4-M processor running at 1.8 GHz. The C++ compiler was GNU g++ version 3.3.5. The kernel, system libraries, and AKS implementation were all compiled with i686 optimizations enabled. Version 2.1.3 of LiDIA was used. NTL version 5.2 was used. Both LiDIA and NTL were compiled with support for GMP large integers."
http://teal.gmu.edu/courses/ECE746/project/F06_Project_resources/Salembier_S...
Interesting, but it looks... complicated. I was hoping for something I could just plug straight into. Cheers, John.

I'm continuing to add to the multiprecision arithmetic library in the sandbox (under "big_number"), and I've just added a Boost licensed fixed-precision integer type which on Win32 at least looks to be very competitive with GMP for reasonable numbers of bits (up to about 1024), see http://svn.boost.org/svn/boost/sandbox/big_number/libs/multiprecision/doc/ht...
Wow, those are some big numbers you are putting up. You must be stoked.
However, as we all know, there are lies, damn lies, and performance stats ;-) Plus the test results I have above ignore the effect of memory allocations (needed by libtommath and GMP, but not the fixed_int code - which in theory should make it faster still). So I'd be really interested to put the type through its paces with some real-world code that really thrashes an extended-precision integer type. Any suggestions? Anything in Boost?
In my experience memory allocations dominate GMP performance for modest-sized big numbers.

My library uses an infinite-precision rational data type. I would love to see your library interoperate with Boost.Rational. However, because I use lazy-exact evaluation you would not be able to observe much effect of using a different numerical data type with my library.

The voronoi diagram feature being implemented in my library by Andrii as part of GSoC 2010 implements its own extended-precision arithmetic (float and int) and has performance that is more heavily dependent on the numerical data type, because for line segments, at least, it is impossible to avoid extended precision in the common case. You should work with Andrii both to make sure your library works well for his needs and to collect performance results with his tests.

Regards, Luke

I'm continuing to add to the multiprecision arithmetic library in the sandbox (under "big_number"), and I've just added a Boost licensed fixed-precision integer type which on Win32 at least looks to be very competitive with GMP for reasonable numbers of bits (up to about 1024), see http://svn.boost.org/svn/boost/sandbox/big_number/libs/multiprecision/doc/ht...
Wow, those are some big numbers you are putting up. You must be stoked.
However, as we all know, there are lies, damn lies, and performance stats ;-) Plus the test results I have above ignore the effect of memory allocations (needed by libtommath and GMP, but not the fixed_int code - which in theory should make it faster still). So I'd be really interested to put the type through its paces with some real-world code that really thrashes an extended-precision integer type. Any suggestions? Anything in Boost?
In my experience memory allocations dominate GMP performance for modest-sized big numbers.
My library uses an infinite-precision rational data type. I would love to see your library interoperate with Boost.Rational. However, because I use lazy-exact evaluation you would not be able to observe much effect of using a different numerical data type with my library.
The number types I have can all be plugged straight into Boost.Rational. However, performance compared to, say, mpq_t is truly terrible. I'm not sure whether it's Boost.Rational or Boost's gcd that's the bottleneck, though I suspect both could be improved.
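If the gcd turns out to be the bottleneck, a binary (Stein) GCD is the usual remedy, since it trades the divisions of Euclid's algorithm for shifts and subtractions - typically a win on multi-limb types where division is the slow operation. A sketch of mine on uint64_t (binary_gcd is my name, not a Boost function; __builtin_ctzll is GCC/Clang-specific):

```cpp
#include <cstdint>
#include <utility>

// Binary (Stein) GCD: only shifts, compares, and subtractions.
uint64_t binary_gcd(uint64_t u, uint64_t v) {
    if (u == 0) return v;
    if (v == 0) return u;
    int shift = __builtin_ctzll(u | v);  // shared factors of two
    u >>= __builtin_ctzll(u);            // make u odd
    do {
        v >>= __builtin_ctzll(v);        // make v odd
        if (u > v) std::swap(u, v);
        v -= u;                          // v becomes even (or zero)
    } while (v != 0);
    return u << shift;                   // restore the common power of two
}
```

On a big-int type, the count-trailing-zeros and shift steps would work limb-by-limb, but the structure is the same.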
The voronoi diagram feature being implemented in my library by Andrii as part of GSOC2010 implements its own extended precision arithmetic (float and int) and has performance that is more heavily dependent on the numerical data type because for line segments, at least, it is impossible to avoid extended precision in the common case. You should work with Andrii to both make sure your library works well for his needs and collect performance results with his tests.
Sure, where do I find him? Thanks, John.

Hi John,

I was following the development process of your library and was just waiting for this moment. I'll test your fixed_int code and compare it to my own implementation. FYI, the GMP library was 4 times slower compared to my fixed_int implementation.

I was also wondering if it's possible to provide a generalized interface for the Boost-licensed big_number, specifying the underlying data storage via a template parameter (boost::array for fixed integers or vector for non-fixed integers). Most of the operation implementations (addition, multiplication, and all the others) should remain the same. You may want to have a look at this code: http://svn.boost.org/svn/boost/sandbox/SOC/2007/bigint/boost/bigint/bigint_s... http://svn.boost.org/svn/boost/sandbox/SOC/2007/bigint/boost/bigint/bigint_s... http://svn.boost.org/svn/boost/sandbox/SOC/2007/bigint/boost/bigint/bigint_d... I would suggest that this way you should get a non-fixed int implementation performing almost the same as fixed_int, and as a plus you would not need tommath_int anymore.

Thanks, Andrii

On Tue, Jan 24, 2012 at 6:50 PM, John Maddock <boost.regex@virgin.net> wrote:
I'm continuing to add to the multiprecision arithmetic library in the sandbox (under "big_number"), and I've just added a Boost licensed fixed-precision integer type which on Win32 at least looks to be very competitive with GMP for reasonable numbers of bits (up to about 1024), see http://svn.boost.org/svn/boost/sandbox/big_number/libs/multiprecision/doc/html/boost_multiprecision/perf/integer_performance.html
Wow, those are some big numbers you are putting up. You must be stoked.
However, as we all know, there are lies, damn lies, and performance stats ;-) Plus the test results I have above ignore the effect of memory allocations (needed by libtommath and GMP, but not the fixed_int code - which in theory should make it faster still). So I'd be really interested to put the type through its paces with some real-world code that really thrashes an extended-precision integer type. Any suggestions? Anything in Boost?
In my experience memory allocations dominate GMP performance for modest-sized big numbers.
My library uses an infinite-precision rational data type. I would love to see your library interoperate with Boost.Rational. However, because I use lazy-exact evaluation you would not be able to observe much effect of using a different numerical data type with my library.
The number types I have can all be plugged straight into Boost.Rational. However, performance compared to say mpq_t is truly terrible. Not sure if it's Boost.Rational or Boost's gcd that's the bottleneck, though I suspect both could be improved.
The voronoi diagram feature being implemented in my library by Andrii as part of GSOC2010 implements its own extended precision arithmetic (float and int) and has performance that is more heavily dependent on the numerical data type because for line segments, at least, it is impossible to avoid extended precision in the common case. You should work with Andrii to both make sure your library works well for his needs and collect performance results with his tests.
Sure, where do I find him?
Thanks, John.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

I was following the development process of your library and was just waiting for this moment. I'll test your fixed_int code and compare it to my own implementation. FYI, the GMP library was 4 times slower compared to my fixed_int implementation.
I was also wondering if it's possible to provide a generalized interface for the Boost-licensed big_number, specifying the underlying data storage via a template parameter (boost::array for fixed integers or vector for non-fixed integers). Most of the operation implementations (addition, multiplication, and all the others) should remain the same. You may want to have a look at this code: http://svn.boost.org/svn/boost/sandbox/SOC/2007/bigint/boost/bigint/bigint_s... http://svn.boost.org/svn/boost/sandbox/SOC/2007/bigint/boost/bigint/bigint_s... http://svn.boost.org/svn/boost/sandbox/SOC/2007/bigint/boost/bigint/bigint_d... I would suggest that this way you should get a non-fixed int implementation performing almost the same as fixed_int, and as a plus you would not need tommath_int anymore.
It's an interesting idea - but at present I'm thinking that fixed_int is specialized enough to warrant its own implementation. It's 2's complement, whereas almost all (all?) arbitrary-precision types are signed-magnitude, and there are also some operations that can be significantly streamlined for fixed-precision 2's complement modulo-arithmetic types (for example, long multiplication doesn't need to calculate all the places in the result), which is the reason many operations can outperform GMP *even excluding memory allocations*.

There's also the interesting effect that, since the array bounds are known at compile time, the compiler can unroll and/or vectorize the loops - and it can probably do that almost as well as hand-written assembly.

That said, since all the algorithms are non-member functions anyway, there may well end up being some possibility of code sharing further down the road...

Cheers, John.
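The multiplication shortcut John describes can be sketched as follows (my illustration, not the sandbox code): with a fixed width of N limbs and truncation mod 2^(32N), any partial product whose limb position lands at or beyond N can be skipped outright.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Fixed-width 2's complement multiply over N 32-bit limbs, little-endian.
// Because the result is truncated mod 2^(32N), limb products with
// i + j >= N never contribute and are skipped - roughly halving the work
// compared to computing the full double-width product.
template <std::size_t N>
std::array<uint32_t, N> mul_mod_2n(const std::array<uint32_t, N>& a,
                                   const std::array<uint32_t, N>& b) {
    std::array<uint32_t, N> r{};
    for (std::size_t i = 0; i < N; ++i) {
        uint64_t carry = 0;
        for (std::size_t j = 0; i + j < N; ++j) {  // skip i + j >= N
            uint64_t t = (uint64_t)a[i] * b[j] + r[i + j] + carry;
            r[i + j] = (uint32_t)t;
            carry = t >> 32;
        }
        // carry out of the top limb is discarded: that's exactly mod 2^(32N)
    }
    return r;
}
```

The fixed N also means the compiler sees constant loop bounds, which is what enables the unrolling/vectorization John mentions.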

John Maddock wrote:
The number types I have can all be plugged straight into Boost.Rational. However, performance compared to say mpq_t is truly terrible. Not sure if it's Boost.Rational or Boost's gcd that's the bottleneck, though I suspect both could be improved.
Because I use lazy-exact evaluation this may not be that big of a concern. It is possible for an entire polygon clipping operation to run through and never once actually call the rational arithmetic. Using your bigint with Boost.Rational is a much better option than simply using the result provided by long double even after my check for whether it satisfies my invariant has failed. I would be happy to add a dependency on both Boost libraries if it meant that by default my library was numerically robust without using GMP.

It sounds like we may need to rework Rational, however, to make it more useful for extended-precision use cases. Perhaps you can specialize Rational's templates for your types in your library. You could then provide your expression templates as their return types.

Regards, Luke

John,

Benchmark description follows (test code and results in the attachments):

History: Voronoi diagram predicates use a 256-bit signed int type in the case of voronoi of points and a 4096-bit signed int in the case of voronoi of segments. This benchmark was run only for the point inputs. Input coordinates for the voronoi algorithm are from the int32 domain. The fixed_int type was used only to compute the centers of the inscribed circles and was not used to store any algorithm intermediate values. Voronoi fixed_int is not a complete integer type (it doesn't provide all the integer operations, just those required by the voronoi algorithm).

Preconditions: Before running the benchmark I made sure that the output generated with your fixed_int and with my implementation is completely the same over a set of 30000 random input data sets.

Benchmark results (time in seconds per test):

*Using voronoi extended_int256 type*:
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000038 |
| 100    | 1000  | 0.000621 |
| 1000   | 100   | 0.006990 |
| 10000  | 10    | 0.073000 |
| 100000 | 1     | 0.783000 |

*Using voronoi extended_int4096 type*:
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000038 |
| 100    | 1000  | 0.000621 |
| 1000   | 100   | 0.007130 |
| 10000  | 10    | 0.074500 |
| 100000 | 1     | 0.795000 |

*Using multiprecision type mp_int256_t*:
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000098 |
| 100    | 1000  | 0.001720 |
| 1000   | 100   | 0.019230 |
| 10000  | 10    | 0.199800 |
| 100000 | 1     | 2.045000 |

*Using multiprecision type mp_int512_t*:
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000238 |
| 100    | 1000  | 0.004299 |
| 1000   | 100   | 0.047770 |
| 10000  | 10    | 0.491500 |
| 100000 | 1     | 5.023000 |

*Using multiprecision type mp_int1024_t*:
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000686 |
| 100    | 1000  | 0.012558 |
| 1000   | 100   | 0.139440 |
| 10000  | 10    | 1.428000 |
| 100000 | 1     | 14.373000 |

Conclusions:
0) The main reason voronoi fixed_int outperforms multiprecision fixed_int is that it doesn't operate with leading bytes if those are not set (correct me if I'm wrong).
1) Performance of the voronoi fixed_int type is almost independent of the memory allocated for it.
2) Performance of the multiprecision fixed_int type strongly correlates with the memory allocated for it.
3) Voronoi fixed_int 256-bit outperforms multiprecision fixed_int 256-bit by 2.5 times. Actually this number is even bigger, as it doesn't subtract time used by the other parts of the algorithm.

Regards, Andrii

Andrii Sydorchuk wrote:
The main reason voronoi fixed_int outperforms multiprecision fixed_int is that it doesn't operate with leading bytes if those are not set.
Interesting, do you store the length of your voronoi fixed_int separate from the array? I see no other way to implement the optimization. It isn't a completely obvious thing to do. From your wording it sounds like you may even leave unused bytes in an uninitialized state. Is that the case? Presumably one would set the leading bytes to zero for positive numbers and FF for negative numbers for 2's complement representation. Arithmetic becomes a little more complex as when we run out of bytes in one array we must continue working with just the other until carry resolves itself, but then have the option of early termination and copying over the rest to the result. The common case we should expect is two small numbers or one small and one large number. Cool, Luke
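The addition scheme Luke describes can be sketched like this (my illustration, not Andrii's actual code): add while both operands have limbs, propagate the carry into the longer operand until it clears, then bulk-copy the untouched remainder.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>

// Add two little-endian magnitudes with tracked lengths into r.
// Returns the length of the result; r must have room for
// max(na, nb) + 1 limbs.  Once the carry clears, the rest of the
// longer operand is copied in bulk - the early termination Luke mentions.
std::size_t add_magnitudes(const uint32_t* a, std::size_t na,
                           const uint32_t* b, std::size_t nb,
                           uint32_t* r) {
    if (na < nb) { std::swap(a, b); std::swap(na, nb); }  // make a the longer one
    uint64_t carry = 0;
    std::size_t i = 0;
    for (; i < nb; ++i) {                    // both operands contribute
        uint64_t t = (uint64_t)a[i] + b[i] + carry;
        r[i] = (uint32_t)t;
        carry = t >> 32;
    }
    for (; i < na && carry; ++i) {           // ripple carry into the longer one
        uint64_t t = (uint64_t)a[i] + carry;
        r[i] = (uint32_t)t;
        carry = t >> 32;
    }
    std::memcpy(r + i, a + i, (na - i) * sizeof(uint32_t));  // bulk-copy the rest
    std::size_t n = na;
    if (carry) r[n++] = 1;                   // carry out grows the number
    return n;                                // new tracked length
}
```

With one small and one large operand - the common case Luke identifies - the second loop usually exits after a limb or two, and the memcpy does the rest.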

Interesting, do you store the length of your voronoi fixed_int separate from the array?
Yes, I do. Moreover, I also store the sign of the fixed_int in the same value (with the size). This means that voronoi fixed_int doesn't use a 2's complement representation. If leading bytes are not set we won't even operate with them (they might be uninitialized). As another small advantage, we will always have an opposite value for the smallest negative integer value representable with the given fixed_int. Andrii
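A toy model of that layout (names and details are my guesses, not Andrii's code): storage is a fixed array, but a signed count field carries both the number of limbs in use and the sign - so it is signed-magnitude, and the unused leading limbs can stay uninitialized because arithmetic never touches them.

```cpp
#include <array>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch of a variable-precision value in fixed storage.
// sign(count) is the sign of the value; |count| is the number of limbs
// in use; limbs past |count| may be left uninitialized.
template <std::size_t N>
struct extended_int {
    std::array<uint32_t, N> limbs;  // only limbs[0..|count|-1] are valid
    int count;

    static extended_int from_int(int64_t v) {
        extended_int r;                          // limbs intentionally uninitialized
        uint64_t m = v < 0 ? (uint64_t)0 - (uint64_t)v : (uint64_t)v;
        r.count = 0;
        while (m) {                              // store only the limbs actually needed
            r.limbs[r.count++] = (uint32_t)m;
            m >>= 32;
        }
        if (v < 0) r.count = -r.count;           // fold the sign into the count
        return r;
    }

    int64_t to_int() const {                     // assumes the value fits in 64 bits
        uint64_t m = 0;
        for (int i = std::abs(count); i-- > 0; )
            m = (m << 32) | limbs[i];
        return count < 0 ? -(int64_t)m : (int64_t)m;
    }
};
```

Because the magnitude and sign are separate, negation is just `count = -count`, which is why even the most negative representable value has a representable opposite - unlike 2's complement.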

Interesting, do you store the length of your voronoi fixed_int separate from the array?
Yes, I do. Moreover, I also store the sign of the fixed_int in the same value (with the size). This means that voronoi fixed_int doesn't use a 2's complement representation. If leading bytes are not set we won't even operate with them (they might be uninitialized). As another small advantage, we will always have an opposite value for the smallest negative integer value representable with the given fixed_int.
Interesting! I'm going to have to think carefully about that - the whole raison d'être for my fixed_int was to emulate a regular 2's complement integer type as closely as possible. What you have is a variable-precision type using fixed storage? Lots to think about. Thanks, John.

John,
Benchmark description follows (test code and results in the attachments):
History: Voronoi diagram predicates use a 256-bit signed int type in the case of voronoi of points and a 4096-bit signed int in the case of voronoi of segments. This benchmark was run only for the point inputs. Input coordinates for the voronoi algorithm are from the int32 domain. The fixed_int type was used only to compute the centers of the inscribed circles and was not used to store any algorithm intermediate values. Voronoi fixed_int is not a complete integer type (it doesn't provide all the integer operations, just those required by the voronoi algorithm).
Preconditions: Before running the benchmark I made sure that the output generated with your fixed_int and with my implementation is completely the same over a set of 30000 random input data sets.
Benchmark results (time in seconds per test):
*Using voronoi extended_int256 type*:
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000038 |
| 100    | 1000  | 0.000621 |
| 1000   | 100   | 0.006990 |
| 10000  | 10    | 0.073000 |
| 100000 | 1     | 0.783000 |

*Using voronoi extended_int4096 type*:
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000038 |
| 100    | 1000  | 0.000621 |
| 1000   | 100   | 0.007130 |
| 10000  | 10    | 0.074500 |
| 100000 | 1     | 0.795000 |

*Using multiprecision type mp_int256_t*:
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000098 |
| 100    | 1000  | 0.001720 |
| 1000   | 100   | 0.019230 |
| 10000  | 10    | 0.199800 |
| 100000 | 1     | 2.045000 |

*Using multiprecision type mp_int512_t*:
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000238 |
| 100    | 1000  | 0.004299 |
| 1000   | 100   | 0.047770 |
| 10000  | 10    | 0.491500 |
| 100000 | 1     | 5.023000 |

*Using multiprecision type mp_int1024_t*:
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000686 |
| 100    | 1000  | 0.012558 |
| 1000   | 100   | 0.139440 |
| 10000  | 10    | 1.428000 |
| 100000 | 1     | 14.373000 |
Conclusions:
0) The main reason voronoi fixed_int outperforms multiprecision fixed_int is that it doesn't operate with leading bytes if those are not set (correct me if I'm wrong).
1) Performance of the voronoi fixed_int type is almost independent of the memory allocated for it.
2) Performance of the multiprecision fixed_int type strongly correlates with the memory allocated for it.
3) Voronoi fixed_int 256-bit outperforms multiprecision fixed_int 256-bit by 2.5 times. Actually this number is even bigger, as it doesn't subtract time used by the other parts of the algorithm.
Well, that's disappointing then! :-( It certainly looks like lazy evaluation of higher-order limbs is a big win in this case - I'm guessing that most operations use only one or two limbs, and just every now and then you get a true big number? Thanks, John.

What you have is a variable precision type using fixed storage?
Yes, you are right. However, the functionality it provides is the same as with a 2's complement backend. The reasons a 2's complement representation is used for integral types (int32, int64) are: 1) it requires less storage, as it doesn't use an additional value to track the number of bits set; 2) it allows addition and subtraction to execute quickly on the CPU (without considering different sign cases). For a higher-order fixed_int this speedup won't be noticeable, as you spend more time on adding limbs and doing shifts.

Certainly looks like lazy evaluation of higher-order limbs is a big win in this case - I'm guessing that most operations use only one or two limbs, and just every now and then you get a true big number?

Correct. That's why I keep insisting on the implementation from here: http://svn.boost.org/svn/boost/sandbox/SOC/2007/bigint/boost/bigint/bigint_default.hpp. Wrapping it with a multiprecision interface will make it a very good bigint library, I believe. However, before doing that we should also investigate the concerns around that library (I'll try to contact the author). Thanks, Andrii

John,
Benchmark description follows (test code and results in the attachments):
Andrii, I finally got around to running your benchmark myself and see rather different results:

boost::polygon::detail::extended_int<128> Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000053 |
| 100    | 1000  | 0.000861 |
| 1000   | 100   | 0.009630 |
| 10000  | 10    | 0.104400 |
| 100000 | 1     | 1.064000 |

mp_number<fixed_int<128, true> > Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000062 |
| 100    | 1000  | 0.000874 |
| 1000   | 100   | 0.009530 |
| 10000  | 10    | 0.105400 |
| 100000 | 1     | 1.089000 |

mp_number<fixed_int<256, true> > Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000054 |
| 100    | 1000  | 0.000868 |
| 1000   | 100   | 0.009470 |
| 10000  | 10    | 0.105200 |
| 100000 | 1     | 1.075000 |

mp_number<fixed_int<512, true> > Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000056 |
| 100    | 1000  | 0.000865 |
| 1000   | 100   | 0.009500 |
| 10000  | 10    | 0.105500 |
| 100000 | 1     | 1.071000 |

mp_number<fixed_int<1024, true> > Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000069 |
| 100    | 1000  | 0.001027 |
| 1000   | 100   | 0.011230 |
| 10000  | 10    | 0.122700 |
| 100000 | 1     | 1.250000 |

mp_number<cpp_int_backend<> > Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000055 |
| 100    | 1000  | 0.000866 |
| 1000   | 100   | 0.009630 |
| 10000  | 10    | 0.107000 |
| 100000 | 1     | 1.077000 |

mp_number<mpz_int> Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000064 |
| 100    | 1000  | 0.001076 |
| 1000   | 100   | 0.013260 |
| 10000  | 10    | 0.139300 |
| 100000 | 1     | 1.375000 |

mp_number<tommath_int> Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
| 10     | 10000 | 0.000063 |
| 100    | 1000  | 0.000897 |
| 1000   | 100   | 0.009770 |
| 10000  | 10    | 0.108800 |
| 100000 | 1     | 1.097000 |

So basically the times are all about the same, except for very large fixed_ints, or GMP - and even those aren't *much* worse. There's also an experimental "cpp_int" in there, which is an all-C++ arbitrary-precision type that uses the "small value optimization" to avoid memory allocation when possible.

This is with the latest gtl sandbox code BTW. Maybe you've changed something to make the code much less dependent on the integer type? I assume that the only code I need to change to test a new type is the line that defines big_int_type in voronoi_user_traits?

BTW I tried to run the test on Linux as well, but I couldn't get your code to compile there....

Cheers, John.

Hi John,

BTW I tried to run the test on Linux as well, but I couldn't get your code to compile there....
I recently migrated all the development to 64-bit Linux + gcc 4.6.1 and fixed all the errors and warnings. So if you sync it should be fine. Also, the previous voronoi_fixed_int_test.cpp won't compile, so please use the new one (in the attachments).

This is with the latest gtl sandbox code BTW. Maybe you've changed something to make the code much less dependent on the integer type?

The reason you get such close results is that the lazy arithmetic kernel is enabled by default. That means that in 95% of computations fixed_int is not used at all. The configuration of the predicate kernel is not exposed to the user, so to disable the lazy arithmetic kernel you should change it manually: in the file boost/polygon/detail/voronoi_predicates.hpp change "typename CFF = lazy_circle_formation_functor<Site, Circle> >" to "typename CFF = mp_circle_formation_functor<Site, Circle> >".

I assume that the only code I need to change to test a new type is the line that defines big_int_type in voronoi_user_traits?

With the change above it's the only thing you need to change (+2 entries in the fpt_converters arguments above the voronoi_user_traits definition).

mp_number<fixed_int<128, true> >

Please don't try to run the algorithm with a 128-bit fixed_int, because it may cause the algorithm to fail or produce incorrect results. The size of the fixed_int should be at least 256 bits.

Cheers, Andrii.

I recently migrated all the development to 64-bit Linux + gcc 4.6.1 and fixed all the errors and warnings. So if you sync it should be fine. Also, the previous voronoi_fixed_int_test.cpp won't compile, so please use the new one (in the attachments).
Hmmm, with the latest sandbox code and your new test case I still see:

1> voronoi_fixed_int_test.cpp
1> m:\data\boost\sandbox\gtl\boost\polygon\detail\voronoi_robust_fpt.hpp(466): error C2784: 'boost::polygon::detail::robust_dif<T> boost::polygon::detail::operator /(const boost::polygon::detail::robust_dif<T> &,const T &)' : could not deduce template argument for 'const boost::polygon::detail::robust_dif<T> &' from 'double'
1> m:\data\boost\sandbox\gtl\boost\polygon\detail\voronoi_robust_fpt.hpp(431) : see declaration of 'boost::polygon::detail::operator /'
1> m:\data\boost\sandbox\gtl\boost\polygon\detail\voronoi_robust_fpt.hpp(460) : while compiling class template member function 'boost::polygon::detail::extended_exponent_fpt<_fpt> boost::polygon::detail::robust_sqrt_expr<_int,boost::polygon::detail::extended_exponent_fpt<_fpt>,_converter>::eval2(_int *,_int *)'
1>   with [ _fpt=double, _int=boost::multiprecision::mp_number<boost::multiprecision::cpp_int_backend<>>, _converter=dummy_efpt_converter ]
1> m:\data\boost\sandbox\gtl\boost\polygon\detail/voronoi_predicates.hpp(978) : see reference to class template instantiation 'boost::polygon::detail::robust_sqrt_expr<_int,_fpt,_converter>' being compiled
1>   with [ _int=boost::multiprecision::mp_number<boost::multiprecision::cpp_int_backend<>>, _fpt=boost::polygon::detail::extended_exponent_fpt<double>, _converter=dummy_efpt_converter ]
1> m:\data\boost\sandbox\gtl\boost\polygon\detail/voronoi_predicates.hpp(1373) : see reference to class template instantiation 'boost::polygon::detail::voronoi_predicates<CTYPE_TRAITS>::mp_circle_formation_functor<Site,Circle>' being compiled
1>   with [ CTYPE_TRAITS=voronoi_user_traits, Site=boost::polygon::detail::site_event<int>, Circle=boost::polygon::detail::circle_event<double> ]
1> M:\data\boost\sandbox\gtl\boost/polygon/voronoi_builder.hpp(526) : see reference to class template instantiation 'boost::polygon::detail::voronoi_predicates<CTYPE_TRAITS>::circle_formation_predicate<Site,Circle>' being compiled
1>   with [ CTYPE_TRAITS=voronoi_user_traits, Site=boost::polygon::detail::site_event<int>, Circle=boost::polygon::detail::circle_event<double> ]
1> ..\..\..\performance\voronoi_fixed_int_test.cpp(84) : see reference to class template instantiation 'boost::polygon::voronoi_builder<T,CTT>' being compiled
1>   with [ T=int, CTT=voronoi_user_traits ]
1> m:\data\boost\sandbox\gtl\boost\polygon\detail\voronoi_robust_fpt.hpp(466): error C2677: binary '/' : no global operator found which takes type 'boost::polygon::detail::extended_exponent_fpt<_fpt>' (or there is no acceptable conversion)
1>   with [ _fpt=double ]
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

John.

Hmmm, with the latest sandbox code and your new test case I still see:
I briefly looked through the compiler output, and it looks like the errors you got are caused by the voronoi_user_traits struct definition. I am not able to compile the code right now, so I will update voronoi_fixed_int_test.cpp in the evening. Could you also attach your voronoi_fixed_int_test.cpp, or is it the same one I sent? Thanks, Andrii

I briefly looked through the compiler output, and it looks like the errors you got are caused by the voronoi_user_traits struct definition. I am not able to compile the code right now, so I will update voronoi_fixed_int_test.cpp in the evening. Could you also attach your voronoi_fixed_int_test.cpp, or is it the same one I sent?
The same one you sent should reproduce it. HTH, John.

John, the issue was with the voronoi_user_traits (gcc doesn't know about __int64). You can find the updated file in the attachments. Cheers, Andrii
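For reference, the usual portable fix for this class of error (a sketch; the typedef name is illustrative, not the one in the test file): replace the MSVC-specific __int64 with int64_t from <cstdint>, which both gcc and MSVC understand.

```cpp
#include <cstdint>

// __int64 is an MSVC extension; int64_t from <cstdint> is the portable
// equivalent and compiles with both gcc and MSVC.
typedef std::int64_t int64;  // illustrative replacement for __int64

static_assert(sizeof(int64) == 8, "expected a 64-bit integer type");
```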

Andrii,

Many thanks for the use case - in point of fact it's proved pretty hard to get a general purpose solution anywhere near your special-purpose code - simply because your code is so simple ;-) In the general purpose case, even something as simple as an extra "if" statement can make a big difference in your benchmarks, largely because they mainly test construction of temporaries!

Anyhow, I think I'm as close as it's going to get for now, here it is on Linux (sorry about the mangled names, but your integer type is first):

Testing... N5boost7polygon6detail12extended_intILm32EEE
Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
|               10 |           10000 |          0.000035 |
|              100 |            1000 |          0.000620 |
|             1000 |             100 |          0.006900 |
|            10000 |              10 |          0.087000 |
|           100000 |               1 |          0.880000 |

Testing... N5boost14multiprecision9mp_numberINS0_8backends15cpp_int_backendILj256ELb1EvEELb1EEE
Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
|               10 |           10000 |          0.000041 |
|              100 |            1000 |          0.000720 |
|             1000 |             100 |          0.008200 |
|            10000 |              10 |          0.097000 |
|           100000 |               1 |          1.010000 |

Testing... N5boost14multiprecision9mp_numberINS0_8backends15cpp_int_backendILj1024ELb1EvEELb1EEE
Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
|               10 |           10000 |          0.000041 |
|              100 |            1000 |          0.000730 |
|             1000 |             100 |          0.008100 |
|            10000 |              10 |          0.097000 |
|           100000 |               1 |          1.000000 |

Testing... N5boost14multiprecision9mp_numberINS0_8backends15cpp_int_backendILj0ELb1ESaImEEELb1EEE
Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
|               10 |           10000 |          0.000052 |
|              100 |            1000 |          0.000930 |
|             1000 |             100 |          0.010300 |
|            10000 |              10 |          0.121000 |
|           100000 |               1 |          1.240000 |

Testing... N5boost14multiprecision9mp_numberINS0_8backends7gmp_intELb1EEE
Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
|               10 |           10000 |          0.000075 |
|              100 |            1000 |          0.001400 |
|             1000 |             100 |          0.015500 |
|            10000 |              10 |          0.174000 |
|           100000 |               1 |          1.780000 |

Testing... N5boost14multiprecision9mp_numberINS0_8backends11tommath_intELb1EEE
Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
|               10 |           10000 |          0.001447 |
|              100 |            1000 |          0.026940 |
|             1000 |             100 |          0.269500 |
|            10000 |              10 |          2.590000 |
|           100000 |               1 |         24.870000 |

On Windows, I can't get quite as close, and MPIR on Win32 is noticeably worse than GMP on Linux x64:

Testing... class boost::polygon::detail::extended_int<32>
Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
|               10 |           10000 |          0.000072 |
|              100 |            1000 |          0.001117 |
|             1000 |             100 |          0.012220 |
|            10000 |              10 |          0.134600 |
|           100000 |               1 |          1.377000 |

Testing... class boost::multiprecision::mp_number<struct boost::multiprecision::backends::cpp_int_backend<1024,1,void>,0>
Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
|               10 |           10000 |          0.000094 |
|              100 |            1000 |          0.001601 |
|             1000 |             100 |          0.017600 |
|            10000 |              10 |          0.187100 |
|           100000 |               1 |          1.893000 |

Testing... class boost::multiprecision::mp_number<struct boost::multiprecision::backends::cpp_int_backend<1024,1,void>,1>
Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
|               10 |           10000 |          0.000104 |
|              100 |            1000 |          0.001693 |
|             1000 |             100 |          0.018740 |
|            10000 |              10 |          0.196900 |
|           100000 |               1 |          1.988000 |

Testing... class boost::multiprecision::mp_number<struct boost::multiprecision::backends::cpp_int_backend<0,1,class std::allocator<unsigned int> >,1>
Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
|               10 |           10000 |          0.000126 |
|              100 |            1000 |          0.002210 |
|             1000 |             100 |          0.024230 |
|            10000 |              10 |          0.254400 |
|           100000 |               1 |          2.563000 |

Testing... class boost::multiprecision::mp_number<struct boost::multiprecision::backends::gmp_int,1>
Voronoi Benchmark Test (time in seconds):
| Number of points | Number of tests | Time per one test |
|               10 |           10000 |          0.000392 |
|              100 |            1000 |          0.007060 |
|             1000 |             100 |          0.078010 |
|            10000 |              10 |          0.803300 |
|           100000 |               1 |          8.080000 |

Cheers, John.

From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of John Maddock
Many thanks for the use case - in point of fact it's proved pretty hard to get a general purpose solution anywhere near your special-purpose code - simply because your code is so simple ;-) In the general purpose case, even something as simple as an extra "if" statement can make a big difference in your benchmarks, largely because they mainly test construction of temporaries!
In my experience construction of temporaries tends to dominate computation for numerical data types in Polygon's line segment intersection algorithm as well. Usage of GMP can differ by 10X in performance depending on whether temporaries are allocated every time or cached and recycled. I think what we are seeing in Andrii's use case may generalize across quite a few different big number use cases. Regards, Luke

John,
From your benchmark I see that you didn't provide results for boost::multiprecision::fixed_int, but rather for boost::multiprecision::backends::cpp_int_backend. I tried to update the boost multiprecision branch, but the boost/multiprecision/backends directory appeared to be empty.
Many thanks for the use case - in point of fact it's proved pretty hard to
get a general purpose solution anywhere near your special-purpose code - simply because your code is so simple
I wouldn't say so. The only simple thing about my extended_int implementation is that it implements 3 basic operations: addition, subtraction and multiplication. But I don't see any problems in extending it to support division and other arithmetic operations. My point is that for a fixed integer class it's not a good idea to represent negative numbers as two's complement. The main point of my benchmark was to compare fixed int implementations. And those should not have much performance difference in construction of temporary variables. I think what we are seeing in Andrii's use case may generalize across
quite a few different big number use cases.
Exactly, the main point of usage of a fixed big integer class is that it doesn't require you to think about temporary creation (except overflowing the stack, but that's another topic). I would say that for math computations it is the best option in 90% of cases. It would be good for Boost to have both big int implementations: with a stack allocated backend and a heap allocated backend. I was able to contact Arseniy Kapoulkine, whose big int implementation is under /sandbox/SOC/2007/bigint. Arseniy agreed to work with the Boost group to solve any problems that prevent the standardization. I will try to come up with benchmarks of the current implementation based on the Voronoi code. It would be very nice to combine that implementation with boost multiprecision. Cheers, Andrii

From your benchmark I see that you didn't provide results for boost::multiprecision::fixed_int, but rather for boost::multiprecision::backends::cpp_int_backend. I tried to update the boost multiprecision branch, but the boost/multiprecision/backends directory appeared to be empty.
Many thanks for the use case - in point of fact it's proved pretty hard to
get a general purpose solution anywhere near your special-purpose code - simply because your code is so simple
I wouldn't say so. The only simple thing about my extended_int implementation is that it implements 3 basic operations: addition, subtraction and multiplication. But I don't see any problems in extending it to support division and other arithmetic operations. My point is that for a fixed integer class it's not a good idea to represent negative numbers as two's complement. The main point of my benchmark was to compare fixed int implementations. And those should not have much performance difference in construction of temporary variables.
Ah, you probably misunderstood what I meant. Let me give you a concrete example: Your multiplication code is fast, probably as fast as it can be in your particular use case, the limb counts never grow too large, so there are actually very few integer ops executed for a multiply. Now it so happens that multiplying by an integer (or a big number with a single limb) can be optimised with its own simpler routine, that's *much* quicker (as in an order of magnitude quicker) for larger limb counts. But.... hooking that code into cpp_int made your benchmark run slightly slower... the cause is the extra check in there to see if the optimisation can be taken! Basically, in most cases, your routines are called with small numbers that execute so few integer ops, that seemingly trivial changes can have quite detrimental effects on runtime. There are other similar examples, but basically my case is that to make the integer type general purpose there are actually lots of these special case checks, each of which is there for a very good reason, but which collectively easily account for that 10% slowdown in my code compared to yours. In short there's no free lunch, what's an optimisation for X is often a dis-optimisation for Y.
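The trade-off described above (a single-limb fast path that is an order of magnitude quicker for large operands, but whose dispatch check is pure overhead for small ones) can be sketched on toy limb vectors. The names and layout here are illustrative only, not cpp_int's actual internals:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using limb_t  = std::uint32_t;
using dlimb_t = std::uint64_t;

// Limb vectors, least-significant limb first (hypothetical layout).
// Single-limb fast path: one O(n) pass instead of the O(n*m) schoolbook loop.
std::vector<limb_t> mul_by_single_limb(const std::vector<limb_t>& a, limb_t b) {
    std::vector<limb_t> r(a.size() + 1, 0);
    dlimb_t carry = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dlimb_t t = static_cast<dlimb_t>(a[i]) * b + carry;
        r[i] = static_cast<limb_t>(t);
        carry = t >> 32;
    }
    r[a.size()] = static_cast<limb_t>(carry);
    while (r.size() > 1 && r.back() == 0) r.pop_back();
    return r;
}

std::vector<limb_t> mul_generic(const std::vector<limb_t>& a,
                                const std::vector<limb_t>& b) {
    // This branch is the "extra if": a big win for large operands,
    // but a cost paid on every small multiply.
    if (b.size() == 1) return mul_by_single_limb(a, b[0]);
    std::vector<limb_t> r(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {  // schoolbook O(n*m)
        dlimb_t carry = 0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            dlimb_t t = static_cast<dlimb_t>(a[i]) * b[j] + r[i + j] + carry;
            r[i + j] = static_cast<limb_t>(t);
            carry = t >> 32;
        }
        r[i + b.size()] += static_cast<limb_t>(carry);
    }
    while (r.size() > 1 && r.back() == 0) r.pop_back();
    return r;
}
```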
I think what we are seeing in Andrii's use case may generalize across
quite a few different big number use cases.
Exactly, the main point of usage of a fixed big integer class is that it doesn't require you to think about temporary creation (except overflowing the stack, but that's another topic). I would say that for math computations it is the best option in 90% of cases.
It would be good for Boost to have both big int implementations: with a stack allocated backend and a heap allocated backend. I was able to contact Arseniy Kapoulkine, whose big int implementation is under /sandbox/SOC/2007/bigint. Arseniy agreed to work with the Boost group to solve any problems that prevent the standardization. I will try to come up with benchmarks of the current implementation based on the Voronoi code. It would be very nice to combine that implementation with boost multiprecision.
Ah, I should have been clearer what I've done here: the old 2's complement fixed_int class has gone. There is now "one true" C++ big-int implementation that handles both arbitrary precision and fixed precision cases using signed-magnitude representation. Again, another compromise: sign magnitude is actually very slightly slower than the old 2's complement code for some operations, but has enough other advantages to make it worthwhile (so you sold me on that). It also has a template parameter that lets you tune the size of the internal cache (the bit that's used before any memory allocation takes place). Again this is a compromise: testing with my Miller Rabin code (another very important use case), increasing the internal cache size could actually make things worse if you happened to choose an "unfortunate" value compared to the size of the ints being tested. However, increasing it some more so that no memory allocation ever happens then drops runtime down dramatically. Basically what I'm saying here is there is no one true configuration that offers the very best performance in all cases; if there was, we'd probably all be using GMP ;-) John. PS, the next step is to start stealing some more special case code from the SOC project.... I fully expect that to make your benchmarks run very slightly more slowly, but it is necessary for other use cases....
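For readers unfamiliar with the Miller-Rabin benchmark mentioned above: the library version runs on mp_number with big operands, but the core of the test fits in a few lines. A minimal sketch for word-sized inputs (n < 2^32, for which the base set {2, 7, 61} is known to be deterministic):

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Modular exponentiation; safe here because m < 2^32, so b*b < 2^64.
std::uint64_t powmod(std::uint64_t b, std::uint64_t e, std::uint64_t m) {
    std::uint64_t r = 1;
    b %= m;
    while (e) {
        if (e & 1) r = r * b % m;
        b = b * b % m;
        e >>= 1;
    }
    return r;
}

// Miller-Rabin, deterministic for n < 2^32 with witnesses 2, 7, 61.
bool is_prime(std::uint32_t n) {
    if (n < 2) return false;
    for (std::uint32_t p : {2u, 3u, 5u, 7u, 61u})
        if (n % p == 0) return n == p;
    std::uint64_t d = n - 1;
    int s = 0;
    while ((d & 1) == 0) { d >>= 1; ++s; }  // n - 1 = d * 2^s, d odd
    for (std::uint64_t a : {2ull, 7ull, 61ull}) {
        std::uint64_t x = powmod(a, d, n);
        if (x == 1 || x == n - 1) continue;
        bool composite = true;
        for (int i = 1; i < s; ++i) {
            x = x * x % n;
            if (x == n - 1) { composite = false; break; }
        }
        if (composite) return false;
    }
    return true;
}
```

The big-number version is the same loop; the cache-size effect John describes comes from the temporaries powmod creates each iteration.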

John,

Ah, you probably misunderstood what I meant.

Yes, I misunderstood - I didn't realize you had changed your fixed int implementation to a new one. There are other similar examples, but basically my case is that to make the
integer type general purpose there are actually lots of these special case checks, each of which are there for a very good reason, but which collectively easily account for that 10% slowdown in my code compared to yours.
Well, as I noticed before it's not just 10%, because algorithm execution time should be subtracted. Nevertheless I completely understand that it is not possible to get code as fast as mine. I would consider a generalized implementation that is 50% slower to be pretty good (but not 3 times slower as before). It seems that you managed to do that, so accept my congratulations! I am willing to have a look at your implementation, is it already in the sandbox? Again, another compromise: sign magnitude is actually very slightly slower
than the old 2's complement code for some operations, but has enough other advantages to make it worthwhile (so you sold me on that)
Could you list those? I am willing to test :-) It also has a template parameter that lets you tune the size of the
internal cache (the bit that's used before any memory allocation takes place).
I am not sure what the internal cache is here. Need to have a look at your implementation. Did you implement two separate backends: stack allocated, heap allocated? Or did you come up with a generalized implementation that uses a fixed array for small numbers and switches to a dynamic array if it exceeds its capacity? Basically what I'm saying here, is there is no one true configuration,
that offers the very best performance in all cases, if there was, we'd probably all be using GMP ;-)
Completely agree. But from an advanced user perspective it's good to be able to specify the configuration (stack or heap allocation, limb size, etc). For example it is possible to rewrite all the Voronoi code using GMP almost without any performance loss. But this would require splitting huge math expressions into atomic formulas, thus reducing code readability and maintainability. That's the main reason I decided to switch to a fixed bigint. PS, the next step is to start stealing some more special case code from the
SOC project.... I fully expect that to make your benchmarks run very slightly more slowly, but is necessary for other use cases...
I'd like to have a look at your backend implementation. It seems to be already well done judging from your Voronoi benchmark. So it will be good to move parts of that SOC project to your current implementation. One of the useful things we can think of is FFT multiplication (which is fast for large numbers). For medium size numbers I would think of Karatsuba multiplication. I would suggest a generalized multiplication that combines those two plus the usual quadratic multiplication for small numbers. I can help on testing those and would ask Arseniy if he's interested in migrating his FFT code. Does it sound good for you? Andrii

There are other similar examples, but basically my case is that to make the
integer type general purpose there are actually lots of these special case checks, each of which are there for a very good reason, but which collectively easily account for that 10% slowdown in my code compared to yours.
Well, as I noticed before it's not just 10%, because algorithm execution time should be subtracted. Nevertheless I completely understand that it is not possible to get code as fast as mine. I would consider a generalized implementation that is 50% slower to be pretty good (but not 3 times slower as before). It seems that you managed to do that, so accept my congratulations! I am willing to have a look at your implementation, is it already in the sandbox?
Yes, there are currently problems with the random number code not compiling with GCC, but apart from that it's all in pretty good shape (I hope!), docs here: http://svn.boost.org/svn/boost/sandbox/big_number/libs/multiprecision/doc/ht...
Again, another compromise: sign magnitude is actually very slightly slower
than the old 2's complement code for some operations, but has enough other advantages to make it worthwhile (so you sold me on that)
Could you list those? I am willing to test :-)
Well the main one is that it is easier to code, and in particular to only track the limbs that are actually used, which proved too brain twisting for me with a 2's complement representation ;-)
It also has a template parameter that lets you tune the size of the
internal cache (the bit that's used before any memory allocation takes place).
I am not sure what the internal cache is here. Need to have a look at your implementation. Did you implement two separate backends: stack allocated, heap allocated? Or did you come up with a generalized implementation that uses a fixed array for small numbers and switches to a dynamic array if it exceeds its capacity?
No, just one implementation, but it implements the equivalent of the small-string-optimization - which is to say the header for the dynamically allocated buffer is in a union with a small internal buffer. The internal buffer is used first, then if the number grows too large a dynamic buffer is used. The template parameter controls the size of this internal cache - it defaults to the same size as the dynamic buffer header, but can be made larger if that helps. Finally, if the allocator is void, then no dynamic allocation ever takes place, and we have a fixed precision type; some of the core code gets simplified in this case as well. Otherwise, an unbounded integer with a large cache provides almost the best of both worlds - very nearly as fast as the fixed precision version, but no need to worry about overflow.
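The layout described above is essentially the small-buffer optimization. A rough sketch of the idea - the names, the 4-limb cache size, and the growth policy are all assumptions for illustration, not the actual cpp_int layout:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

using limb_t = std::uint32_t;

class sbo_limbs {
    static constexpr unsigned internal_limb_count = 4;  // the tunable "cache"
    struct heap_header { limb_t* data; unsigned capacity; };
    union {
        limb_t      internal_[internal_limb_count];  // used while small
        heap_header dynamic_;                        // heap buffer header
    };
    unsigned size_    = 0;
    bool     on_heap_ = false;
public:
    limb_t*  limbs()           { return on_heap_ ? dynamic_.data : internal_; }
    unsigned size()      const { return size_; }
    bool     uses_heap() const { return on_heap_; }
    void resize(unsigned n) {
        unsigned cap = on_heap_ ? dynamic_.capacity : internal_limb_count;
        if (n <= cap) { size_ = n; return; }
        limb_t* p = new limb_t[n];                   // outgrew the cache:
        std::memcpy(p, limbs(), size_ * sizeof(limb_t));  // move to the heap
        if (on_heap_) delete[] dynamic_.data;
        dynamic_.data = p;
        dynamic_.capacity = n;
        on_heap_ = true;
        size_ = n;
    }
    ~sbo_limbs() { if (on_heap_) delete[] dynamic_.data; }
};
```

With an allocator of void the heap branch would be compiled out entirely, which matches the fixed-precision simplification John describes.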
Basically what I'm saying here, is there is no one true configuration,
that offers the very best performance in all cases, if there was, we'd probably all be using GMP ;-)
Completely agree. But from an advanced user perspective it's good to be able to specify the configuration (stack or heap allocation, limb size, etc). For example it is possible to rewrite all the Voronoi code using GMP almost without any performance loss. But this would require splitting huge math expressions into atomic formulas, thus reducing code readability and maintainability. That's the main reason I decided to switch to a fixed bigint.
PS, the next step is to start stealing some more special case code from the
SOC project.... I fully expect that to make your benchmarks run very slightly more slowly, but is necessary for other use cases...
I'd like to have a look at your backend implementation. It seems to be already well done judging from your Voronoi benchmark. So it will be good to move parts of that SOC project to your current implementation. One of the useful things we can think of is FFT multiplication (which is fast for large numbers). For medium size numbers I would think of Karatsuba multiplication. I would suggest a generalized multiplication that combines those two plus the usual quadratic multiplication for small numbers. I can help on testing those and would ask Arseniy if he's interested in migrating his FFT code. Does it sound good for you?
For sure, those are the two main routines that I want to pinch from his code - and if Arseniy can help with that, that would be great - I wasn't looking forward to having to get my head around those routines ;-) There are some other specialized routines (for squaring for example) that I also need to implement, as well as experiment with column-first multiplication to see whether it gains anything.... Cheers, John.
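For reference, the Karatsuba routine discussed above rests on a single identity: one half-size multiplication is traded for a few additions. A toy showing one recursion level on 32-bit operands with base B = 2^16 (the real routine recurses on limb arrays, but the algebra is the same):

```cpp
#include <cassert>
#include <cstdint>

// x = x1*B + x0, y = y1*B + y0, B = 2^16.
// x*y = z2*B^2 + z1*B + z0, where
//   z2 = x1*y1
//   z0 = x0*y0
//   z1 = (x1 + x0)*(y1 + y0) - z2 - z0
// Three half-size multiplies instead of four.
std::uint64_t karatsuba32(std::uint32_t x, std::uint32_t y) {
    std::uint64_t x1 = x >> 16, x0 = x & 0xFFFF;
    std::uint64_t y1 = y >> 16, y0 = y & 0xFFFF;
    std::uint64_t z2 = x1 * y1;
    std::uint64_t z0 = x0 * y0;
    std::uint64_t z1 = (x1 + x0) * (y1 + y0) - z2 - z0;
    return (z2 << 32) + (z1 << 16) + z0;
}
```

Applied recursively this gives the O(n^1.585) bound that beats schoolbook multiplication for medium-sized operands; FFT-based methods take over for very large ones, which is why Andrii suggests a combined dispatch.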

Yes, there are currently problems with the random number code not compiling with GCC, but apart from that it's all in pretty good shape (I hope!), docs here: http://svn.boost.org/svn/boost/sandbox/big_number/libs/multiprecision/doc/html/boost_multiprecision/tut/ints.html
Great, I'll have a look. No, just one implementation, but it implements the equivalent of the
small-string-optimization - which is to say the header for the dynamically allocated buffer is in a union with a small internal buffer. The internal buffer is used first, then if the number grows too large a dynamic buffer is used. The template parameter controls the size of this internal cache - it defaults to the same size as the dynamic buffer header, but can be made larger if that helps. Finally, if the allocator is void, then no dynamic allocation ever takes place, and we have a fixed precision type; some of the core code gets simplified in this case as well. Otherwise, an unbounded integer with a large cache provides almost the best of both worlds - very nearly as fast as the fixed precision version, but no need to worry about overflow.
Sounds like a very nice solution. There are some other specialized routines (for squaring for example) that I
also need to implement, as well as experiment with column-first multiplication to see whether it gains anything....
Yes, most of those could be incrementally added. Cheers, Andrii

John Maddock wrote:
So basically the times are all about the same, except for very large fixed_ints, or gmp - and even those aren't *much* worse. There's also an experimental "cpp_int" in there which is an all-C++ arbitrary precision type that uses the "small value optimization" to avoid memory allocation when possible.
This is with the latest gtl sandbox code BTW. Maybe you've changed something to make the code much less dependent on the integer type? I assume that the only code I need to change to test a new type is the line that defines big_int_type in voronoi_user_traits?
John, I believe that Andrii may have disabled the lazy exact behavior of the voronoi of points algorithm to make it much more strongly dependent on big number runtime. Andrii can confirm if this is true. The lazy exact optimization is to make algorithm runtime virtually unaffected by big number runtime, so your benchmark results look reasonable given that the optimization is enabled by default in the sandbox. Regards, Luke

... So I'd be really interested to put the type through it's paces with some real world code that really thrashes an extended precision integer type. Any suggestions? Anything in Boost?
Thanks in advance, John.
------------------------------------------------------------------------------- Hi John, It sounds like you are moving right along. In the past, I've done a lot of performance tests with an old program for computing Lanczos coefficients. I just dusted it off and ported it. The preliminary first look is OK. Any type can plug into it with a typedef, pi, sqrt, pown, sin, exp, log. I can provide you with the source after further verification if you like. The test is timed and it takes about 1.5 seconds for 300 decimal digits on my machine. It's memory intensive since the matrices are dynamic in RAM and not put to disk. So it won't go up over about 2000 decimal digits. You can play around with "g" and "n" like we once talked about. Sincerely, Chris.

... So I'd be really interested to put the type through it's paces with some real world code that really thrashes an extended precision integer type. Any suggestions? Anything in Boost?
Thanks in advance, John.
Ooooops. Disregard my last post. You want integer tests and I suggested a float. Best regards, Chris.

Hi John,

I was almost done implementing a benchmark with fixed_int and the voronoi library when the next problem appeared. Let's consider the following code snippet:

#include <iostream>
#include "boost/multiprecision/fixed_int.hpp"

using namespace boost::multiprecision;

template <typename T>
void foo(const T& that) {
    std::cout << "Using default foo." << std::endl;
}

template <>
void foo<mp_int512_t>(const mp_int512_t& that) {
    std::cout << "Using fixed_int specialization." << std::endl;
}

int main() {
    mp_int512_t a = 1;
    mp_int512_t b = 2;
    foo(a);     // outputs 'Using fixed_int specialization.'
    foo(a * b); // outputs 'Using default foo.'
    return 0;
}

I understand the reason for such behaviour (results of the expressions are stored as some intermediate objects), but in my implementation I need to specify a template function specialization that converts fixed_int to double. Any suggestions on how to fix this?

Regards, Andrii

On Tue, Jan 24, 2012 at 10:31 PM, Christopher Kormanyos <e_float@yahoo.com> wrote:
... So I'd be really interested to put the type through it's paces with some real world code that really thrashes an extended precision integer type. Any suggestions? Anything in Boost?
Thanks in advance, John.
Ooooops. Disregard my last post. You want integer tests and I suggested a float.
Best regards, Chris.

I was almost done implementing benchmark with fixed_int and voronoi library when next problem appeared. Let's consider following code snippet:
#include <iostream>
#include "boost/multiprecision/fixed_int.hpp"

using namespace boost::multiprecision;

template <typename T>
void foo(const T& that) {
    std::cout << "Using default foo." << std::endl;
}

template <>
void foo<mp_int512_t>(const mp_int512_t& that) {
    std::cout << "Using fixed_int specialization." << std::endl;
}

int main() {
    mp_int512_t a = 1;
    mp_int512_t b = 2;
    foo(a);     // outputs 'Using fixed_int specialization.'
    foo(a * b); // outputs 'Using default foo.'
    return 0;
}
I understand the reason of such behaviour (results of the expressions are stored as some intermediate objects), but in my implementation I need to specify template function specialization that converts fixed_int to double. Any suggestions on how to fix this?
Nod. It is a problem, and it's highlighted in the docs. BTW does your code work with mpz_class? If so it should be expression template safe already.... but leaving that aside, your options are:

1) Overload for the expression template type - but that's an implementation detail, and may result in a lot of unnecessary overloads.

2) Whenever you call a function template with an expression as argument, cast the result of the expression to the number type, so for example:

   foo(static_cast<number_type>(a + b));

The static_cast should be a no-op if the result of a+b is already that type, otherwise the expression template gets converted to its actual number.

3) As above but provide the template argument explicitly:

   foo<number_type>(a + b);

4) For public API's it may be convenient to provide some traits classes and write:

   template <class A1, class A2, class A3>
   typename result_type<A1, A2, A3>::type
   foo(const A1& a1, const A2& a2, const A3& a3)
   {
      return foo_imp<typename calculation_type<A1, A2, A3>::type>(a1, a2, a3);
   }

Then if the user calls foo(unsigned, long long, short); the arguments will get correctly promoted internally and return an appropriately sized result. The traits result_type and calculation_type might be the same type of course, and could probably just be wrappers around boost::common_type from type_traits (or maybe even *be* that type??). For what it's worth, Boost.Math does (2) for internal calls, and (4) for public API's; of course with floats you don't have to worry about the whole signed/unsigned business which may scupper (4) somewhat. Apologies for the long answer! HTH, John.
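The deduction failure behind Andrii's problem, and the static_cast fix from option (2), can be reproduced with a small stand-in for an expression-template number (toy types, not the multiprecision ones):

```cpp
#include <cassert>
#include <string>

// Toy stand-ins: 'number' plays mp_int512_t, 'product_expr' plays the
// intermediate expression-template type produced by operator*.
struct product_expr { long v; };

struct number {
    long v;
    number(long x = 0) : v(x) {}
    number(const product_expr& e) : v(e.v) {}  // expression -> number
};

product_expr operator*(const number& a, const number& b) {
    return product_expr{a.v * b.v};
}

template <typename T>
std::string foo(const T&) { return "generic"; }

template <>
std::string foo<number>(const number&) { return "specialized"; }
```

Calling foo(a * b) deduces T as product_expr, so the generic overload wins; foo(static_cast<number>(a * b)) forces the conversion first and hits the specialization, exactly as option (2) describes.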

BTW does your code work with mpz_class? If so it should be expression template safe already....
Yes, it was working before I did a major refactoring of the code. At that point I wasn't using template parametrized structures to do conversion from integer types to floating point types.

2) foo(static_cast<number_type>(a + b));
3) foo<number_type>(a + b);

I was thinking of those approaches. It makes sense to change my code appropriately as it will also enable usage of mpz_class as you noticed above. As a separate benchmark it would be good to have a time comparison of the fixed_int code working with and without the expression template type. Will post benchmark results a bit later. Thanks, Andrii

I was thinking of those approaches. It makes sense to change my code appropriately as it will also enable usage of mpz_class as you noticed above. As a separate benchmark it would be good to have time comparison of fixed_int code working with and without expression template type.
I'm a little confused why we need expression templates for fixed int. Does the fixed int dynamically allocate? If not, what do the expression templates accomplish? For stack allocation of fixed int values I wouldn't expect reuse of memory to be much of a performance win. Can you clarify for me, John? Thanks, Luke

John,

It took me some time to figure out that the convert_to<double>() method doesn't work properly for negative integers (please correct me if I'm wrong). Here is the code snippet:

#include <iostream>
#include "boost/multiprecision/fixed_int.hpp"

using namespace boost::multiprecision;

int main() {
    mp_int512_t a = 1;
    double da = a.convert_to<double>();
    mp_int512_t b = -1;
    double db = b.convert_to<double>();
    std::cout << da << std::endl; // Outputs: 1
    std::cout << db << std::endl; // Outputs: 1.34078e+154
    return 0;
}

As soon as this fix is ready, I'll be able to finish the benchmark.

Regards, Andrii

It took me some time to figure out that convert_to<double>() method doesn't work properly for negative integers (Please correct me if I'm wrong). Here is the code snippet:
#include <iostream>
#include "boost/multiprecision/fixed_int.hpp"

using namespace boost::multiprecision;

int main() {
    mp_int512_t a = 1;
    double da = a.convert_to<double>();
    mp_int512_t b = -1;
    double db = b.convert_to<double>();
    std::cout << da << std::endl; // Outputs: 1
    std::cout << db << std::endl; // Outputs: 1.34078e+154
    return 0;
}
As soon as this fix is ready, I'll be able to finish benchmark.
Ugh. Looks like that slipped through the test cases. That and a couple other bugs fixed in SVN. Cheers, John.
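For a sign-magnitude representation like the one the library now uses, a correct integer-to-double conversion amounts to converting the magnitude limb by limb and applying the sign last. A sketch over a toy limb vector (illustrative layout, not the fixed_int internals):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sign-magnitude value: limbs least-significant first, sign held separately.
double to_double(const std::vector<std::uint32_t>& limbs, bool negative) {
    double r = 0.0;
    // Walk from the most significant limb down: r = r * 2^32 + limb.
    for (auto it = limbs.rbegin(); it != limbs.rend(); ++it)
        r = r * 4294967296.0 + *it;
    return negative ? -r : r;  // apply the sign at the very end
}
```

Treating the sign as part of the bit pattern instead (as a two's complement conversion would) is the kind of mistake that produces a huge positive value like the 1.34078e+154 in the report above.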

I was thinking of those approaches. It makes sense to change my code appropriately as it will also enable usage of mpz_class as you noticed above. As a separate benchmark it would be good to have time comparison of fixed_int code working with and without expression template type.
I'm a little confused why we need expression templates for fixed int. Does the fixed int dynamically allocate? If not, what do the expression templates accomplish? For stack allocation of fixed int values I wouldn't expect reuse of memory to be much of a performance win. Can you clarify for me, John?
Good question. The short answer is the expression templates are there because they're part of the overall frontend framework (aside: it might be worthwhile providing a simpler non-expression-template frontend for debugging purposes, but that's another issue). Whether they're of any benefit in this case (and no it doesn't dynamically allocate) is open to question. Except that:

* Reducing temporaries should reduce the amount of array copying going on, and
* The expression templates do *sometimes* reduce the number of arithmetic ops taking place - by applying the rules of arithmetic they can sometimes simplify the expressions, a banal example would be something like:

   a = a + b + c;

which gets transformed to:

   a += b;
   a += c;

Probably doesn't make much difference, and I'm sure there are counter examples where it hurts, but it'll be interesting to see how things shake out in practice... John.

* The expression templates do *sometimes* reduce the number of arithmetic ops taking place - by applying the rules of arithmetic they can sometimes simplify the expressions.
That's a little alarming actually. If the integer is fixed length that means it may overflow and of course will also truncate in division since it is integer. Just like the compiler, we can't apply an optimization that changes the result the code may produce when overflow and truncation are considered as part of the normal behavior of the type. Sometimes arithmetic is written specifically with overflow and truncation in mind. Since any arithmetic operation may overflow this can really tie your hands. For numerical uses of fixed int we may assume that the programmer designed the code in such a way that overflow should never happen, but for behavioral simulation of hardware with fixed width integer units the correct overflow behavior is the whole point. By the way, I'm really excited about the expression template frontend. I like the elimination of unneeded copies and cutting down on the number of temporaries. I also think it is really cool that you are able to do the arithmetic transformations on the expression template trees. Are you using proto? Regards, Luke
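Luke's caution about truncation is easy to demonstrate with plain int: a rewrite that is valid over the rationals changes the result once division truncates (toy functions for illustration, nothing to do with the library's actual transformations):

```cpp
#include <cassert>

// Over the rationals (a / b) * c == (a * c) / b, but with truncating
// integer division the two orderings can differ - so an optimizer must
// not reorder multiplies and divides for integer types.
int div_then_mul(int a, int b, int c) { return (a / b) * c; }
int mul_then_div(int a, int b, int c) { return (a * c) / b; }
```

(Reassociating pure additions, as in the a = a+b+c example, is safe for wrap-around arithmetic; it is the interaction with truncation and saturating/overflow-checked semantics that ties the optimizer's hands.)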

That's a little alarming actually. If the integer is fixed length that means it may overflow and of course will also truncate in division since it is integer. Just like the compiler, we can't apply an optimization that changes the result the code may produce when overflow and truncation are considered as part of the normal behavior of the type. Sometimes arithmetic is written specifically with overflow and truncation in mind. Since any arithmetic operation may overflow this can really tie your hands. For numerical uses of fixed int we may assume that the programmer designed the code in such a way that overflow should never happen, but for behavioral simulation of hardware with fixed width integer units the correct overflow behavior is the whole point.
Very good point! Looks like I'll have to write that other frontend then....
By the way, I'm really excited about the expression template frontend. I like the elimination of unneeded copies and cutting down on the number of temporaries. I also think it is really cool that you are able to do the arithmetic transformations on the expression template trees. Are you using proto?
No. I started off using Proto, but compile times were so long it pretty much melted down my laptop :-( After some discussion on the list here I switched to a home-rolled expression-template implementation that was designed to minimize the number of templates instantiated (not to mention the amount of meta-programming involved). Even so, compile times can still be quite long if you instantiate lots of different expressions, but it is at least manageable/usable now. John.
participants (5)

- Andrii Sydorchuk
- Christopher Kormanyos
- John Maddock
- Simonson, Lucanus J
- thijs@sitmo.com