
Dear Boosters,

I have written a Boost-compatible library for the representation and manipulation of integers of large or arbitrary size (including 128bit and above). I would like to make a preliminary submission of this library for your review and input, for potential inclusion into Boost v1.51.0 or later, as per the Boost Library Submission Process.

The library and all its files are publicly available on github, at: http://github.com/mjtryhorn/Boost.LargeInt

A tagged .zip of these files (v1.0.0) is available for download at: http://github.com/mjtryhorn/Boost.LargeInt/tags

This library has been written for strict conformance to C++03 and the Boost coding guidelines. It is also header-only, and as such may be directly merged into existing Boost installations without the need for recompilation. All files are distributed under the Boost Software License, Version 1.0.

Now a quick explanation of why I've written this library and what it does: I have found myself on several occasions working with unsigned 64bit numbers, mostly for the sizes of files or C++ collections. Where I had a need to perform a calculation upon those numbers, there was often the possibility of overflow. 128bit integers would have been an appropriate fix, but these are either entirely unavailable, non-portable, or in a library which I preferred not to use due to its complexity or incompatible licensing. Floating-point arithmetic is also a means around the problem, but is inherently inaccurate. I decided to fix this problem once and for all by writing a new integer arithmetic library, making judicious use of C++'s template metaprogramming and Boost to make my solution both as simple and as transparent as possible.
Now, what started out as 128bit integers for my own use has grown into a fully-functional library which has the ability to compose any two integer or integer-like types into a new, larger, signed or unsigned integer type, including all the standard integral operators and a number of utility functions to support them. Thus, Boost.LargeInt supports not only signed and unsigned 128bit integer arithmetic but also 24bit, 64bit, 160bit, 192bit, 256bit, 512bit and more.

Boost.LargeInt is compatible, and has been tested, with the following compilers and platforms:

* GNU GCC 4.6.1, 4.6.2 (Linux, Ubuntu 11.10 (64bit) & openSUSE 12.1 (32bit))
* Borland C++ 5.6.4 (Windows XP SP3, Borland C++ Builder 6)
* Microsoft Visual C++ 2010 (Windows XP SP3, Microsoft Visual Studio 2010)

(I have also tried to add compatibility for Microsoft Visual C++ 6, but unfortunately this compiler's support for template metaprogramming is insufficient and the tests for Boost.LargeInt fail to compile.)

Documentation is included in BoostBook format (.qbk), but a pre-compiled version is available alongside the library code itself for your convenience.

Please, if you have the time, download this library and give it a try. Any comments, or information upon its compatibility (or lack thereof) with compilers which are or are not listed, would be greatly appreciated. If no-one has any comments (I think that unlikely ;-) ), then I shall resubmit this library for formal review.

Thanks,

Michael J. Tryhorn

I have written a Boost-compatible library for the representation and manipulation of integers of large or arbitrary size (including 128bit and above). I would like to make a preliminary submission of this library for your review and input, for potential inclusion into Boost v1.51.0 or later, as per the Boost Library Submission Process.
Michael, How does this differ from Boost.Multiprecision currently being reviewed? Thanks, John.

John,

To be honest, I haven't had a chance to look at the Multiprecision library in depth. However, I have had a quick read through the documentation and have found a few points of difference, although there may be others. (All taken from http://svn.boost.org/svn/boost/sandbox/big_number/libs/multiprecision/doc/ht... )

* "Fixed at compile time values, for example 50 decimal digits"

Boost.LargeInt is designed to stick together two existing integer or integer-like types, so although also being fixed at compile-time, it is fixed to a number of binary bits rather than decimal digits.

* "A header-only Boost license version is always available (if somewhat slower)."

Boost.LargeInt is solely header and template based, and so is fully compatible with the Boost License v1.0 without restriction. So, it performs as optimally as possible regardless of licensing requirements.

* "Mixing arithmetic operations using types of different precision is strictly forbidden"

Boost.LargeInt has been specifically designed to avoid such restrictions - where two integers of differing precision are compared, the smaller of the two is automatically promoted to be the same type as the larger. The emphasis has always been upon being "as simple as possible". For example, for 128bit integer support, all one would need is the pre-existing boost::large_int::lint128_t type to take the initial value and any calculation, and a boost::large_int::large_int_cast<C>(value) to convert from 128bit to a built-in integer type, if needed.

* "You have to be careful to cast the result of an expression to the actual number type when passing an expression to a template function"

The results of all arithmetic operators in Boost.LargeInt are themselves boost::large_int::large_int based types, and so their use is fully transparent.

Please note that as all boost::large_int::large_int based types are themselves integer-like, Boost.Multiprecision may even be interoperable with them.
If you are free to give it a try, I would welcome any input.

Thanks,

Michael J. Tryhorn

John/everyone,

I have a few other features and benefits of the Boost.LargeInt library, which may preempt some questions. I'm not sure how Boost.Multiprecision compares on these specifically, but I thought they would help:

* All logical, arithmetic and bitwise operators in Boost.LargeInt have constant, O(1) complexity; except for multiplication ('*'), division ('/') and modulus ('%'), which are O(n) on the number of bits in the respective large_int type. Hence support for non power-of-2 large_ints of 96bits, 160bits or otherwise may be used to improve calculation performance.

* All boost::large_int::large_int based types are comparable against long-type literal values. (These get an automatic promotion to a boost::large_int equivalent to service said comparison.)

* I've written it to work across multiple compilers, to hopefully make this available to all. My reasoning being that if it can work in GCC, MS Visual C++ and a rather old and somewhat temperamental copy of Borland C++ Builder, then it should work just about anywhere.

Thanks,

Michael J. Tryhorn

On 30 June 2012 13:23, Michael Tryhorn <mjtryhorn@gmail.com> wrote:

AMDG On 06/30/2012 07:44 AM, Michael Tryhorn wrote:
John/everyone,
I have a few other features and benefits of the Boost.LargeInt library, which may preempt some questions. I'm not sure how Boost.Multiprecision compares on these specifically, but I thought they would help:
* All logical, arithmetic and bitwise operators in Boost.LargeInt have constant, O(1) complexity; except for multiplication ('*'), division ('/') and modulus ('%') which are O(n) on the number of bits in the respective large_int type. Hence support for non power-of-2 large_ints of 96bits, 160bits or otherwise may be used to improve calculation performance.
That can't be right. Addition of two n-bit numbers requires processing all n bits and is therefore O(n). The fastest known algorithm for multiplying two n-bit numbers is slower than O(n). In Christ, Steven Watanabe

(All taken from http://svn.boost.org/svn/boost/sandbox/big_number/libs/multiprecision/doc/ht... )
* "Fixed at compile time values, for example 50 decimal digits"
Boost.LargeInt is designed to stick together two existing integer or integer-like types, so although also being fixed at compile-time, it is fixed to a number of binary bits rather than decimal digits.
Please note the "for example" part. Fixed precision integers are fixed to the number of bits in Boost.Multiprecision the same as yours are.
* "A header-only Boost license version is always available (if somewhat slower)."
Boost.LargeInt is solely header and template based, and so is fully compatible with the Boost License v1.0 without restriction. So, it performs as optimally as possible regardless of licensing requirements.
So no difference then? ;-)
* "You have to be careful to cast the result of an expression to the actual number type when passing an expression to a template function"
The results of all arithmetic operators in Boost.LargeInt are themselves boost::large_int::large_int based types, and so their use is fully transparent.
You can turn expression templates off in the multiprecision lib, in which case it behaves as yours does - in fact the expression templates are turned off for the fixed precision integer part, as it adds nothing there.
* All logical, arithmetic and bitwise operators in Boost.LargeInt have constant, O(1) complexity; except for multiplication ('*'), division ('/') and modulus ('%') which are O(n) on the number of bits in the respective large_int type. Hence support for non power-of-2 large_ints of 96bits, 160bits or otherwise may be used to improve calculation performance.
Congratulations, the Fields medal is clearly on its way! Seriously, you might want to think again on those complexities...
* All boost::large_int::large_int based types are comparable against long-type literal values. (These get an automatic promotion to a boost::large_int equivalent to service said comparison.)
Likewise the multiprecision lib (in fact promotion to the extended type first isn't strictly necessary - you can do things more efficiently than that). Cheers, John.

Please, if you have the time, download this library and give it a try. Any comments, or information upon its compatibility (or lack thereof) with compilers which are or are not listed would be greatly appreciated.
I started playing with your library using gcc 4.7.1 and it seems compatible. I will definitely want to use it instead of my own unfinished and probably buggy implementation. A few questions:

1) Could the programmer be given access (through member functions) to the two members of a large_int? For example, I need a function to swap the high and low bits of an integer-like value x. When x is of type large_int<T, T>, I guess it would be more efficient to manipulate directly the two members x.m_lo and x.m_hi instead of using the usual bit-operations.

2) The typedef for l[u]int128_t does not seem to use the corresponding builtin type when available; could this be done?

3) I had "undefined symbol" problems when using the builtin 128 bit integers (provided by gcc) in the signature of functions in a shared library (it was a module created with Boost.Python; the problem occurred when importing this module within python). The only solution I found was to wrap integer-like types inside a (templated) class, which looks like a particular case of your library (this would be "large_int<EMPTY_TYPE, U>"). Would LargeInt be an appropriate place to solve this problem?

4) Why use Boost.Operator for some operators but not all (e.g. comparison operators)?

Thanks,

Rafael

On 30/06/12 13:06, Michael Tryhorn wrote:
Dear Boosters,
I have written a Boost-compatible library for the representation and manipulation of integers of large or arbitrary size (including 128bit and
Please, if you have the time, download this library and give it a try. Any comments, or information upon its compatibility (or lack thereof) with compilers which are or are not listed would be greatly appreciated.
If no-one has any comments (I think that unlikely ;-) ), then I shall resubmit this library for formal review.
Hi,

I have not found in the doc what the result is of:

luint128_t + luint128_t
luint128_t + luint256_t

Could you point me to where I can find this kind of information?

Best,

Vicente

Michael,

While I think there is a place for a fixed size integer library in Boost that's "as simple as possible" (on the grounds that for smaller fixed sizes, simple is often faster than more complex schemes, even if the complex schemes have better big-O performance), this method of implementation does appear to be quite slow, though I confess I've only done a cursory comparison.

So for comparison, I measured how long it took to compute 100 dot products of 1000-element vectors. The vectors were filled with values that "used" a specified number of bits (to simulate the typical use case where most values are quite small, and just a few actually require an extended precision type). Here are the results, compiled on Win32 with VC10 and /Ox:

Multiprecision mp_int128:
32 bit time = 0.0100556 seconds
64 bit time = 0.02594 seconds
96 bit time = 0.0229292 seconds
128 bit time = 0.0142151 seconds

Multiprecision mp_int256:
32 bit time = 0.00376926 seconds
64 bit time = 0.0102474 seconds
96 bit time = 0.01383 seconds
128 bit time = 0.0192363 seconds

large_int<unsigned long long, long long>:
32 bit time = 0.0498975 seconds
64 bit time = 0.111413 seconds
96 bit time = 0.183727 seconds
128 bit time = 0.251469 seconds

large_int<large_int<unsigned long long, unsigned long long>, large_int<unsigned long long, long long> >:
32 bit time = 1.08451 seconds
64 bit time = 2.43866 seconds
96 bit time = 3.79658 seconds
128 bit time = 5.20211 seconds

Note how the times taken for mp_intXXX grow largely based on bits actually used, not bits specified in the type. But also, even in the case where I expected large_int to do well - in fact where it really should win in this kind of comparison - for part-filled 128-bit ints, it's still over 4 times slower than the multiprecision code :-(

My guess is this is a consequence of the implementation method (as a doubled-type rather than an array of limbs)? If so, that's actually quite useful information to know.
Regards, John

PS I see you're using bit by bit multiplication - easy to understand and code, but it's glacially slow in general. I tried this for integer division early on and it ran like treacle (on a very cold day)!

PPS code follows:

#include <boost/chrono.hpp>
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/large_int.hpp>

template <class Clock>
struct stopwatch
{
    typedef typename Clock::duration duration;
    stopwatch() { m_start = Clock::now(); }
    duration elapsed() { return Clock::now() - m_start; }
    void reset() { m_start = Clock::now(); }
private:
    typename Clock::time_point m_start;
};

template <class T>
void time()
{
    std::vector<T> v1(1000, 0x12345FF), v2(1000, 0x12345F9), v3(1000, 0);
    boost::chrono::duration<double> time;
    stopwatch<boost::chrono::high_resolution_clock> c;
    for(unsigned i = 0; i < 100; ++i)
    {
        for(unsigned i = 0; i < v1.size(); ++i)
        {
            v3[i] = v1[i] * v2[i];
        }
    }
    time = c.elapsed();
    std::cout << "32 bit time = " << time << std::endl;

    for(unsigned i = 0; i < v1.size(); ++i)
    {
        v1[i] <<= 32;
        v1[i] += 0x12345FF;
        v2[i] <<= 32;
        v2[i] += 0x12345F9;
    }
    c.reset();
    for(unsigned i = 0; i < 100; ++i)
    {
        for(unsigned i = 0; i < v1.size(); ++i)
        {
            v3[i] = v1[i] * v2[i];
        }
    }
    time = c.elapsed();
    std::cout << "64 bit time = " << time << std::endl;

    for(unsigned i = 0; i < v1.size(); ++i)
    {
        v1[i] <<= 32;
        v1[i] += 0x12345FF;
        v2[i] <<= 32;
        v2[i] += 0x12345F9;
    }
    c.reset();
    for(unsigned i = 0; i < 100; ++i)
    {
        for(unsigned i = 0; i < v1.size(); ++i)
        {
            v3[i] = v1[i] * v2[i];
        }
    }
    time = c.elapsed();
    std::cout << "96 bit time = " << time << std::endl;

    for(unsigned i = 0; i < v1.size(); ++i)
    {
        v1[i] <<= 32;
        v1[i] += 0x12345FF;
        v2[i] <<= 32;
        v2[i] += 0x12345F9;
    }
    c.reset();
    for(unsigned i = 0; i < 100; ++i)
    {
        for(unsigned i = 0; i < v1.size(); ++i)
        {
            v3[i] = v1[i] * v2[i];
        }
    }
    time = c.elapsed();
    std::cout << "128 bit time = " << time << std::endl;
}

int _tmain(int argc, _TCHAR* argv[])
{
    using namespace boost::multiprecision;

    time<mp_int128_t>();
    time<mp_int256_t>();

    typedef boost::large_int::large_int<unsigned long long, long long> t128;
    typedef boost::large_int::large_int<unsigned long long, unsigned long long> ut128;
    typedef boost::large_int::large_int<ut128, t128> t256;

    time<t128>();
    time<t256>();
    return 0;
}

Dear all, Some responses to questions asked so far:
That can't be right. Addition of two n-bit numbers requires processing all n bits and is therefore O(n). The fastest known algorithm for multiplying two n-bit numbers is slower than O(n).
I may be wrong, but my understanding was that, at least for an addition operation, while certainly implemented via adders in hardware (and hence O(n)), we would not call it so in algorithms that use it. Let me give an example. If I wrote a function thus:

int add_these(int a, int b)
{
    return a+b;
}

I would expect that rather than converting my '+' to a loop, the compiler would turn it into assembly 'LOAD/ADD/STORE'. My evaluation of the complexity of the above algorithm itself (if we can call it such) would be to label it 'O(1)' rather than 'O(n)'. For reference, let me quote the code from large_int::operator+= :

large_int<T, U>& large_int<T, U>::operator+= (const this_type& val)
{
    // Take a copy of *this, to modify
    this_type ret(*this);

    // Perform the addition
    ret.m_lo += val.m_lo;
    ret.m_hi += val.m_hi;
    if( ret.m_lo < m_lo )
    {
        ++ret.m_hi;
    }

    // Commit the result
    return( *this = ret );
}

Although a little more complex, additions should only ever perform a single pass of this function. As for multiplication, the operator*= implementation uses a for loop which loops at most 'n' times, where 'n' is the number of binary bits in the integer, making use of += (above) and the shift operators.
Could the programmer be given access (through member functions) to the two members of a large_int?
Absolutely. In fact, I used to allow this and removed the functionality to make large_int more 'integer like'. I wouldn't add this as member functions (as this would be inconsistent with the built-in integer types). I would, however, either place this functionality into large_int_traits as accessors, in the boost::large_int namespace as free-functions, or both.
The typedef for l[u]int128_t does not seem to use the builtin corresponding type when available, could this be done?
Again, yes, absolutely. Provided these types support all the required operations as well as std::numeric_limits, this could be done via preprocessor macros to detect the necessary compiler support (as is already done where 64bit is unavailable).
I had "undefined symbol" problems when using the builtin 128 bits integers (provided by gcc) in the signature of functions in a shared library (it was a module created with Boost.Python, the problem occurred when importing this module within python). The only solution I found was to wrap integer-like types inside a (templated) class, which looks like a particular case of your library (this would be "large_int<EMPTY_TYPE, U>"). Would LargeInt be an appropriate place to solve this problem?
I can't speak from experience on this in particular, however where I had a need to make an existing integral type "slightly larger", I did it with this little trick: typedef boost::large_int::large_int<unsigned int, T> my_int_type; Where 'T' is the type you'd be wrapping. This extends 'T' by combining it with an unsigned low_part_type.
Why use Boost.Operator for some operators but not all (e.g. comparison operators)?
It was to do with integral promotion, and I have to admit here that I made a mistake in my earlier email where I stated that all arithmetic operators were fine with mixing their lhs and rhs operator sizes. This is not the case for arithmetic operators, but it is for comparison ('<', '<=', '==', and so on). Where two large_int values which differ in size are compared, the smaller of the two must be promoted to the same type as the larger so the comparison may take place without losing integer information. If I were to use Boost.Operator to define these operators automatically, I don't think such promotion would be possible.
I have not found through the doc what is the result of luint128_t +luint128_t luint128_t +luint256_t Could you point me where I can find these kind of information?
If you are referring to the arithmetic operator '+', luint128_t + luint128_t would produce a new luint128_t value. Just like their 64bit equivalents, the result can overflow. As for luint128_t + luint256_t, this is where I follow on from the above answer and it gets a little tricky. As the 128bit integer is the left-hand side, I believe that boost::addable<T> will return the left-hand-side type of the expression. It might be possible to overcome this by *not* inheriting from boost::integer_arithmetic and defining all operators manually, with some MPL-like type selection. This might be good for a version 1.1! For the moment, if you *know* that a 256bit result is needed, using operator+= with a 256bit large_int left-hand side will most certainly produce a 256bit result. Thanks for the input - I'll add notes on this to the documentation.

(Regarding "Fixed at compile time values")
Please note the "for example" part. Fixed precision integers are fixed to the number of bits in Boost.Multiprecision the same as yours are.
If you mean the optional selection of an integer type by the number of decimal digits required, rather than by the underlying type, then fair enough. Is that what you mean? Boost.LargeInt types are always created by a number of binary digits at the moment. Not being too familiar with Boost.Multiprecision, could I please ask whether it is possible to select an integral type from a number of binary bits, and is there any restriction on power-of-2 vs. non-power-of-2?
"A header-only Boost license version is always available (if somewhat slower)." So no difference then? ;-)
I may have misread your documentation, but I thought you said that some functionality was dependent upon third-party restricted-license libraries, but that 'a header-only Boost license version is always available (if somewhat slower)'. I took this to mean that Boost.Multiprecision's optimal performance was only available where the restricted licensing for those additional libraries was acceptable to the end user. Please correct me if I misunderstood this point. Do you mean slower to compile? Or slower to run? Boost.LargeInt is actually pretty light-weight on headers; it is short in code, doesn't include many headers itself and even leaves out the I/O headers where not needed.
You can turn expression templates off in the multiprecision lib, in which case it behaves as yours does - in fact the expression templates are turned off for the fixed precision integer part, as it adds nothing there.
Fair enough.
Congratulations, the Fields medal is clearly on it's way! Seriously, you might want to think again on those complexities...
No need to be rude. My reasoning for the O(x) complexities is explained above. (Regarding comparison against literal values)
Likewise the multiprecision lib (in fact promotion to the extended type first isn't strictly necessary - you can do things more efficiently than that).
How could this be done more efficiently? Comparison against a literal in Boost.LargeInt uses the large_int constructor, which is again O(1) (subject to my understanding of complexity calculations :-P). To convert a 'long' to a 'large_int' for these comparisons, it uses the above 'make it larger' type trick to create a temporary - which is itself composed of just two integers, allocated from the stack or registers as the compiler sees fit.

While I think there is a place for a fixed size integer library in Boost that's "as simple as possible" (on the grounds that for smaller fixed sizes, simple is often faster than more complex schemes, even if the complex schemes have better big-O performance), this method of implementation does appear to be quite slow, though I confess I've only done a cursory comparison.
I agree that some benchmarking of Boost.LargeInt against the header-only equivalent of Boost.Multiprecision is certainly required, and I will take a look at the results you have supplied. I am glad you think, as I do, that there is a place for a fixed size integer library in Boost, and I am keen that it is as useful to developers as possible. Thanks everyone for all the feedback so far, Michael J. Tryhorn On 30 June 2012 19:06, John Maddock <boost.regex@virgin.net> wrote:
Michael,
While I think there is a place for a fixed size integer library in Boost that's "as simple as possible" (on the grounds that for smaller fixed sizes, simple is often faster than more complex schemes, even if the complex schemes have better big-O performance), this method of implementation does appear to be quite slow, though I confess I've only done a cursory comparison.
So for comparison, I measured how long it took to compute 100 dot products of 1000-element vectors. The vectors were filled with values that "used" a specified number of bits (to simulate the typical use case were most values are quite small, and just a few actually require an extended precision type), here's the results compiled on Win32 with VC10 and /Ox:
Multiprecision mp_int128: 32 bit time = 0.0100556 seconds 64 bit time = 0.02594 seconds 96 bit time = 0.0229292 seconds 128 bit time = 0.0142151 seconds Multiprecision mp_int256: 32 bit time = 0.00376926 seconds 64 bit time = 0.0102474 seconds 96 bit time = 0.01383 seconds 128 bit time = 0.0192363 seconds
large_int<unsigned long long, long long>: 32 bit time = 0.0498975 seconds 64 bit time = 0.111413 seconds 96 bit time = 0.183727 seconds 128 bit time = 0.251469 seconds large_int<large_int<unsigned long long, unsigned long long>, large_int<unsigned long long, long long> >: 32 bit time = 1.08451 seconds 64 bit time = 2.43866 seconds 96 bit time = 3.79658 seconds 128 bit time = 5.20211 seconds
Note how the times taken for mp_intXXX grow largely based on bits actually used, not bits specified in the type. But also, even the case where I expected large_int to do well - in fact where it really should win in this kind of comparison - for part filled 128-bit int's, it's still over 4 times slower than the multiprecision code :-(
My guess is this is a consequence of the implementation method (as a doubled-type rather than an array of limbs)? If so that's actually quite useful information to know.
Regards, John
PS I see you're using bit by bit multiplication - easy to understand and code, but it's glacially slow in general, I tried this for integer division early on and it ran like treacle (on a very cold day)!
PPS code follows:
#include <iostream>
#include <vector>
#include <boost/chrono.hpp>
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/large_int.hpp>

template <class Clock>
struct stopwatch
{
    typedef typename Clock::duration duration;
    stopwatch() { m_start = Clock::now(); }
    duration elapsed() { return Clock::now() - m_start; }
    void reset() { m_start = Clock::now(); }
private:
    typename Clock::time_point m_start;
};

template <class T>
void time()
{
    std::vector<T> v1(1000, 0x12345FF), v2(1000, 0x12345F9), v3(1000, 0);
    boost::chrono::duration<double> time;
    stopwatch<boost::chrono::high_resolution_clock> c;

    for(unsigned i = 0; i < 100; ++i)
    {
        for(unsigned j = 0; j < v1.size(); ++j)
        {
            v3[j] = v1[j] * v2[j];
        }
    }
    time = c.elapsed();
    std::cout << "32 bit time = " << time << std::endl;

    for(unsigned j = 0; j < v1.size(); ++j)
    {
        v1[j] <<= 32; v1[j] += 0x12345FF;
        v2[j] <<= 32; v2[j] += 0x12345F9;
    }
    c.reset();
    for(unsigned i = 0; i < 100; ++i)
    {
        for(unsigned j = 0; j < v1.size(); ++j)
        {
            v3[j] = v1[j] * v2[j];
        }
    }
    time = c.elapsed();
    std::cout << "64 bit time = " << time << std::endl;

    for(unsigned j = 0; j < v1.size(); ++j)
    {
        v1[j] <<= 32; v1[j] += 0x12345FF;
        v2[j] <<= 32; v2[j] += 0x12345F9;
    }
    c.reset();
    for(unsigned i = 0; i < 100; ++i)
    {
        for(unsigned j = 0; j < v1.size(); ++j)
        {
            v3[j] = v1[j] * v2[j];
        }
    }
    time = c.elapsed();
    std::cout << "96 bit time = " << time << std::endl;

    for(unsigned j = 0; j < v1.size(); ++j)
    {
        v1[j] <<= 32; v1[j] += 0x12345FF;
        v2[j] <<= 32; v2[j] += 0x12345F9;
    }
    c.reset();
    for(unsigned i = 0; i < 100; ++i)
    {
        for(unsigned j = 0; j < v1.size(); ++j)
        {
            v3[j] = v1[j] * v2[j];
        }
    }
    time = c.elapsed();
    std::cout << "128 bit time = " << time << std::endl;
}

int main()
{
    using namespace boost::multiprecision;

    time<mp_int128_t>();
    time<mp_int256_t>();

    typedef boost::large_int::large_int<unsigned long long, long long> t128;
    typedef boost::large_int::large_int<unsigned long long,
                                        unsigned long long> ut128;
    typedef boost::large_int::large_int<ut128, t128> t256;

    time<t128>();
    time<t256>();
    return 0;
}
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

AMDG

On 06/30/2012 12:47 PM, Michael Tryhorn wrote:
Dear all,
Some responses to questions asked so far:
That can't be right. Addition of two n-bit numbers requires processing all n bits and is therefore O(n). The fastest known algorithm for multiplying two n-bit numbers is slower than O(n).
I may be wrong, but my understanding was that at least for an addition operation, while certainly implemented via adders in hardware (and hence O(n)), we would not call it so in algorithms that use it. Let me give an example. If I wrote a function thus:
int add_these(int a, int b) { return a+b; }
I would expect that rather than converting my '+' to a loop, the compiler would turn it into assembly 'LOAD/ADD/STORE'. My evaluation of the complexity of the above algorithm itself (if we can call it such) would be to label it 'O(1)' rather than 'O(n)'. For reference, let me quote the code from large_int::operator+= :
large_int<T, U>& large_int<T, U>::operator+= (const this_type& val)
{
    // Take a copy of *this, to modify
    this_type ret(*this);

    // Perform the addition
    ret.m_lo += val.m_lo;
    ret.m_hi += val.m_hi;
    if( ret.m_lo < m_lo )
    {
        ++ret.m_hi;
    }

    // Commit the result
    return( *this = ret );
}
Although a little more complex, additions should only ever perform a single pass of this function. As for multiplication, the operator*= implementation uses a for loop which loops at most 'n' times, where 'n' is the number of binary bits in the integer, making use of += (above) and the shift operators.
You're being inconsistent about whether the number of bits is a constant. The number of operations is O(d) where d is the number of built-in integers in the argument after expanding large_int recursively. Since the number of bits in a built-in integer type is a constant, O(d) = O(n). In Christ, Steven Watanabe

On Sat, 30 Jun 2012, Michael Tryhorn wrote:
That can't be right. Addition of two n-bit numbers requires processing all n bits and is therefore O(n). The fastest known algorithm for multiplying two n-bit numbers is slower than O(n).
I may be wrong, but my understanding was that at least for an addition operation, while certainly implemented via adders in hardware (and hence O(n)), we would not call it so in algorithms that use it. Let me give an example. If I wrote a function thus:
int add_these(int a, int b) { return a+b; }
I would expect that rather than converting my '+' to a loop, the compiler would turn it into assembly 'LOAD/ADD/STORE'. My evaluation of the complexity of the above algorithm itself (if we can call it such) would be to label it 'O(1)' rather than 'O(n)'.
Yes.
For reference, let me quote the code from large_int::operator+= :
large_int<T, U>& large_int<T, U>::operator+= (const this_type& val)
{
    // Take a copy of *this, to modify
    this_type ret(*this);

    // Perform the addition
    ret.m_lo += val.m_lo;
    ret.m_hi += val.m_hi;
    if( ret.m_lo < m_lo )
    {
        ++ret.m_hi;
    }

    // Commit the result
    return( *this = ret );
}
So when you add numbers of 2n bits, you perform 2 additions of numbers of n bits, plus a comparison and an increment. A(2n) <= 2*A(n) + n so A(n) = O(n*log(n)) (I haven't checked whether the bound is tight or the true complexity is log(n)) For a fixed n, O(1)=O(n), complexities are about how things grow with n.
Although a little more complex, additions should only ever perform a single pass of this function. As for multiplication, the operator*= implementation uses a for loop which loops at most 'n' times, where 'n' is the number of binary bits in the integer, making use of += (above) and the shift operators.
About the multiplication, maybe you could try using the n x n multiplication in the implementation of the 2n x 2n multiplication?
The typedef for l[u]int128_t
By the way, why the 'l' prefix?
I had "undefined symbol" problems when using the builtin 128 bits integers (provided by gcc) in the signature of functions in a shared library (it was a module created with Boost.Python, the problem occurred when importing this module within python). The only solution I found was to wrap integer-like types inside a (templated) class, which looks like a particular case of your library (this would be "large_int<EMPTY_TYPE, U>").
Sounds like you think there is a bug in the compiler. Did you file a report? -- Marc Glisse

While I think there is a place for a fixed size integer library in Boost that's "as simple as possible" (on the grounds that for smaller fixed sizes, simple is often faster than more complex schemes, even if the complex schemes have better big-O performance), this method of implementation does appear to be quite slow, though I confess I've only done a cursory comparison.
I agree. We are in the process of reviewing the proposed Boost.Multiprecision right now. The review has been overwhelmingly non-controversial. One of the main points that did arise, however, was the performance and seamless interoperability of near-metal signed and unsigned integers with, say, 128, 256, 512 bits. There is widespread community interest in integer types of these sizes. That being said, though, performance is crucial for the users in this digit range. See below.
So for comparison, I measured how long it took to compute 100 dot products of 1000-element vectors. The vectors were filled with values that "used" a specified number of bits (to simulate the typical use case where most values are quite small, and just a few actually require an extended precision type). Here are the results, compiled on Win32 with VC10 and /Ox:
Multiprecision mp_int128:
32 bit time = 0.0100556 seconds
64 bit time = 0.02594 seconds
96 bit time = 0.0229292 seconds
128 bit time = 0.0142151 seconds

Multiprecision mp_int256:
32 bit time = 0.00376926 seconds
64 bit time = 0.0102474 seconds
96 bit time = 0.01383 seconds
128 bit time = 0.0192363 seconds
large_int<unsigned long long, long long>:
32 bit time = 0.0498975 seconds
64 bit time = 0.111413 seconds
96 bit time = 0.183727 seconds
128 bit time = 0.251469 seconds

large_int<large_int<unsigned long long, unsigned long long>,
          large_int<unsigned long long, long long> >:
32 bit time = 1.08451 seconds
64 bit time = 2.43866 seconds
96 bit time = 3.79658 seconds
128 bit time = 5.20211 seconds
Looking at your code, it is exceptionally terse and clean. The implementation is clear, intuitive and easy to use. In my opinion, the code is excellent! As John has indicated, though, your code really slows down with the bit-by-bit multiplication and division algorithms. As a rule of thumb, big number libraries spend an estimated 80% of their time in the inner loops of the multiplication algorithm, and that's for highly optimized ones.

The problem with big number libraries is that you can please maybe half the people, but merely half of the time. (Is that pleasing only a quarter of us?) You have a clean and terse implementation that could potentially fill a niche between the potential Boost.Multiprecision and bare-metal built-in types. That's two times *potential* in one sentence. In my opinion, to really fill this niche, however, you need high performance for 128, 256, 512 and 1024 bits.

What we really need to do in the community is see how we will address the needs for big integers, rationals and floats. How will we transition from bare-metal built-in types through medium sizes like 128, 256, 512 bits, and on to thousands of bits? To what extent do we want to emphasize clarity of implementation? When should we compromise on brevity of code and get down and dirty with limbs, add-and-carry, even (oh no!) assembly hooks, etc. to squeeze performance?

I believe that the proposed Boost.Multiprecision has made a good compromise with big integers, rationals and floats by providing a working and relatively efficient solution. John did a lot more work than I did. But the work on big numbers in Boost and C++ is far from done! Perhaps we need to work on your code in order to get the performance that users will demand for near-metal types. Perhaps this could potentially plug into the potentially proposed Boost.Multiprecision.

Thanks for your work!

Best regards, Chris.

Hi Michael,

Michael Tryhorn wrote:
I have written a Boost-compatible library for the representation and manipulation of integers of large or arbitrary size (including 128bit and above). I would like to make a preliminary submission of this library for your review and input, for potential inclusion into Boost v1.51.0 or later, as per the Boost Library Submission Process.
It would have been great if you could have joined the discussion a couple of weeks ago. Perhaps it is not too late for you to write a review of the proposed Boost.Multiprecision, since you clearly have some experience of the subject.

I have had a quick look at your code and my initial thoughts are:

- The storage of these types is just the actual numeric data, so I should be able to do binary I/O on them etc., which is good. I would perhaps be a bit more confident of this if it were just an array of int32s or int64s, rather than the current recursive organisation.

- You're using 2's complement, which I believe is the right choice.

- The simplicity of the code means it could be relatively bug-free, if we're lucky.

- Bitwise multiplication and division is completely inappropriate.

- You have an idiom of working on a temporary and then copying the result to *this to "commit" it, e.g.

template<class T, class U>
large_int<T, U>& large_int<T, U>::operator++ ()
{
    // Take a copy of *this, to modify
    this_type ret(*this);

    // Perform the increment
    ++ret.m_lo;
    if( ret.m_lo == low_part_type(0) )
    {
        ++ret.m_hi;
    }

    // Commit the result
    return( *this = ret );
}

I don't see what the benefit of that is; it just makes ++ much slower than it otherwise would be.

This example, operator++, also illustrates what may be a weakness of your recursive style. Consider a 256-bit int implemented using 64-bit built-in ints via intermediate 128-bit ints. The increment does this:

increment the 128-bit m_lo:
    increment the 64-bit m_lo (builtin)
    test the 64-bit m_lo for zero (builtin)
(assume common case, not zero)
test the 128-bit m_lo for zero (not built-in, impl in traits.hpp):
    test the 64-bit m_hi
    test the 64-bit m_lo

All of the second half of that is redundant once you know that the least significant 64 bits incremented to a non-zero value, but it is far from obvious to me that the compiler would successfully optimise it away. On the other hand, if the implementation were just an array of 4 (u)int64_ts, you would iterate with a for loop and break when non-zero.

- It would be great to see "hooks" where assembler implementations could be inserted.

- It would be interesting to know if your code could function as a "back end" to the proposed Boost.Multiprecision, using that library's expression templates to eliminate temporaries. There are two things that could be learnt from that: does that help performance, and does the Multiprecision front/backend split work?

Regards, Phil.

John, Chris, and anyone else interested,

Firstly, to answer a question asked by Marc Glisse: the 'l' prefix on all l[u]int types defined by large_int/def.hpp is included to distinguish the potentially large_int-based types from those provided by cstdint or equivalent. So, these 'l'-prefixed types can be imported into the current or global namespace without interfering with any current or future built-in equivalents. I'm not a fan of 'using namespace x::y::z;', but I know some people do use it.

Now onto testing. I've tried the given example code upon my local test machine and can corroborate your results. I'm impressed - the performance of operator* at least is astounding (even for 1024bit integers). I have tried tweaking my classes here and there, and have even managed to halve the time it takes large_int::operator* to run (by adding special 'left_shift_by_one' and 'right_shift_by_one' private operations). I agree that what is needed is a replacement of the algorithms rather than their tweaking, as the growth rate against number-of-bits is rather high.

It seems that other operations (construction/destruction, operator+ and operator-) are comparable in speed. This is approximate, but I observed large_int being quicker on simple operations like '+', '-' and so on at 128bit, equivalent at 256bit and slower at 1024bit. But, considering the overall speed of both Boost.LargeInt and Boost.Multiprecision in this respect, I doubt that either would ever be seen as a bottleneck in any real application.

However, I do now have a couple of genuine points of difference between Boost.Multiprecision and Boost.LargeInt as it is now:

* Object sizes - you seem to be adding a pointer to all integers (presumably as a link between the front and back ends). I can understand the need for it, but this does increase the memory required to store arrays of them. As boost::large_int::large_int classes are fully concrete and only ever consist of their two internal parts, they have the same size as what would be their built-in equivalent. Meaning, sizeof( int128_t ) == sizeof( lint128_t ), unless the compiler decides to add class padding - but I've never seen padding added to a large_int object.

* This also leads onto forward compatibility and interoperability. If we were to serialize a large_int instance, it should have the same content as its equivalent built-in type. This aids transparency of use, although it is currently only true upon little-endian machines. Big-endian support could be added by swapping the internal m_hi and m_lo large_int data members and adding some 'ntohl'/'htonl'-equivalent free functions in the boost::large_int namespace.

In conclusion, it is obvious that we are in agreement that support for arbitrary-sized integers in C++ is required. I still believe that Boost.LargeInt is a good contender. It has both the above points and an elegant design (even if I do say so myself), and would just need improvements to its integral division and multiplication. Unfortunately, that is beyond my expertise, hence the current "simple but it works" solution. If you have any suggestions on how this could be improved or merged into existing work, I would be more than happy to collaborate. Any further questions or comments would of course be appreciated.

(Phil, I've seen your email and I will respond to it separately.)

Thanks,

Michael J. Tryhorn

P.S. I couldn't get the above testing code to run on either of my local 64bit machines, and had to resort to an old 32bit AMD machine which is slowly gathering dust. This is after inserting Boost.Multiprecision into Boost v1.50.0.
System: Ubuntu 11.10 (64-bit), Intel® Core™2 Quad CPU Q6600 @ 2.40GHz × 4
Compiler: gcc (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1
Command: g++ -Wall -ggdb -O3 -march=native -I<snip>/include -L<snip>/lib \
    -o lint_tst lint_tst.cpp -Wl,-Bstatic -lboost_chrono -lboost_system \
    -Wl,-Bdynamic -lrt
Output:
32 bit time = 0.00177922 seconds
64 bit time = 0.00188884 seconds
lint_tst: <snip>/cpp_int.hpp:1090: void boost::multiprecision::backends::eval_multiply(boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&, const boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&, const boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&) [with unsigned int MinBits = 128u, bool Signed = true, Allocator = void]: Assertion `(std::numeric_limits<double_limb_type>::max)() - carry > static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed, Allocator>::max_limb_value) * static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed, Allocator>::max_limb_value)' failed.

System: openSUSE 12.1 (64-bit), Intel® Xeon® CPU E5410 @ 2.33GHz × 4
Compiler: gcc (SUSE Linux) 4.6.2
Command: [same as above]
Output:
32 bit time = 0.00183981 seconds
64 bit time = 0.00189349 seconds
lint_tst: <snip>/cpp_int.hpp:1090: void boost::multiprecision::backends::eval_multiply(boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&, const boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&, const boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&) [with unsigned int MinBits = 128u, bool Signed = true, Allocator = void]: Assertion `(std::numeric_limits<double_limb_type>::max)() - carry > static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed, Allocator>::max_limb_value) * static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed, Allocator>::max_limb_value)' failed.

(Same failure at other optimization levels, including O0, O1 and O2.)
This failure does not occur when running the same test on 32bit openSUSE 12.1, and so may be integer-size related. Here are the sizes of the built-in types, for reference:

openSUSE 32bit:
sizeof( short ) == 2
sizeof( int ) == 4
sizeof( long ) == 4
sizeof( long long ) == 8
sizeof( float ) == 4
sizeof( double ) == 8
sizeof( long double ) == 12
sizeof( mp_uint128_t ) == 20
sizeof( mp_uint256_t ) == 36
sizeof( mp_uint512_t ) == 68
sizeof( mp_uint1024_t ) == 132

openSUSE/Ubuntu 64bit:
sizeof( short ) == 2
sizeof( int ) == 4
sizeof( long ) == 8
sizeof( long long ) == 8
sizeof( float ) == 4
sizeof( double ) == 8
sizeof( long double ) == 16
sizeof( mp_uint128_t ) == 24
sizeof( mp_uint256_t ) == 40
sizeof( mp_uint512_t ) == 72
sizeof( mp_uint1024_t ) == 136

On 2 July 2012 18:38, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:

Phil/everyone,
It would have been great if you could have joined the discussion a couple of weeks ago.
I wish I had!
The storage of these types is just the actual numeric data, so I should be able to do binary I/O on them etc., which is good. I would perhaps be a bit more confident of this if it were just an array of int32s or int64s, rather than the current recursive organisation.
I understand your trepidation, but really there should be no reason why binary I/O would not work, provided that the bottom-level types are built-in integers (or an equivalent) of the word size or greater. If smaller integer types are mixed in, the compiler could add padding. There is also the question of big- or little-endian storage, which will affect the required declaration order.
You're using 2's complement, which I believe is the right choice.
Agreed. It both makes the calculations simpler and makes large_int closer to the built-in types.
The simplicity of the code means it could be relatively bug-free, if we're lucky.
This is my goal in whatever I write, as frankly I don't like testing! However, Boost.LargeInt has a complete test suite and I have tried to attain 100% coverage therein.
Bitwise multiplication and division is completely inappropriate.
Agreed, but used for lack of knowledge of a better alternative. I'd be happy to collaborate with anyone more experienced in this particular area.
You have an idiom of working on a temporary and then copying the result to *this to "commit" it ...
Actually, incredible as it may seem, at high levels of optimisation ('O3' in GCC), this sort of thing *can* produce faster code. My guess is that it's to do with registers vs. the stack, but without picking through the assembler I cannot be sure. It won't at lower optimisation levels, however. But I didn't add it for that. I added it for the improved exception safety where non built-in types are used as the high and low parts of a large_int. With copy-and-commit, if any operator other than assignment throws an exception, the value of the underlying large_int does not change. As everyone here will no doubt know, built-in integers don't throw, so I may well revert to a non copy-and-commit style here.
Consider a 256-bit int implemented using 64-bit built-in ints via inter- mediate 128-bit ints ... All of second half of that is redundant once you know that the least significant 64 bits incremented to a non-zero value ... if the implementation were just an array of 4 (u)int64_ts, you would iterate with a for loop and break when non-zero.
This is another thing that would have to be tested. Unfortunately it's nigh-on impossible to know whether a version with or without more branching ('if' or 'for') would be faster without trying it, and even then on multiple target architectures.
It would be great to see "hooks" where assembler implementations could be inserted.
Nice idea. This could be done in a similar manner to Boost.Multiprecision's front- and back-end implementation. However, it would have to be a templatized, compile-time decision for the programmer, in order to retain binary compatibility with built-in integer types.
It would be interesting to know if your code could function as a "back end" to the proposed Boost.Multiprecision, using that library's expression templates to eliminate temporaries. There are two things that could be learnt from that: does that help performance, and does the Multiprecision front/backend split work?
I would have to study Boost.Multiprecision in detail for this. But yes, interesting idea.

Thanks,

Mike

On 2 July 2012 19:43, Michael Tryhorn <mjtryhorn@gmail.com> wrote:
John, Chris, and anyone else interested,
Firstly, to answer a question asked by Marc Glisse, the 'l' prefix on all l[u]int types defined by large_int/def.hpp is included to distinguish the potentially large_int based types from those provided by cstdint or equivalent. So, these 'l' prefixed types could be imported into the current or global namespace without interfering with any current or future built-in equivalents. I'm not a fan of 'using namespace x::y::z;', but I know some people do use it.
Now onto testing. I've tried the given example code upon my local test machine and can corroborate your results. I'm impressed - the performance of operator* at least is astounding (even for 1024bit integers). I have tried tweaking my classes here-and-there, and have even managed to halve the time it takes large_int::operator* to run (by adding special 'left_shift_by_one' and 'right_shift_by_one' private operations). I agree that what is needed is a replacement of the algorihms rather than their tweaking, as the growth rate against number-of-bits is rather high.
It seems that other operators like construction/destruction operator+ and operator- are comparable on speed. This is approximate, but I observed large_int being quicker on simple operations like '+', '-' and so on at 128bit, equivalent at 256bit and slower at 1024. But, considering the overall speed of both Boost.LargeInt and Boost.Multiprecision in this respect, I doubt that either would ever be seen as a bottleneck in any real application.
However, I do now have a couple of genuine points of difference between Boost.Multiprecision and Boost.LargeInt as it is now:
* Object sizes - you seem to be adding a pointer to all integers (presumably as a link between the front and back ends). I can understand the need for it, but this does increase the required memory to store arrays of them. As boost::large_int::large_int classes are fully concrete and only ever consist of their two internal parts, they have the same size as what would be their built-in equivalent. Meaning, sizeof( int128_t ) == sizeof( lint128_t ), unless the compiler decides to add class padding but I've never seen padding added to a large_int object.
* This also leads onto forward compatibility and interoperability. If we were to serialize a large_int instance, it should have the same content as its equivalent built-in type. This aids transparency of use, although it is currently only true upon little-endian machines. Big-endian support could be added by swapping the internal m_hi and m_lo large_int data members and adding some 'ntohl'/'htonl' equivalent free-functions in the boost::large_int namespace.
In conclusion, it is obvious that we are in agreement that support for arbitary sized integers for C++ is required. I still believe that Boost.LargeInt is a good contender. It is has both the above points and an elegant design (even if I do say so myself), and would just need improvements to its integral division and multiplication. Unfortunately, that is beyond my expertise, hence the current "simple but it works" solution.
If you have any suggestions on how this could be improved or merged into existing work, I would be more than happy to collaborate. Any further questions or comments would of course be appreciated.
(Phil, I've seen your email and I will respond to it separately.)
Thanks,
Michael J. Tryhorn
P.S. I couldn't get the above testing code to run on either of my local 64bit machines, and had to resort to an old 32bit AMD machine which is slowly gathering dust. This is after inserting Boost.Multiprecision into Boost v1.50.0.
System: Ubuntu 11.10 (64-bit) Intel® Core™2 Quad CPU Q6600 @ 2.40GHz × 4 Compiler: gcc (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1 Command: g++ -Wall -ggdb -O3 -march=native -I<snip>/include -L<snip>/lib \ -o lint_tst lint_tst.cpp -Wl,-Bstatic -lboost_chrono -lboost_system \ -Wl,-Bdynamic -lrt Output: 32 bit time = 0.00177922 seconds 64 bit time = 0.00188884 seconds lint_tst: <snip>/cpp_int.hpp:1090: void boost::multiprecision::backends::eval_multiply(boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&, const boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&, const boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&) [with unsigned int MinBits = 128u, bool Signed = true, Allocator = void]: Assertion `(std::numeric_limits<double_limb_type>::max)() - carry > static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed, Allocator>::max_limb_value) * static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed, Allocator>::max_limb_value)' failed.
System: openSUSE 12.1 (64-bit) Intel® Xeon® CPU E5410 @ 2.33GHz × 4 Compiler: gcc (SUSE Linux) 4.6.2 Command: [same as above] Output: 32 bit time = 0.00183981 seconds 64 bit time = 0.00189349 seconds lint_tst: <snip>/cpp_int.hpp:1090: void boost::multiprecision::backends::eval_multiply(boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&, const boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&, const boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&) [with unsigned int MinBits = 128u, bool Signed = true, Allocator = void]: Assertion `(std::numeric_limits<double_limb_type>::max)() - carry > static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed, Allocator>::max_limb_value) * static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed, Allocator>::max_limb_value)' failed.
(Same failure at other optimization levels, including O0, O1 and O2.)
This failure does not occur when running the same test on 32bit openSUSE 12.1, and so may be integer-size related. Here are the sizes of the built-in types, for reference:
openSUSE 32bit: sizeof( short ) == 2 sizeof( int ) == 4 sizeof( long ) == 4 sizeof( long long ) == 8 sizeof( float ) == 4 sizeof( double ) == 8 sizeof( long double ) == 12 sizeof( mp_uint128_t ) == 20 sizeof( mp_uint256_t ) == 36 sizeof( mp_uint512_t ) == 68 sizeof( mp_uint1024_t ) == 132
openSUSE/Ubuntu 64bit: sizeof( short ) == 2 sizeof( int ) == 4 sizeof( long ) == 8 sizeof( long long ) == 8 sizeof( float ) == 4 sizeof( double ) == 8 sizeof( long double ) == 16 sizeof( mp_uint128_t ) == 24 sizeof( mp_uint256_t ) == 40 sizeof( mp_uint512_t ) == 72 sizeof( mp_uint1024_t ) == 136
On 2 July 2012 18:38, Phil Endecott <spam_from_boost_dev@chezphil.org>wrote:
Hi Michael,
Michael Tryhorn wrote:
I have written a Boost-compatible library for the representation and manipulation of integers of large or arbitrary size (including 128bit and above). I would like to make a preliminary submission of this library for your review and input, for potential inclusion into Boost v1.51.0 or later, as per the Boost Library Submission Process.
It would have been great if you could have joined the discussion a couple of weeks ago. Perhaps it is not too late for you to write a review of the proposed Boost.Multiprecision, since you clearly have some experience of the subject.
I have had a quick look at your code and my initial thoughts are:
- The storage of these types is just the actual numeric data, so I should be able to do binary I/O on them etc., which is good. I would perhaps be a bit more confident of this if it were just an array of int32s or int64s, rather than the current recursive organisation.
- You're using 2's complement, which I believe is the right choice.
- The simplicity of the code means it could be relatively bug-free, if we're lucky.
- Bitwise multiplication and division is completely inappropriate.
- You have an idiom of working on a temporary and them copying the result to *this to "commit" it, e.g.
template<class T, class U>
large_int<T, U>& large_int<T, U>::operator++ ()
{
    // Take a copy of *this, to modify
    this_type ret(*this);

    // Perform the increment
    ++ret.m_lo;
    if( ret.m_lo == low_part_type(0) )
    {
        ++ret.m_hi;
    }

    // Commit the result
    return( *this = ret );
}
I don't see what the benefit of that is; it just makes ++ much slower than it otherwise would be.
This example, operator++, also illustrates what may be a weakness of your recursive style. Consider a 256-bit int implemented using 64-bit built-in ints via intermediate 128-bit ints. The increment does this:
increment the 128-bit m_lo:
    increment the 64-bit m_lo (builtin)
    test the 64-bit m_lo for zero (builtin) (assume common case, not zero)
test the 128-bit m_lo for zero (not built-in, impl in traits.hpp):
    test the 64-bit m_hi
    test the 64-bit m_lo
All of the second half of that is redundant once you know that the least significant 64 bits incremented to a non-zero value, but it is far from obvious to me that the compiler would successfully optimise it away. On the other hand, if the implementation were just an array of 4 (u)int64_ts, you would iterate with a for loop and break when non-zero.
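Phil's array-based alternative can be sketched as follows. This is a minimal illustration only: the `u256` type and `increment` function are hypothetical names, not code from Boost.LargeInt or Boost.Multiprecision.

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical 256-bit unsigned integer: a flat array of four 64-bit
// limbs, least significant limb first.
struct u256 {
    std::uint64_t limb[4];
};

// Increment: bump the lowest limb and propagate the carry only while a
// limb wraps to zero. The loop exits as soon as no carry remains, so in
// the common case only one limb is touched -- none of the redundant
// higher-limb tests of the recursive formulation.
inline void increment(u256& x) {
    for (std::size_t i = 0; i < 4; ++i) {
        if (++x.limb[i] != 0) {
            break; // no carry into the next limb; done
        }
    }
}
```

For example, incrementing a value whose low two limbs are all-ones carries into the third limb and then stops.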
- It would be great to see "hooks" where assembler implementations could be inserted.
- It would be interesting to know if your code could function as a "back end" to the proposed Boost.Multiprecision, using that library's expression templates to eliminate temporaries. There are two things that could be learnt from that: does that help performance, and does the Multiprecision front/backend split work?
Regards, Phil.

The storage of these types is just the actual numeric data, so I should be able to do binary I/O on them etc., which is good. I would perhaps be a bit more confident of this if it were just an array of int32s or int64s, rather than the current recursive organisation.
I understand your trepidation, but really there should be no reason why binary I/O would not work provided that the bottom-level types are built-in integers or an equivalent, of the wordsize or greater. If smaller integer types are mixed-in, the compiler could add padding. There is also the question of big or little endian, which will affect the needed declaration order.
I think chasing binary I/O is a fool's errand - if you expect the result to be compatible with built-in integers, that is:

1) There's no guarantee that built-in types are two's complement.
2) Larger-than-a-single-register integers are synthesised by the compiler - in effect these are compiler-supplied "big ints" - so both the byte order and limb order are completely arbitrary. For example, on a 16-bit processor (the simplest case), there are two choices for the byte order within a 16-bit word, and then another two choices for the word order within the compiler-synthesised 32-bit type.

That leaves you writing and testing code that handles all the possibilities, and figuring out how to configure it so it does the right thing on all platforms (including all the weird 8- and 16-bit embedded platforms out there). If you're sensible and don't shoot for that, then there's no difference between the Multiprecision fixed ints and 2's complement ones: both are PODs and can be treated as a platform- and compiler-specific "bag of bits".
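The "bag of bits" view can be made concrete with a small sketch. The `uint128_pod` type and helper names below are illustrative, not from either library: a POD fixed int round-trips through its raw bytes on the platform that wrote it, but the byte and limb order inside the buffer is unspecified, so the buffer is not a portable wire format.

```cpp
#include <cstring>
#include <cstdint>

// A hypothetical POD fixed-width integer: just limbs, no pointers or
// other members, so the object is exactly its own bits.
struct uint128_pod {
    std::uint64_t limb[2]; // limb[0] least significant (by convention here)
};

// Copy the raw object representation out to a byte buffer.
// The layout of 'buf' is compiler- and platform-specific.
inline void to_bytes(unsigned char* buf, const uint128_pod& x) {
    std::memcpy(buf, &x, sizeof x);
}

// Reconstruct the value from bytes written by to_bytes() on the SAME
// platform; reading bytes from a different platform is undefined ground.
inline uint128_pod from_bytes(const unsigned char* buf) {
    uint128_pod x;
    std::memcpy(&x, buf, sizeof x);
    return x;
}
```

A same-platform round-trip preserves the value exactly, which is all the POD guarantee promises.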
You're using 2's complement, which I believe is the right choice.
Agreed. It both makes the calculations simpler and makes large_int closer to the built-in types.
Not so fast: I started off using 2's complement arithmetic for fixed integer types and was persuaded to change to sign-magnitude by some tests the Boost.Polygon guys came up with. Consider multiplication - for any sensible implementation for smallish ints the complexity is O(N^2); for the Multiprecision lib's fixed ints, N is the number of limbs that are actually used, while for a packed 2's complement type, N is the total number of limbs in the representation. That means that:

* For the Multiprecision lib, the runtime of multiplication is independent of the size of the type - it depends only on the magnitude of the number being stored - so for example 2 * 3 takes (roughly) the same time whatever the size of the type used, whether it's a 128-, 256- or 2048-bit int.
* In contrast, for a packed 2's complement type the complexity of multiplication and division grows steeply with increasing bit size.

Now having said that, the sign-magnitude version is more complex, and more expensive to construct and copy - not by much - but it shows up in tests such as the Delaunay code that Phil posted, where a fair proportion of time is spent constructing the large ints just for one quick calculation. It's an important use case though, and one I need to come back to.

BTW, I've mentioned this a couple of times but it seems to have slipped under the radar - the Multiprecision lib already has a packed 2's complement fixed int back end; it's just that I removed it for the review because I didn't think it added much. <Shrug> I guess.
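John's point - that N is the number of limbs actually *used* - can be illustrated with a minimal schoolbook multiply. All names below are hypothetical; this is not the Multiprecision implementation. The inner loops run only over the non-zero limbs, so 2 * 3 costs the same whether the type is 256 or 2048 bits wide, whereas a loop over all limbs would scale with the type's size.

```cpp
#include <cstdint>
#include <cstddef>

const std::size_t LIMBS = 8; // e.g. a 256-bit type built from 32-bit limbs

// Index one past the most significant non-zero limb (the "used" count).
inline std::size_t used_limbs(const std::uint32_t (&x)[LIMBS]) {
    std::size_t n = LIMBS;
    while (n > 0 && x[n - 1] == 0) --n;
    return n;
}

// Schoolbook multiply, O(na * nb) where na/nb are the USED limb counts.
// A packed 2's complement type would instead loop over all LIMBS limbs,
// so its cost grows with the bit size of the type, not the magnitude.
// The product is truncated to LIMBS limbs, as with unsigned wraparound.
inline void multiply(std::uint32_t (&r)[LIMBS],
                     const std::uint32_t (&a)[LIMBS],
                     const std::uint32_t (&b)[LIMBS]) {
    for (std::size_t k = 0; k < LIMBS; ++k) r[k] = 0;
    const std::size_t na = used_limbs(a), nb = used_limbs(b);
    for (std::size_t i = 0; i < na; ++i) {
        std::uint64_t carry = 0;
        for (std::size_t j = 0; j < nb && i + j < LIMBS; ++j) {
            std::uint64_t t = std::uint64_t(a[i]) * b[j] + r[i + j] + carry;
            r[i + j] = std::uint32_t(t);
            carry = t >> 32;
        }
        if (i + nb < LIMBS) r[i + nb] = std::uint32_t(carry);
    }
}
```

For small operands only the first iteration of each loop runs, regardless of `LIMBS`.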
Consider a 256-bit int implemented using 64-bit built-in ints via intermediate 128-bit ints ... All of the second half of that is redundant once you know that the least significant 64 bits incremented to a non-zero value ... if the implementation were just an array of 4 (u)int64_ts, you would iterate with a for loop and break when non-zero.
This is another thing that would have to be tested. Unfortunately it's nigh-on impossible to know whether a version with or without more branching ('if' or 'for') would be faster without trying it, and even then on multiple target architectures.
I fear Phil is asking the impossible here - you can have something really simple which is fast in the trivial case (one step larger than long long) but rapidly slows for larger cases, or you can do what the Multiprecision lib does and add all the special-case handling, which can give order-of-magnitude improvements for 1024 bits and up but slows things down by ~50% in the trivial case due to the extra branching. Sometimes you just have to make a choice.
* Object sizes - you seem to be adding a pointer to all integers (presumably as a link between the front and back ends).
No, no such pointers anywhere. As mentioned above, there is a packed 2's complement fixed int back end in the sandbox; it just wasn't part of the review (and yes, its size is exactly N bits - no pointers or other stray members).
* This also leads onto forward compatibility and interoperability. If we were
I don't believe so - remember that "larger than single register" types are synthesised by the compiler, so the limb order they use is completely dependent upon the whim of the compiler writer. Thinking you can be forwards-compatible with types that haven't been written yet is a fool's errand, even if one would hope that compiler writers all do something sensible ;-)

John.

PS: Will investigate the assertion you spotted later, when I can get 64-bit Linux back up...

John, Chris, and anyone else interested,
<snip>
In conclusion, it is obvious that we are in agreement that support for arbitrary-sized integers in C++ is required. I still believe that Boost.LargeInt is a good contender. It has both the above points and an elegant design (even if I do say so myself), and would just need improvements to its integral division and multiplication. Unfortunately, that is beyond my expertise, hence the current "simple but it works" solution.
Thank you for your detailed post, Michael. In my opinion, medium integers from 64 through, say, 1024 or 2048 bits are a niche needing explicit treatment. This is not in conflict with the proposed Boost.Multiprecision, but rather an extension of potential Boost functionality. The trick is finding the right architecture, granularity, interoperability and performance in order to knit these together in a cohesive big-number implementation. First off, we need to sort out the potential Boost.Multiprecision, as I believe this is headed in the right direction. Then we need to understand how to address the high-performance needs of medium-sized integers.
If you have any suggestions on how this could be improved or merged into existing work, I would be more than happy to collaborate. Any further questions or comments would of course be appreciated.
For medium-sized integers, build the integer type from four 32-bit radix-2 components. Do not use allocation. Use naive multiply, etc. Use the Knuth algorithm for long division. Add and sub are linear in the limb count. Mul-by-long-long and div-by-long-long are of linear complexity. But you will do no more than repeat what John did for his cpp_int back ends. He even got the performance more than up-to-par with move semantics and expression templates.

Well, if we ever want to get bad-assed performance, we would have to team up in a coalition of programmers. I include a bibliography below. Nonetheless, nobody can roll this one alone---nobody!

For your interests, the first order of business would be to make a dedicated medium-integer library with the desired interoperability that significantly beats the performance of Boost.Multiprecision off-the-rack and matches the performance of GMP. The second (and much larger) goal is to package everything in a palatable form for everyone---no small task. Here, I mean medium-sized integers, multiprecision integers, rationals and floats, as well as fixed-point, in a sensible standardized form for all users. This is a massive task. I'm booked right now---solid, booked out 'till autumn. I encourage you to keep going. In my opinion, though, we really need to clear up the potential Boost.Multiprecision now and find out how to deal with medium integers thereafter.

Best regards, Chris.

P.S. Somebody will nag you about so-called "top-posting" soon. Just like they nagged me on it in the beginning.

Bibliography from another post - here is the bibliography for much of my R&D on big numbers. Best regards, Chris.

- J. Arndt and C. Haenel, "Pi Unleashed", Springer, Heidelberg, 2001, ISBN 3-540-66572-2
- F. Bornemann, D. Laurie, S. Wagon and J. Waldvogel, "The SIAM 100-Digit Challenge: A Study in High-Accuracy Numerical Computing", SIAM, Philadelphia, 2004, ISBN 0-898-71561-X
- J. M. Borwein and P. Borwein, "Pi and the AGM: A Study in Analytic Number Theory and Computational Complexity", John Wiley, New York, 1998, ISBN 0-471-83138-7 (paperback)
- R. Brent and P. Zimmermann, "Modern Computer Arithmetic", Cambridge University Press, Cambridge, 2011, ISBN 978-0-521-19469-3
- L. Fousse, G. Hanrot, V. Lefevre, P. Pelissier and P. Zimmermann, "MPFR: A Multiple-Precision Binary Floating-Point Library With Correct Rounding", ACM Trans. Math. Soft., Vol. 33, Issue 2, ACM, 2007
- A. Gil, J. Segura and N. M. Temme, "Numerical Methods for Special Functions", SIAM, Philadelphia, 2007, ISBN 978-0-898-71634-4
- J. M. Muller, "Elementary Functions: Algorithms and Implementation", 2nd edn., Birkhaeuser, Boston, 2006, ISBN 978-0-817-64372-0
- J. M. Muller, N. Brisebarre, F. de Dinechin, C. M. Jeannerod, V. Lefevre, G. Melquiond, N. Revol, D. Stehle and T. Torres, "Handbook of Floating-Point Arithmetic", Birkhaeuser, Boston, 2010, ISBN 978-0-817-64704-9
- C. M. Kormanyos, "Algorithm 910: A Portable C++ Multiple-Precision System for Special-Function Calculations", ACM Trans. Math. Soft., Vol. 37, Issue 4, ACM, 2011
- D. E. Knuth, "The Art of Computer Programming, Volume 2: Seminumerical Algorithms", 3rd edn., Addison Wesley, 1998
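Chris's recipe above (a fixed limb array, no allocation, linear-time add) can be sketched as follows. This is a minimal illustration with hypothetical names, not code from any of the libraries discussed:

```cpp
#include <cstdint>
#include <cstddef>

// A medium (here 128-bit) unsigned integer built from four 32-bit
// limbs, least significant first, with no dynamic allocation.
struct med_uint128 {
    std::uint32_t limb[4];
};

// Addition is linear in the limb count: one pass, propagating the
// carry through a 64-bit accumulator. A final carry out of the top
// limb wraps, as with built-in unsigned types.
inline med_uint128 add(const med_uint128& a, const med_uint128& b) {
    med_uint128 r;
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        std::uint64_t t = std::uint64_t(a.limb[i]) + b.limb[i] + carry;
        r.limb[i] = std::uint32_t(t);
        carry = t >> 32;
    }
    return r;
}
```

Subtraction follows the same single-pass pattern with a borrow; naive multiply and Knuth's long division build on the same limb layout.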

From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Christopher Kormanyos
In my opinion, medium integers from 64 through, say, 1024 or 2048 bits are a niche needing explicit treatment.
Please take a look at this library: http://www.mentor.com/esl/catapult/algorithmic "Algorithmic C (AC) datatypes are a class-based C++ library that provides arbitrary-length integer, fixed-point and complex data types. They enable algorithm, system and hardware designers to precisely model bit-true behavior in C++ specifications while accelerating simulation speeds by 10-200x faster versus alternate datatypes." I believe this is an example of the explicit treatment you are referring to. This is the best open source example of such that I know of. Regards, Luke

In my opinion, medium integers from 64 through, say, 1024 or 2048 bits are a niche needing explicit treatment.
Please take a look at this library: http://www.mentor.com/esl/catapult/algorithmic
"Algorithmic C (AC) datatypes are a class-based C++ library that provides arbitrary-length integer, fixed-point and complex data types. They enable algorithm, system and hardware designers to precisely model bit-true behavior in C++ specifications while accelerating simulation speeds by 10-200x faster versus alternate datatypes."
I believe this is an example of the explicit treatment you are referring to. This is the best open source example of such that I know of.
Regards, Luke
Wow, Thanks Luke! That sure looks good based on the description. I will take a closer look later as time allows. Best regards, Chris.

I have written a Boost-compatible library for the representation and manipulation of integers of large or arbitrary size (including 128bit and above). I would like to make a preliminary submission of this library for your review and input, for potential inclusion into Boost v1.51.0 or later, as per the Boost Library Submission Process.
It would have been great if you could have joined the discussion a couple of weeks ago. Perhaps it is not too late for you to write a review of the proposed Boost.Multiprecision, since you clearly have some experience of the subject.
That would be good. This is such a rich and multifaceted topic---fast big integers, floats, etc. Remember, we're far from done with big integers and floats in Boost and C++. It's a tough nut to crack. But in my opinion, we seem to be making progress.
- The storage of these types is just the actual numeric data, so I should be able to do binary I/O on them etc., which is good. I would perhaps be a bit more confident of this if it were just an array of int32s or int64s, rather than the current recursive organisation.
In the past, I wrote both kinds: classes based on recursive templates as well as classes based on traditional arrays of limbs. Both the performance as well as the resulting code size of the recursive template versions were disappointing compared with traditional arrays of *limbs*. I really can't make a time commitment now. But in the fall, I'd be happy to give you any codes that I have or discuss performance or size issues within my modest range of experience. Best regards, Chris.
participants (9)
- Christopher Kormanyos
- John Maddock
- Marc Glisse
- Michael Tryhorn
- Phil Endecott
- Rafaël Fourquet
- Simonson, Lucanus J
- Steven Watanabe
- Vicente J. Botet Escriba