
Phil/everyone,
It would have been great if you could have joined the discussion a couple of weeks ago.
I wish I had!
The storage of these types is just the actual numeric data, so I should be able to do binary I/O on them etc., which is good. I would perhaps be a bit more confident of this if it were just an array of int32s or int64s, rather than the current recursive organisation.
I understand your trepidation, but really there should be no reason why binary I/O would not work, provided that the bottom-level types are built-in integers (or an equivalent) of the word size or greater. If smaller integer types are mixed in, the compiler could add padding. There is also the question of big- versus little-endianness, which will affect the required declaration order.
You're using 2's complement, which I believe is the right choice.
Agreed. It both makes the calculations simpler and makes large_int closer to the built-in types.
The simplicity of the code means it could be relatively bug-free, if we're lucky.
This is my goal in whatever I write, as frankly I don't like testing! However, Boost.LargeInt has a complete test suite and I have tried to attain 100% coverage therein.
Bitwise multiplication and division is completely inappropriate.
Agreed, but used for lack of knowledge of a better alternative. I'd be happy to collaborate with anyone more experienced in this particular area.
You have an idiom of working on a temporary and then copying the result to *this to "commit" it ...
Actually, incredible as it may seem, at high levels of optimisation ('O3' in GCC), this sort of thing *can* produce faster code. My guess is that it's to do with registers vs. the stack, but without picking through the assembler I cannot be sure. It won't help at lower optimisation levels, however. But I didn't add it for that; I added it for the improved exception-safety where non-built-in types are used as the high and low parts of a large_int. With copy-and-commit, if any operator other than assignment throws an exception, the value of the underlying large_int does not change. As everyone here will no doubt know, built-in integers don't throw, so I may well revert to a non copy-and-commit style here.
Consider a 256-bit int implemented using 64-bit built-in ints via intermediate 128-bit ints ... All of the second half of that is redundant once you know that the least significant 64 bits incremented to a non-zero value ... if the implementation were just an array of 4 (u)int64_ts, you would iterate with a for loop and break when non-zero.
This is another thing that would have to be tested. Unfortunately it's nigh-on impossible to know whether a version with or without more branching ('if' or 'for') would be faster without trying it, and even then on multiple target architectures.
It would be great to see "hooks" where assembler implementations could be inserted.
Nice idea. This could be done in a similar manner to Boost.Multiprecision's front- and back-end implementation. However, it would have to be a templatized, compile-time decision for the programmer in order to retain binary compatibility with the built-in integer types.
It would be interesting to know if your code could function as a "back end" to the proposed Boost.Multiprecision, using that library's expression templates to eliminate temporaries. There are two things that could be learnt from that: does that help performance, and does the Multiprecision front/backend split work?
I would have to study Boost.Multiprecision in detail for this. But yes, interesting idea.

Thanks,
Mike

On 2 July 2012 19:43, Michael Tryhorn <mjtryhorn@gmail.com> wrote:
John, Chris, and anyone else interested,
Firstly, to answer a question asked by Marc Glisse, the 'l' prefix on all l[u]int types defined by large_int/def.hpp is included to distinguish the potentially large_int based types from those provided by cstdint or equivalent. So, these 'l' prefixed types could be imported into the current or global namespace without interfering with any current or future built-in equivalents. I'm not a fan of 'using namespace x::y::z;', but I know some people do use it.
Now onto testing. I've tried the given example code on my local test machine and can corroborate your results. I'm impressed: the performance of operator* at least is astounding (even for 1024bit integers). I have tried tweaking my classes here and there, and have even managed to halve the time it takes large_int::operator* to run (by adding special 'left_shift_by_one' and 'right_shift_by_one' private operations). I agree that what is needed is a replacement of the algorithms rather than their tweaking, as the growth rate against number-of-bits is rather high.
It seems that other operators, such as construction/destruction, operator+ and operator-, are comparable in speed. This is approximate, but I observed large_int being quicker on simple operations like '+' and '-' at 128bit, equivalent at 256bit, and slower at 1024bit. But, considering the overall speed of both Boost.LargeInt and Boost.Multiprecision in this respect, I doubt that either would ever be seen as a bottleneck in any real application.
However, I do now have a couple of genuine points of difference between Boost.Multiprecision and Boost.LargeInt as it is now:
* Object sizes - you seem to be adding a pointer to all integers (presumably as a link between the front and back ends). I can understand the need for it, but this does increase the memory required to store arrays of them. As boost::large_int::large_int classes are fully concrete and only ever consist of their two internal parts, they have the same size as their built-in equivalent would have. Meaning, sizeof( int128_t ) == sizeof( lint128_t ), unless the compiler decides to add class padding, though I have never seen padding added to a large_int object.
* This also leads onto forward compatibility and interoperability. If we were to serialize a large_int instance, it should have the same content as its equivalent built-in type. This aids transparency of use, although it is currently only true upon little-endian machines. Big-endian support could be added by swapping the internal m_hi and m_lo large_int data members and adding some 'ntohl'/'htonl' equivalent free-functions in the boost::large_int namespace.
In conclusion, it is obvious that we are in agreement that support for arbitrary-sized integers in C++ is required. I still believe that Boost.LargeInt is a good contender. It has both the above points in its favour and an elegant design (even if I do say so myself), and would just need improvements to its integral division and multiplication. Unfortunately, that is beyond my expertise, hence the current "simple but it works" solution.
If you have any suggestions on how this could be improved or merged into existing work, I would be more than happy to collaborate. Any further questions or comments would of course be appreciated.
(Phil, I've seen your email and I will respond to it separately.)
Thanks,
Michael J. Tryhorn
P.S. I couldn't get the above testing code to run on either of my local 64bit machines, and had to resort to an old 32bit AMD machine which is slowly gathering dust. This is after inserting Boost.Multiprecision into Boost v1.50.0.
System:   Ubuntu 11.10 (64-bit), Intel® Core™2 Quad CPU Q6600 @ 2.40GHz × 4
Compiler: gcc (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1
Command:  g++ -Wall -ggdb -O3 -march=native -I<snip>/include -L<snip>/lib \
          -o lint_tst lint_tst.cpp -Wl,-Bstatic -lboost_chrono -lboost_system \
          -Wl,-Bdynamic -lrt
Output:
  32 bit time = 0.00177922 seconds
  64 bit time = 0.00188884 seconds
  lint_tst: <snip>/cpp_int.hpp:1090: void boost::multiprecision::backends::eval_multiply(boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&, const boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&, const boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&) [with unsigned int MinBits = 128u, bool Signed = true, Allocator = void]: Assertion `(std::numeric_limits<double_limb_type>::max)() - carry > static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed, Allocator>::max_limb_value) * static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed, Allocator>::max_limb_value)' failed.
System:   openSUSE 12.1 (64-bit), Intel® Xeon® CPU E5410 @ 2.33GHz × 4
Compiler: gcc (SUSE Linux) 4.6.2
Command:  [same as above]
Output:
  32 bit time = 0.00183981 seconds
  64 bit time = 0.00189349 seconds
  lint_tst: <snip>/cpp_int.hpp:1090: void boost::multiprecision::backends::eval_multiply(boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&, const boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&, const boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&) [with unsigned int MinBits = 128u, bool Signed = true, Allocator = void]: Assertion `(std::numeric_limits<double_limb_type>::max)() - carry > static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed, Allocator>::max_limb_value) * static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed, Allocator>::max_limb_value)' failed.
(Same failure at other optimization levels, including O0, O1 and O2.)
This failure does not occur when running the same test on 32bit openSUSE 12.1, and so may be integer-size related. Here are the sizes of the built-in types, for reference:
openSUSE 32bit:
  sizeof( short )         == 2
  sizeof( int )           == 4
  sizeof( long )          == 4
  sizeof( long long )     == 8
  sizeof( float )         == 4
  sizeof( double )        == 8
  sizeof( long double )   == 12
  sizeof( mp_uint128_t )  == 20
  sizeof( mp_uint256_t )  == 36
  sizeof( mp_uint512_t )  == 68
  sizeof( mp_uint1024_t ) == 132
openSUSE/Ubuntu 64bit:
  sizeof( short )         == 2
  sizeof( int )           == 4
  sizeof( long )          == 8
  sizeof( long long )     == 8
  sizeof( float )         == 4
  sizeof( double )        == 8
  sizeof( long double )   == 16
  sizeof( mp_uint128_t )  == 24
  sizeof( mp_uint256_t )  == 40
  sizeof( mp_uint512_t )  == 72
  sizeof( mp_uint1024_t ) == 136
On 2 July 2012 18:38, Phil Endecott <spam_from_boost_dev@chezphil.org>wrote:
Hi Michael,
Michael Tryhorn wrote:
I have written a Boost-compatible library for the representation and manipulation of integers of large or arbitrary size (including 128bit and above). I would like to make a preliminary submission of this library for your review and input, for potential inclusion into Boost v1.51.0 or later, as per the Boost Library Submission Process.
It would have been great if you could have joined the discussion a couple of weeks ago. Perhaps it is not too late for you to write a review of the proposed Boost.Multiprecision, since you clearly have some experience of the subject.
I have had a quick look at your code and my initial thoughts are:
- The storage of these types is just the actual numeric data, so I should be able to do binary I/O on them etc., which is good. I would perhaps be a bit more confident of this if it were just an array of int32s or int64s, rather than the current recursive organisation.
- You're using 2's complement, which I believe is the right choice.
- The simplicity of the code means it could be relatively bug-free, if we're lucky.
- Bitwise multiplication and division is completely inappropriate.
- You have an idiom of working on a temporary and then copying the result to *this to "commit" it, e.g.
template<class T, class U>
large_int<T, U>& large_int<T, U>::operator++ ()
{
    // Take a copy of *this, to modify
    this_type ret(*this);

    // Perform the increment
    ++ret.m_lo;
    if( ret.m_lo == low_part_type(0) )
    {
        ++ret.m_hi;
    }

    // Commit the result
    return( *this = ret );
}
I don't see what the benefit of that is; it just makes ++ much slower than it otherwise would be.
This example, operator++, also illustrates what may be a weakness of your recursive style. Consider a 256-bit int implemented using 64-bit built-in ints via intermediate 128-bit ints. The increment does this:
increment the 128-bit m_lo:
    increment the 64-bit m_lo (builtin)
    test the 64-bit m_lo for zero (builtin)
    (assume common-case, not zero)
test the 128-bit m_lo for zero (not built-in, impl in traits.hpp):
    test the 64-bit m_hi
    test the 64-bit m_lo
All of second half of that is redundant once you know that the least significant 64 bits incremented to a non-zero value, but it is far from obvious to me that the compiler would successfully optimise it away. On the other hand, if the implementation were just an array of 4 (u)int64_ts, you would iterate with a for loop and break when non-zero.
- It would be great to see "hooks" where assembler implementations could be inserted.
- It would be interesting to know if your code could function as a "back end" to the proposed Boost.Multiprecision, using that library's expression templates to eliminate temporaries. There are two things that could be learnt from that: does that help performance, and does the Multiprecision front/backend split work?
Regards, Phil.
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost