
Phil/everyone,
It would have been great if you could have joined the discussion a couple of weeks ago.
I wish I had!
The storage of these types is just the actual numeric data, so I should be able to do binary I/O on them etc., which is good. I would perhaps be a bit more confident of this if it were just an array of int32s or int64s, rather than the current recursive organisation.
I understand your trepidation, but really there should be no reason why binary I/O would not work, provided that the bottom-level types are built-in integers (or an equivalent) of the word size or greater. If smaller integer types are mixed in, the compiler could add padding. There is also the question of big- versus little-endianness, which will affect the required declaration order.
You're using 2's complement, which I believe is the right choice.
Agreed. It both makes the calculations simpler and makes large_int closer to the built-in types.
The simplicity of the code means it could be relatively bug-free, if we're lucky.
This is my goal in whatever I write, as frankly I don't like testing! However, Boost.LargeInt has a complete test suite and I have tried to attain 100% coverage therein.
Bitwise multiplication and division is completely inappropriate.
Agreed, but used for lack of knowledge of a better alternative. I'd be happy to collaborate with anyone more experienced in this particular area.
You have an idiom of working on a temporary and then copying the result to *this to "commit" it ...
Actually, incredible as it may seem, at high levels of optimisation ('O3' in GCC), this sort of thing *can* produce faster code. My guess is that it's to do with registers vs. the stack, but without picking through the assembler I cannot be sure. It won't help at lower optimisation levels, however. But I didn't add it for that; I added it for the improved exception-safety where non-built-in types are used as the high and low parts of a large_int. With copy-and-commit, if any operator other than assignment throws an exception, the value of the underlying large_int does not change. As everyone here will no doubt know, built-in integers don't throw, so I may well revert to a non copy-and-commit style here.
Consider a 256-bit int implemented using 64-bit built-in ints via intermediate 128-bit ints ... All of the second half of that is redundant once you know that the least significant 64 bits incremented to a non-zero value ... if the implementation were just an array of 4 (u)int64_ts, you would iterate with a for loop and break when non-zero.
This is another thing that would have to be tested. Unfortunately it's nigh-on impossible to know whether a version with or without more branching ('if' or 'for') would be faster without trying it, and even then on multiple target architectures.
It would be great to see "hooks" where assembler implementations could be inserted.
Nice idea. This could be done in a similar manner to Boost.Multiprecision's front- and back-end implementation. However, it would have to be a templatized, compile-time decision for the programmer in order to retain binary compatibility with the built-in integer types.
It would be interesting to know if your code could function as a "back end" to the proposed Boost.Multiprecision, using that library's expression templates to eliminate temporaries. There are two things that could be learnt from that: does that help performance, and does the Multiprecision front/backend split work?
I would have to study Boost.Multiprecision in detail for this. But yes, interesting idea.

Thanks,
Mike

On 2 July 2012 19:43, Michael Tryhorn <mjtryhorn@gmail.com> wrote:
John, Chris, and anyone else interested,
Firstly, to answer a question asked by Marc Glisse, the 'l' prefix on all l[u]int types defined by large_int/def.hpp is included to distinguish the potentially large_int based types from those provided by cstdint or equivalent. So, these 'l' prefixed types could be imported into the current or global namespace without interfering with any current or future built-in equivalents. I'm not a fan of 'using namespace x::y::z;', but I know some people do use it.
Now onto testing. I've tried the given example code on my local test machine and can corroborate your results. I'm impressed: the performance of operator* at least is astounding (even for 1024bit integers). I have tried tweaking my classes here and there, and have even managed to halve the time it takes large_int::operator* to run (by adding special 'left_shift_by_one' and 'right_shift_by_one' private operations). I agree that what is needed is a replacement of the algorithms rather than their tweaking, as the growth rate against number-of-bits is rather high.
It seems that other operators, such as construction/destruction, operator+ and operator-, are comparable in speed. This is approximate, but I observed large_int being quicker on simple operations like '+' and '-' at 128bit, equivalent at 256bit, and slower at 1024bit. But, considering the overall speed of both Boost.LargeInt and Boost.Multiprecision in this respect, I doubt that either would ever be seen as a bottleneck in any real application.
However, I do now have a couple of genuine points of difference between Boost.Multiprecision and Boost.LargeInt as it is now:
* Object sizes - you seem to be adding a pointer to all integers (presumably as a link between the front and back ends). I can understand the need for it, but this does increase the memory required to store arrays of them. As boost::large_int::large_int classes are fully concrete and only ever consist of their two internal parts, they have the same size as their built-in equivalent would have. Meaning, sizeof( int128_t ) == sizeof( lint128_t ), unless the compiler decides to add class padding, though I have never seen padding added to a large_int object.
* This also leads onto forward compatibility and interoperability. If we were to serialize a large_int instance, it should have the same content as its equivalent built-in type. This aids transparency of use, although it is currently only true upon little-endian machines. Big-endian support could be added by swapping the internal m_hi and m_lo large_int data members and adding some 'ntohl'/'htonl' equivalent free-functions in the boost::large_int namespace.
In conclusion, it is obvious that we are in agreement that support for arbitrary-sized integers in C++ is required. I still believe that Boost.LargeInt is a good contender. It has both the above points in its favour and an elegant design (even if I do say so myself), and would just need improvements to its integral division and multiplication. Unfortunately, that is beyond my expertise, hence the current "simple but it works" solution.
If you have any suggestions on how this could be improved or merged into existing work, I would be more than happy to collaborate. Any further questions or comments would of course be appreciated.
(Phil, I've seen your email and I will respond to it separately.)
Thanks,
Michael J. Tryhorn
P.S. I couldn't get the above testing code to run on either of my local 64bit machines, and had to resort to an old 32bit AMD machine which is slowly gathering dust. This is after inserting Boost.Multiprecision into Boost v1.50.0.
System:   Ubuntu 11.10 (64-bit), Intel® Core™2 Quad CPU Q6600 @ 2.40GHz × 4
Compiler: gcc (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1
Command:  g++ -Wall -ggdb -O3 -march=native -I<snip>/include -L<snip>/lib \
          -o lint_tst lint_tst.cpp -Wl,-Bstatic -lboost_chrono -lboost_system \
          -Wl,-Bdynamic -lrt
Output:
  32 bit time = 0.00177922 seconds
  64 bit time = 0.00188884 seconds
  lint_tst: <snip>/cpp_int.hpp:1090: void boost::multiprecision::backends::eval_multiply(boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&, const boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&, const boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&) [with unsigned int MinBits = 128u, bool Signed = true, Allocator = void]: Assertion `(std::numeric_limits<double_limb_type>::max)() - carry > static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed, Allocator>::max_limb_value) * static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed, Allocator>::max_limb_value)' failed.
System:   openSUSE 12.1 (64-bit), Intel® Xeon® CPU E5410 @ 2.33GHz × 4
Compiler: gcc (SUSE Linux) 4.6.2
Command:  [same as above]
Output:
  32 bit time = 0.00183981 seconds
  64 bit time = 0.00189349 seconds
  lint_tst: <snip>/cpp_int.hpp:1090: void boost::multiprecision::backends::eval_multiply(boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&, const boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&, const boost::multiprecision::backends::cpp_int_backend<MinBits, Signed, Allocator>&) [with unsigned int MinBits = 128u, bool Signed = true, Allocator = void]: Assertion `(std::numeric_limits<double_limb_type>::max)() - carry > static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed, Allocator>::max_limb_value) * static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed, Allocator>::max_limb_value)' failed.
(Same failure at other optimization levels, including O0, O1 and O2.)
This failure does not occur when running the same test on 32bit openSUSE 12.1, and so may be integer-size related. Here are the sizes of the built-in types, for reference:
openSUSE 32bit:
  sizeof( short )         == 2
  sizeof( int )           == 4
  sizeof( long )          == 4
  sizeof( long long )     == 8
  sizeof( float )         == 4
  sizeof( double )        == 8
  sizeof( long double )   == 12
  sizeof( mp_uint128_t )  == 20
  sizeof( mp_uint256_t )  == 36
  sizeof( mp_uint512_t )  == 68
  sizeof( mp_uint1024_t ) == 132
openSUSE/Ubuntu 64bit:
  sizeof( short )         == 2
  sizeof( int )           == 4
  sizeof( long )          == 8
  sizeof( long long )     == 8
  sizeof( float )         == 4
  sizeof( double )        == 8
  sizeof( long double )   == 16
  sizeof( mp_uint128_t )  == 24
  sizeof( mp_uint256_t )  == 40
  sizeof( mp_uint512_t )  == 72
  sizeof( mp_uint1024_t ) == 136
On 2 July 2012 18:38, Phil Endecott <spam_from_boost_dev@chezphil.org>wrote:
Hi Michael,
Michael Tryhorn wrote:
I have written a Boost-compatible library for the representation and manipulation of integers of large or arbitrary size (including 128bit and above). I would like to make a preliminary submission of this library for your review and input, for potential inclusion into Boost v1.51.0 or later, as per the Boost Library Submission Process.
It would have been great if you could have joined the discussion a couple of weeks ago. Perhaps it is not too late for you to write a review of the proposed Boost.Multiprecision, since you clearly have some experience of the subject.
I have had a quick look at your code and my initial thoughts are:
- The storage of these types is just the actual numeric data, so I should be able to do binary I/O on them etc., which is good. I would perhaps be a bit more confident of this if it were just an array of int32s or int64s, rather than the current recursive organisation.
- You're using 2's complement, which I believe is the right choice.
- The simplicity of the code means it could be relatively bug-free, if we're lucky.
- Bitwise multiplication and division is completely inappropriate.
- You have an idiom of working on a temporary and then copying the result to *this to "commit" it, e.g.
template<class T, class U>
large_int<T, U>& large_int<T, U>::operator++ ()
{
    // Take a copy of *this, to modify
    this_type ret(*this);

    // Perform the increment
    ++ret.m_lo;
    if( ret.m_lo == low_part_type(0) )
    {
        ++ret.m_hi;
    }

    // Commit the result
    return( *this = ret );
}
I don't see what the benefit of that is; it just makes ++ much slower than it otherwise would be.
This example, operator++, also illustrates what may be a weakness of your recursive style. Consider a 256-bit int implemented using 64-bit built-in ints via intermediate 128-bit ints. The increment does this:
increment the 128-bit m_lo:
    increment the 64-bit m_lo (builtin)
    test the 64-bit m_lo for zero (builtin)
    (assume common-case, not zero)
test the 128-bit m_lo for zero (not built-in, impl in traits.hpp):
    test the 64-bit m_hi
    test the 64-bit m_lo
All of second half of that is redundant once you know that the least significant 64 bits incremented to a non-zero value, but it is far from obvious to me that the compiler would successfully optimise it away. On the other hand, if the implementation were just an array of 4 (u)int64_ts, you would iterate with a for loop and break when non-zero.
- It would be great to see "hooks" where assembler implementations could be inserted.
- It would be interesting to know if your code could function as a "back end" to the proposed Boost.Multiprecision, using that library's expression templates to eliminate temporaries. There are two things that could be learnt from that: does that help performance, and does the Multiprecision front/backend split work?
Regards, Phil.
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost