
Dear all,

Some responses to questions asked so far:
That can't be right. Addition of two n-bit numbers requires processing all n bits and is therefore O(n). The fastest known algorithm for multiplying two n-bit numbers is slower than O(n).
I may be wrong, but my understanding was that, at least for an addition operation, while it is certainly implemented via adders in hardware (and hence O(n) at that level), we would not count it as such in algorithms that use it. Let me give an example. If I wrote a function thus:

int add_these(int a, int b)
{
    return a + b;
}

I would expect that, rather than converting my '+' into a loop, the compiler would turn it into assembly 'LOAD/ADD/STORE'. My evaluation of the complexity of the above algorithm itself (if we can call it such) would be to label it O(1) rather than O(n). For reference, let me quote the code from large_int::operator+= :

large_int<T, U>& large_int<T, U>::operator+= (const this_type& val)
{
    // Take a copy of *this, to modify
    this_type ret(*this);

    // Perform the addition
    ret.m_lo += val.m_lo;
    ret.m_hi += val.m_hi;
    if( ret.m_lo < m_lo )
    {
        ++ret.m_hi;
    }

    // Commit the result
    return( *this = ret );
}

Although a little more complex than a built-in addition, each addition only ever performs a single pass of this function. As for multiplication, the operator*= implementation uses a for loop which iterates at most 'n' times, where 'n' is the number of binary bits in the integer, making use of operator+= (above) and the shift operators.
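To illustrate the shape of that loop, here is a sketch of the shift-and-add scheme (this is not the library's actual operator*= code; it assumes an unsigned, integer-like type supporting the usual bitwise-AND, shift and += operators):

template<typename Int>
Int multiply_shift_add(Int lhs, Int rhs)
{
    // Shift-and-add: at most one iteration per binary digit of 'rhs',
    // each built from operator+= and the shift operators.
    Int result(0);
    while( rhs != Int(0) )
    {
        if( (rhs & Int(1)) != Int(0) )
        {
            result += lhs;   // the single-pass addition quoted above
        }
        lhs <<= 1;
        rhs >>= 1;
    }
    return result;
}

Hence the O(n) bound for multiplication, with 'n' the bit-width of the type.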
Could the programmer be given access (through member functions) to the two members of a large_int?
Absolutely. In fact, I used to allow this and removed the functionality to make large_int more 'integer like'. I wouldn't add it as member functions (as this would be inconsistent with the built-in integer types). I would, however, place this functionality into large_int_traits as accessors, into the boost::large_int namespace as free functions, or both.
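To make that concrete, the free-function form could look something like this (purely a sketch of a proposed interface - these functions do not exist in the submitted code, and it assumes the template parameters are <low_part_type, high_part_type>, as in the operator+= snippet above):

namespace boost { namespace large_int {

    // Proposed accessors, returning copies of the two halves (hypothetical names).
    template<typename T, typename U>
    T lo_part(const large_int<T, U>& val);   // the low part (m_lo)

    template<typename T, typename U>
    U hi_part(const large_int<T, U>& val);   // the high part (m_hi)

}} // namespace boost::large_int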
The typedef for l[u]int128_t does not seem to use the builtin corresponding type when available, could this be done?
Again, yes, absolutely. Provided these types support all the required operations as well as std::numeric_limits, this could be done via preprocessor macros to detect the necessary compiler support (as is already done where 64-bit support is unavailable).
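As a rough sketch of what I mean (the detection macro is the one GCC and Clang define when __int128 is available; the fallback typedefs mirror the library's composition, but treat the exact spelling as illustrative rather than the shipped code):

#include <boost/cstdint.hpp>
#include <boost/large_int.hpp>

// Illustrative sketch only - not the library's shipped typedefs.
#if defined(__SIZEOF_INT128__)
// Use the compiler-provided 128-bit types directly.
typedef __int128           lint128_t;
typedef unsigned __int128  luint128_t;
#else
// Fall back to the portable large_int composition.
typedef boost::large_int::large_int<boost::uint64_t, boost::int64_t>  lint128_t;
typedef boost::large_int::large_int<boost::uint64_t, boost::uint64_t> luint128_t;
#endif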
I had "undefined symbol" problems when using the builtin 128 bits integers (provided by gcc) in the signature of functions in a shared library (it was a module created with Boost.Python, the problem occurred when importing this module within python). The only solution I found was to wrap integer-like types inside a (templated) class, which looks like a particular case of your library (this would be "large_int<EMPTY_TYPE, U>"). Would LargeInt be an appropriate place to solve this problem?
I can't speak from experience on this in particular. However, where I have needed to make an existing integral type "slightly larger", I did it with this little trick:

typedef boost::large_int::large_int<unsigned int, T> my_int_type;

Where 'T' is the type you'd be wrapping. This extends 'T' by combining it with an unsigned low_part_type.
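For instance (an illustrative sketch only, here wrapping boost::int64_t, which yields a type one unsigned word wider than the original that can then be used much like a built-in integer):

#include <boost/cstdint.hpp>
#include <boost/large_int.hpp>

// Illustrative only: extend a 64-bit type by an unsigned low part.
typedef boost::large_int::large_int<unsigned int, boost::int64_t> my_int_type;

void example()
{
    my_int_type counter(0);        // constructible from an integer literal
    counter += my_int_type(1);     // arithmetic then proceeds at the wider width
}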
Why use Boost.Operator for some operators but not all (e.g. comparison operators)?
It was to do with integral promotion, and I have to admit here that I made a mistake in my earlier email when I stated that all arithmetic operators were fine with mixing their lhs and rhs operand sizes. This is not the case for the arithmetic operators, but it is for comparison ('<', '<=', '==', and so on). Where two large_int values which differ in size are compared, the smaller of the two must be promoted to the type of the larger so that the comparison may take place without losing integer information. If I were to use Boost.Operator to define these operators automatically, I don't think such promotion would be possible.
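To illustrate the promotion (a sketch only, not the library's actual operator definitions; it assumes that a larger large_int can be constructed from a smaller one without loss, and uses the lint128_t/lint256_t typedef names):

#include <boost/large_int.hpp>

// Hypothetical helper showing the idea: widen the smaller operand, then compare.
bool less_than(const boost::large_int::lint128_t& lhs,
               const boost::large_int::lint256_t& rhs)
{
    return boost::large_int::lint256_t(lhs) < rhs;   // no integer information lost
}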
I have not found through the doc what is the result of luint128_t + luint128_t, or of luint128_t + luint256_t. Could you point me to where I can find this kind of information?
If you are referring to the arithmetic operator '+', luint128_t + luint128_t would produce a new luint128_t value. Just like their 64-bit equivalents, the result can overflow. As for luint128_t + luint256_t, this is where I follow on from the above answer and it gets a little tricky. As the 128-bit integer is the left-hand side, I believe that boost::addable<T> will return the left-hand-side type of the expression. It might be possible to overcome this by *not* inheriting from boost::integer_arithmetic and defining all operators manually with some MPL-like type selection - that might be good for a version 1.1! Thanks for the input - I'll add notes on this to the documentation. For the moment, if you *know* that a 256-bit result is needed, using operator+= with a 256-bit large_int left-hand side will most certainly produce a 256-bit result.
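Concretely, that workaround looks like this (a sketch; it assumes, as above, that a 256-bit large_int can be constructed from a 128-bit one):

#include <boost/large_int.hpp>

// Sketch of the workaround described above: start from a 256-bit accumulator.
boost::large_int::luint256_t add_wide(const boost::large_int::luint128_t& a,
                                      const boost::large_int::luint256_t& b)
{
    boost::large_int::luint256_t sum(a);   // widen the 128-bit operand first
    sum += b;                              // += on the 256-bit lhs keeps the full width
    return sum;
}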
(Regarding "Fixed at compile time values")

Please note the "for example" part. Fixed precision integers are fixed to the number of bits in Boost.Multiprecision, the same as yours are.
If you mean the optional selection of an integer type by the number of decimal digits required, rather than by the underlying type, then fair enough - is that what you mean? Boost.LargeInt types are always created by a number of binary digits at the moment. Not being too familiar with Boost.Multiprecision, could I please ask whether it is possible to select an integral type from a number of binary bits, and whether there is any restriction on power-of-2 vs. non-power-of-2 sizes?
"A header-only Boost license version is always available (if somewhat slower)." So no difference then? ;-)
I may have misread your documentation, but I thought you said that some functionality was dependent upon third-party, restrictively licensed libraries, but that 'a header-only Boost license version is always available (if somewhat slower)'. I took this to mean that Boost.Multiprecision's optimal performance was only available where the restricted licensing for those additional libraries was acceptable to the end user. Please correct me if I misunderstood this point. Do you mean slower to compile, or slower to run? Boost.LargeInt is actually pretty light-weight on headers; it is short in code, doesn't include many headers itself, and even leaves out the I/O headers where they are not needed.
You can turn expression templates off in the multiprecision lib, in which case it behaves as yours does - in fact the expression templates are turned off for the fixed precision integer part, as it adds nothing there.
Fair enough.
Congratulations, the Fields medal is clearly on its way! Seriously, you might want to think again on those complexities...
No need to be rude. My reasoning for the O(x) complexities is explained above. (Regarding comparison against literal values)
Likewise the multiprecision lib (in fact promotion to the extended type first isn't strictly necessary - you can do things more efficiently than that).
How could this be done more efficiently? Comparison against a literal in Boost.LargeInt uses the large_int constructor, which is again O(1) (subject to my understanding of complexity calculations :-P). To convert a 'long' to a 'large_int' for these comparisons, it uses the above 'make it larger' type trick to create a temporary - which is itself composed of just two integers, allocated from the stack or registers as the compiler sees fit.
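A minimal usage sketch of the path just described (illustrative only; 'is_answer' is just an example name):

#include <boost/large_int.hpp>

bool is_answer(const boost::large_int::luint128_t& x)
{
    // The long literal is converted once, via the large_int constructor, into a
    // two-word temporary; the comparison then proceeds at the full width.
    return x == 42L;
}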
While I think there is a place for a fixed size integer library in Boost that's "as simple as possible" (on the grounds that for smaller fixed sizes, simple is often faster than more complex schemes, even if the complex schemes have better big-O performance), this method of implementation does appear to be quite slow, though I confess I've only done a cursory comparison.
I agree that some benchmarking of Boost.LargeInt against the header-only equivalent of Boost.Multiprecision is certainly required, and I will take a look at the results you have supplied. I am glad you think, as I do, that there is a place for a fixed size integer library in Boost, and I am keen that it is as useful to developers as possible.

Thanks everyone for all the feedback so far,

Michael J. Tryhorn

On 30 June 2012 19:06, John Maddock <boost.regex@virgin.net> wrote:
Michael,
While I think there is a place for a fixed size integer library in Boost that's "as simple as possible" (on the grounds that for smaller fixed sizes, simple is often faster than more complex schemes, even if the complex schemes have better big-O performance), this method of implementation does appear to be quite slow, though I confess I've only done a cursory comparison.
So for comparison, I measured how long it took to compute 100 dot products of 1000-element vectors. The vectors were filled with values that "used" a specified number of bits (to simulate the typical use case where most values are quite small, and just a few actually require an extended-precision type). Here are the results, compiled on Win32 with VC10 and /Ox:
Multiprecision mp_int128:
32 bit time = 0.0100556 seconds
64 bit time = 0.02594 seconds
96 bit time = 0.0229292 seconds
128 bit time = 0.0142151 seconds

Multiprecision mp_int256:
32 bit time = 0.00376926 seconds
64 bit time = 0.0102474 seconds
96 bit time = 0.01383 seconds
128 bit time = 0.0192363 seconds

large_int<unsigned long long, long long>:
32 bit time = 0.0498975 seconds
64 bit time = 0.111413 seconds
96 bit time = 0.183727 seconds
128 bit time = 0.251469 seconds

large_int<large_int<unsigned long long, unsigned long long>, large_int<unsigned long long, long long> >:
32 bit time = 1.08451 seconds
64 bit time = 2.43866 seconds
96 bit time = 3.79658 seconds
128 bit time = 5.20211 seconds
Note how the times taken for mp_intXXX grow largely based on the bits actually used, not the bits specified in the type. But also, even in the case where I expected large_int to do well - in fact where it really should win in this kind of comparison - for part-filled 128-bit ints, it's still over 4 times slower than the multiprecision code :-(

My guess is this is a consequence of the implementation method (as a doubled type rather than an array of limbs)? If so, that's actually quite useful information to know.
Regards, John
PS I see you're using bit-by-bit multiplication - easy to understand and code, but it's glacially slow in general. I tried this for integer division early on and it ran like treacle (on a very cold day)!
PPS code follows:
#include <boost/chrono.hpp>
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/large_int.hpp>
#include <iostream>
#include <vector>

template <class Clock>
struct stopwatch
{
   typedef typename Clock::duration duration;
   stopwatch() { m_start = Clock::now(); }
   duration elapsed() { return Clock::now() - m_start; }
   void reset() { m_start = Clock::now(); }

private:
   typename Clock::time_point m_start;
};

template <class T>
void time()
{
   std::vector<T> v1(1000, 0x12345FF), v2(1000, 0x12345F9), v3(1000, 0);
   boost::chrono::duration<double> time;
   stopwatch<boost::chrono::high_resolution_clock> c;
   for(unsigned i = 0; i < 100; ++i)
   {
      for(unsigned i = 0; i < v1.size(); ++i)
      {
         v3[i] = v1[i] * v2[i];
      }
   }
   time = c.elapsed();
   std::cout << "32 bit time = " << time << std::endl;

   for(unsigned i = 0; i < v1.size(); ++i)
   {
      v1[i] <<= 32;
      v1[i] += 0x12345FF;
      v2[i] <<= 32;
      v2[i] += 0x12345F9;
   }
   c.reset();
   for(unsigned i = 0; i < 100; ++i)
   {
      for(unsigned i = 0; i < v1.size(); ++i)
      {
         v3[i] = v1[i] * v2[i];
      }
   }
   time = c.elapsed();
   std::cout << "64 bit time = " << time << std::endl;

   for(unsigned i = 0; i < v1.size(); ++i)
   {
      v1[i] <<= 32;
      v1[i] += 0x12345FF;
      v2[i] <<= 32;
      v2[i] += 0x12345F9;
   }
   c.reset();
   for(unsigned i = 0; i < 100; ++i)
   {
      for(unsigned i = 0; i < v1.size(); ++i)
      {
         v3[i] = v1[i] * v2[i];
      }
   }
   time = c.elapsed();
   std::cout << "96 bit time = " << time << std::endl;

   for(unsigned i = 0; i < v1.size(); ++i)
   {
      v1[i] <<= 32;
      v1[i] += 0x12345FF;
      v2[i] <<= 32;
      v2[i] += 0x12345F9;
   }
   c.reset();
   for(unsigned i = 0; i < 100; ++i)
   {
      for(unsigned i = 0; i < v1.size(); ++i)
      {
         v3[i] = v1[i] * v2[i];
      }
   }
   time = c.elapsed();
   std::cout << "128 bit time = " << time << std::endl;
}

int _tmain(int argc, _TCHAR* argv[])
{
   using namespace boost::multiprecision;

   time<mp_int128_t>();
   time<mp_int256_t>();

   typedef boost::large_int::large_int<unsigned long long, long long> t128;
   typedef boost::large_int::large_int<unsigned long long, unsigned long long> ut128;
   typedef boost::large_int::large_int<ut128, t128> t256;

   time<t128>();
   time<t256>();

   return 0;
}