Re: [boost] Algebraic abstractions related to (big) integers in C++

On 3/27/11 00:21 Thomas Klimpel wrote:
The goal of this message is to reflect on the two's complement representation of negative integers, from a relatively abstract algebraic perspective.
I suspect that practical arguments may resonate more deeply with this list than theoretical ones.

Indeed, from a theoretical point of view it's trivial to convert operations on a sign bit + N-1 magnitude bit representation to/from operations on the N-bit 2's complement representation, and in fact there is one value representable in the two's complement representation that is not in the sign/magnitude representation of the same length, which seems like a (very minor) theoretical win for 2's complement.

If the operands are the same size (even if that spans multiple machine words), I think 2's complement is clearly preferable to take advantage of the hardware instruction set. With the exception of the overflow flag, addition, subtraction, and the low-order result of a multiplication yield the same bitstring whether the operands (and result) are taken to be 2's complement or unsigned, which means that you can perform arithmetic on lower-order word-size chunks of a large integer without caring whether the highest-order chunk is positive or negative. Might as well take advantage of the instruction-level building blocks for 2's complement representation (with appropriate overflow, carry-in, etc. options) in this case.

If you might be adding or subtracting large integers of different length, the sign-magnitude representation is arguably simpler than sign-extending the smaller operand (and potentially propagating carry-outs up to the highest-order chunk of the result.) This then lets you choose to represent negative numbers in the smallest possible number of words without worrying about an arithmetic performance tradeoff. On the other hand, multiword sign-magnitude arithmetic requires two code paths for every operation, with an initial test-and-branch to determine whether all your ADDs should be replaced by SUBs.

And given that 2's complement seems preferable for fixed-width arithmetic, I think I'd want a library supporting both fixed- and variable-width representations to adopt the same convention for both.
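To make that point concrete, here is a minimal sketch of same-width multiword addition; the helper name add_n and the 32-bit limb size are illustrative assumptions, not code from the thread or from any particular library. No limb needs to know whether the operands are meant to be signed or unsigned:

    #include <cstddef>
    #include <cstdint>

    // Adds two n-limb integers stored little-endian (least significant limb
    // first).  The limbs are added as plain unsigned words; whether the most
    // significant limb is later read as two's complement or as unsigned
    // changes nothing in this loop.
    void add_n(const std::uint32_t* a, const std::uint32_t* b,
               std::uint32_t* r, std::size_t n)
    {
        std::uint32_t carry = 0;
        for (std::size_t i = 0; i < n; ++i) {
            std::uint64_t s = std::uint64_t(a[i]) + b[i] + carry;
            r[i]  = static_cast<std::uint32_t>(s);       // low-order 32 bits
            carry = static_cast<std::uint32_t>(s >> 32); // carry into next limb
        }
        // For same-width operands the final carry is simply discarded; signed
        // overflow detection, if wanted, looks at the sign bits of the top limbs.
    }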
"What I consider especially strange from an algebraic point of view, is to automatically promote a signed integer type to its corresponding unsigned integer type, when >an arithmetic expression contains both signed and unsigned integer types. Shouldn't it be the other way round? Or wouldn't it be even better, if there would be no >automatic conversion at all? At least from an algebraic point of view, any natural number is also an integer, not vice versa like in C."
To be precise, this happens only when the rank of the unsigned type is greater than or equal to that of the signed type. In the case where both operands have non-negative values, this minimizes the risk of overflow during addition or multiplication. When performing an operation on an unsigned value and a signed value (or subtracting an unsigned value from a signed positive value of lesser magnitude), it is incumbent upon the programmer to have some notion of what he or she is doing. I believe the rationale may be that signed values are the default, so if you went out of your way to declare something as unsigned then you're accepting some responsibility for using it appropriately. Note also that (at least in ANSI C) when performing arithmetic between an unsigned value and a signed value of higher rank, the promotion is to the signed type--- so the "promote to the 'larger' type" principle is consistently applied across the board, particularly if you consider unsigned types "larger" than the signed equivalent (seems reasonable to me given how uncommon negative values are in most integer code.)
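A small example of the rule being discussed, assuming 32-bit int/unsigned and 64-bit long long (the widths are assumptions for illustration, not required by the standard):

    #include <iostream>

    int main()
    {
        int i = -1;
        unsigned u = 1;
        // Equal rank: the signed operand converts to unsigned, so -1 becomes
        // UINT_MAX and the comparison is done on unsigned values.
        std::cout << (i < u) << std::endl;   // prints 0 (false)

        // Signed type of higher rank that can hold every unsigned int value:
        // the unsigned operand converts to it, and the comparison behaves
        // "mathematically".
        long long ll = -1;
        std::cout << (ll < u) << std::endl;  // prints 1 (true)
    }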

On Sun, Mar 27, 2011 at 9:41 AM, Brent Spillner <spillner@acm.org> wrote:
"What I consider especially strange from an algebraic point of view, is to automatically promote a signed integer type to its corresponding unsigned integer type, when >an arithmetic expression contains both signed and unsigned integer types. Shouldn't it be the other way round? Or wouldn't it be even better, if there would be no >automatic conversion at all? At least from an algebraic point of view, any natural number is also an integer, not vice versa like in C."
To be precise, this happens only when the rank of the unsigned type is greater than or equal to that of the signed type. In the case where both operands have non-negative values, this minimizes the risk of overflow during addition or multiplication. When performing an operation on an unsigned value and a signed value (or subtracting an unsigned value from a signed positive value of lesser magnitude), it is incumbent upon the programmer to have some notion of what he or she is doing. I believe the rationale may be that signed values are the default, so if you went out of your way to declare something as unsigned then you're accepting some responsibility for using it appropriately. Note also that (at least in ANSI C) when performing arithmetic between an unsigned value and a signed value of higher rank, the promotion is to the signed type--- so the "promote to the 'larger' type" principle is consistently applied across the board, particularly if you consider unsigned types "larger" than the signed equivalent (seems reasonable to me given how uncommon negative values are in most integer code.)
Even if that's consistent, it's mathematically wrong to convert a signed type to unsigned unless you know the value is non-negative.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

Brent Spillner wrote:
On 3/27/11 00:21 Thomas Klimpel wrote:
The goal of this message is to reflect on the two's complement representation of negative integers, from a relatively abstract algebraic perspective.
I suspect that practical arguments may resonate more deeply with this list than theoretical ones.
Well, thanks for providing some practical arguments.
Indeed, from a theoretical point of view it's trivial to convert operations on a sign bit + N-1 magnitude bit representation to/from operations on the N-bit 2's complement representation, and in fact there is one value representable in the two's complement representation that is not in the sign/magnitude representation of the same length, which seems like a (very minor) theoretical win for 2's complement.
I found out that this is actually a "theoretical" loss. The following program

    int i = 0x80000000;
    std::cout << i << std::endl;
    std::cout << std::abs(i) << std::endl;

prints

    -2147483648
    -2147483648

So there exists an int value for which the assertion "assert(std::abs(i) >= 0);" fails. But when you convert the result of std::abs to unsigned, you get at least the correct value.
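As a side note, the magnitude can also be obtained without calling std::abs at all, by doing the negation in unsigned arithmetic; this is only a sketch of that idea (not code from the thread), and it assumes 32-bit int:

    #include <iostream>
    #include <limits>

    int main()
    {
        int i = std::numeric_limits<int>::min();   // -2147483648 with 32-bit int
        // Conversion of a negative int to unsigned is well defined (modulo 2^32),
        // and unsigned negation wraps, so this yields 2147483648 without relying
        // on what std::abs does for INT_MIN.
        unsigned magnitude = (i < 0) ? 0u - static_cast<unsigned>(i)
                                     : static_cast<unsigned>(i);
        std::cout << magnitude << std::endl;       // prints 2147483648
    }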
If the operands are the same size (even if that spans multiple machine words), I think 2's complement is clearly preferable to take advantage of the hardware instruction set. With the exception of the overflow flag, addition, subtraction, and the low-order result of a multiplication yield the same bitstring whether the operands (and result) are taken to be 2's complement or unsigned, which means that you can perform arithmetic on lower-order word-size chunks of a large integer without caring whether the highest-order chunk is positive or negative. Might as well take advantage of the instruction-level building blocks for 2's complement representation (with appropriate overflow, carry-in, etc. options) in this case.
So there are "practical arguments" for using two's complement representation for fixed-width integers. Very good.
If you might be adding or subtracting large integers of different length, the sign-magnitude representation is arguably simpler than sign-extending the smaller operand (and potentially propagating carry-outs up to the highest-order chunk of the result.) This then lets you choose to represent negative numbers in the smallest possible number of words without worrying about an arithmetic performance tradeoff. On the other hand, multiword sign-magnitude arithmetic requires two code paths for every operation, with an initial test-and-branch to determine whether all your ADDs should be replaced by SUBs.
So even by "practical arguments", it's not easy to settle the question whether two's complement representation should be used for variable width integers.
And given that 2's complement seems preferable for fixed-width arithmetic, I think I'd want a library supporting both fixed- and variable-width representations to adopt the same convention for both.
That was also my initial thought. But then I understood that floating point numbers should not use the two's complement representation. So all algorithms must be implemented for unsigned integers anyway. But then it doesn't make such a big difference how we implement the algorithms for the signed integers in terms of these algorithms for the unsigned integers.
"What I consider especially strange from an algebraic point of view, is to automatically promote a signed integer type to its corresponding unsigned integer type, when >an arithmetic expression contains both signed and unsigned integer types. Shouldn't it be the other way round? Or wouldn't it be even better, if there would be no >automatic conversion at all? At least from an algebraic point of view, any natural number is also an integer, not vice versa like in C."
To be precise, this happens only when the rank of the unsigned type is greater than or equal to that of the signed type. In the case where both operands have non-negative values, this minimizes the risk of overflow during addition or multiplication. When performing an operation on an unsigned value and a signed value (or subtracting an unsigned value from a signed positive value of lesser magnitude), it is incumbent upon the programmer to have some notion of what he or she is doing. I believe the rationale may be that signed values are the default, so if you went out of your way to declare something as unsigned then you're accepting some responsibility for using it appropriately.
Well, the built-in operator sizeof() yields an unsigned integer value, as do many other operations in C. In C++, c.size() is unsigned, as are many other values. So I don't really have to go out of my way to end up with a mix of signed and unsigned numbers. I don't think the reason behind the "signed to unsigned promotion" rule is really a "rationale". The problem is that literals like "1" and "23" have the concrete type "int", which forces the "signed to unsigned promotion" upon us, because the type system of C isn't powerful enough to handle this situation more appropriately. One way this problem could have been handled in C would have been to introduce a special type for "integer literals", or even "integer constants", that automatically takes care of the required conversions.
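A small illustration of how little one has to go out of one's way to hit this (my example, not taken from the thread):

    #include <iostream>
    #include <vector>

    int main()
    {
        std::vector<int> v;                 // empty, so v.size() == 0
        // v.size() is unsigned, so v.size() - 1 wraps around to a huge value,
        // and the int literal 0 is then converted to unsigned for the
        // comparison.  Mathematically this asks "is 0 < -1?", yet the branch
        // is taken.
        if (0 < v.size() - 1)
            std::cout << "surprise: 0 < v.size() - 1 is true here\n";
    }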
Note also that (at least in ANSI C) when performing arithmetic between an unsigned value and a signed value of higher rank, the promotion is to the signed type--- so the "promote to the 'larger' type" principle is consistently applied across the board, particularly if you consider unsigned types "larger" than the signed equivalent (seems reasonable to me given how uncommon negative values are in most integer code.)
You're kidding, right?

Regards,
Thomas

Thomas Klimpel wrote:
Brent Spillner wrote:
Note also that (at least in ANSI C) when performing arithmetic between an unsigned value and a signed value of higher rank, the promotion is to the signed type--- so the "promote to the 'larger' type" principle is consistently applied across the board, particularly if you consider unsigned types "larger" than the signed equivalent (seems reasonable to me given how uncommon negative values are in most integer code.)
You're kidding, right?
I've heard this rationale before. He is serious. Promoting 32-bit unsigned to 32-bit signed, for example, could overflow, so it is clearly not safe. Promoting signed to unsigned is just as bad, of course, but it won't overflow. I'd like a compiler warning for every automatic promotion of signed to unsigned, so that I can inspect the line and see whether I think it is safe. If it happens when you didn't realize it would, it is usually unsafe. If it happens when you did realize it, you might as well make the conversion explicit, since you gave it enough thought to know it was safe.

We have our promotion rules inherited from C, so we can't fix them any more than we can fix the linker. The rules don't specify what a warning is, though, and warnings are errors for us pedantic people. To some extent we get to make our own rules. As far as I'm concerned, an implicit cast from signed to unsigned is illegal.

Regards,
Luke
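For what it's worth, mainstream compilers can already be asked for a diagnostic of roughly this kind; GCC and Clang, for example, accept -Wsign-conversion and -Wsign-compare. A sketch (the exact warning text varies by compiler and version):

    // sketch.cpp -- e.g.:  g++ -c -Wsign-conversion -Wsign-compare sketch.cpp
    unsigned add(unsigned total, int delta)
    {
        // GCC reports something along the lines of:
        //   warning: conversion to 'unsigned int' from 'int' may change
        //   the sign of the result [-Wsign-conversion]
        return total + delta;   // delta is implicitly converted to unsigned
    }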
participants (4)
- Brent Spillner
- Dave Abrahams
- Simonson, Lucanus J
- Thomas Klimpel