Re: [boost] Proposed patch for boost::counting_iterator

Sebastian Theophil wrote:
Hello Dave,
We really ought to be having this discussion in public, so let's please move to the Boost developers' list.
I took another look at the standard to make the argument clearer.
I don't follow. Let's try code.
counting_iterator<int> i0(INT_MIN), i1(INT_MAX); std::iterator_traits<counting_iterator<int> >::difference_type d = i1-i0;
Are you claiming that it's acceptable for d to be -1?
Yes, that was what I tried to say. Because for all practical purposes
i0 + d == i1 (Although the behavior is undefined it will work on most systems.)
Here's where you get into the difference between theoretical and actual portability.
The other property I tried to express was that comparing two differences for their absolute values is meaningless if d is allowed to be -1 because of overflow.
Ouch; your mailer doesn't wrap lines. What do you mean by "comparing two differences for their absolute value?"
My fundamental argument is that int is a modulo space which is exactly why 1) works if B-A is still int.
No, int is not a modulo space. Overflows induce undefined behavior. Only unsigned integers are a modulo space in C++.
OK, I believe you know the standard way better than I do, so I admit defeat in this point.
OK.
This is also why (T* - T*) = ptrdiff_t = int
No, ptrdiff_t is long on most platforms
and not int64.
And it may be a 64-bit int (especially on 64-bit platforms)
The last point is clear and I should have written that ptrdiff_t = word size or something like that. But back to what the standard says about ptrdiff_t in section 5.7 on Additive Operators
"When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. The type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header (18.1). As with any other arithmetic overflow, if the result does not fit in the space provided, the behavior is undefined. In other words, if the expressions P and Q point to, respectively, the i-th and j-th elements of an array object, the expression (P)-(Q) has the value i− j provided the value fits in an object of type std::ptrdiff_t."
Clearly, there's no strict requirement on ptrdiff_t except that it is a signed integral type. It clearly doesn’t have to be large enough to prevent overflows so it may have the same size as a processor word be that 4 bytes or 8. In practice that seems to be the preferred way although with a word length ptrdiff_t
Generally it has nothing to do with the (data) word length. It is usually an integral type whose size corresponds to the size of a pointer.
char* c0=(char*)0; char* c1=(char*)UIntToPtr(std::numeric_limits<size_t>::max()); ptrdiff_t d=c1-c0; char* c2=d+c0;
d overflows and in practice is -1 as well. Thus c2==c1.
Yes, but in practice no such array can exist, so it's not a problem. On the other hand, someone *can* reasonably make two counting_iterator<short>s and iterate from SHORT_MIN to SHORT_MAX. So in that case, representing the difference correctly could be important.
The same holds for
int n0=INT_MIN; int n1=INT_MAX; int dn=n1-n0; int n2=dn+n0;
dn is -1 and n2==n1.
Yes, that's because you specified int! The whole point of numeric_traits<int>::difference_type is to do better than that.
Of course, the overflow behavior is implementation dependent and it doesn't have to work that nicely. However, my argument is that this fact didn't cause anybody on the standards committee to change the definition for ptrdiff_t.
That's because it's up to QOI. In other words, an implementor interested in a high-quality implementation will ensure, if possible, that ptrdiff_t works for practical situations on his platform. That's exactly what we tried to do with counting_iterator. There's a difference between what the standard mandates and what an implementor should do. The standard also doesn't say that you have to support dynamic allocation of more than 10 bytes total.
I think counting_iterator<T> should behave exactly the same way when T is an integer than when it is an iterator.
It does, whenever possible. An iterator's difference type is usually a type that can be used to represent the difference between any two such iterators that belong to the same sequence. That's exactly what we try to do with counting_iterator<int>.
In both cases it can return iterator_traits<T>::difference_type. std::iterator_traits<int>::difference_type is specialized to be int.
I would be shocked if you could demonstrate that std::iterator_traits<int>::difference_type is specialized to be int in the standard, or indeed any implementation of C++. -- Dave Abrahams Boost Consulting http://boost-consulting.com

I took another look at the standard to make the argument clearer.
I don't follow. Let's try code.
counting_iterator<int> i0(INT_MIN), i1(INT_MAX); std::iterator_traits<counting_iterator<int> >::difference_type d = i1-i0;
Are you claiming that it's acceptable for d to be -1?
Yes, that was what I tried to say. Because for all practical purposes
i0 + d == i1 (Although the behavior is undefined it will work on most systems.)
Here's where you get into the difference between theoretical and actual portability.
It's portable if arithmetic is implemented using 2's complement if I'm not mistaken. But of course, this is not required to be the case.
The other property I tried to express was that comparing two differences for their absolute values is meaningless if d is allowed to be -1 because of overflow.
Ouch; your mailer doesn't wrap lines.
What do you mean by "comparing two differences for their absolute value?"
If differences are represented as ints then INT_MAX - (-INT_MIN) < INT_MAX - 0 If the difference is e.g. an int64 then as expected INT_MAX - (-INT_MIN) > INT_MAX - 0 If anyone relies on either behavior, changing the difference type may be a problem.
char* c0=(char*)0; char* c1=(char*)UIntToPtr(std::numeric_limits<size_t>::max()); ptrdiff_t d=c1-c0; char* c2=d+c0;
d overflows and in practice is -1 as well. Thus c2==c1.
Yes, but in practice no such array can exist, so it's not a problem. On the other hand, someone *can* reasonably make two counting_iterator<short>s and iterate from SHORT_MIN to SHORT_MAX. So in that case, representing the difference correctly could be important.
My argument was precisely that on the architectures I'm aware of you can do the latter even when difference_type is short because shorts behave like a modulo space although they aren't required to do so. On the other hand I can iterate over all memory addresses I have although I couldn't allocate as much memory for myself. Representing the difference between two pointers would be just as important.
The same holds for
int n0=INT_MIN; int n1=INT_MAX; int dn=n1-n0; int n2=dn+n0;
dn is -1 and n2==n1.
Yes, that's because you specified int! The whole point of numeric_traits<int>::difference_type is to do better than that.
My point is actually that INT_MAX - (-INT_MIN) is defined to be int because all operands are ints and in this case I think the standard is with me. Please correct me if there's a subtlety I missed. I would need to upcast explicitly which is what the counting_iterator does.
Of course, the overflow behavior is implementation dependent and it
doesn't have to work that nicely. However, my argument is that this fact didn't cause anybody on the standards committee to change the definition for ptrdiff_t.
That's because it's up to QOI. In other words, an implementor interested in a high-quality implementation will ensure, if possible, that ptrdiff_t works for practical situations on his platform. That's exactly what we tried to do with counting_iterator.
So your argument is ptrdiff_t shouldn't be int in the MSVC library implementation because that's a bad choice on a 32 bit system?
In both cases it can return iterator_traits<T>::difference_type. std::iterator_traits<int>::difference_type is specialized to be int.
I would be shocked if you could demonstrate that std::iterator_traits<int>::difference_type is specialized to be int in the standard, or indeed any implementation of C++.
I don't find any specialization for iterator_traits<int> in the standard but of course I checked my STL headers and found this directly in <xutility> template<> struct iterator_traits<short> { // get traits from integer type typedef _Int_iterator_tag iterator_category; typedef short value_type; typedef short difference_type; typedef short distance_type; typedef short * pointer; typedef short& reference; }; template<> struct iterator_traits<unsigned short> { // get traits from integer type typedef _Int_iterator_tag iterator_category; typedef unsigned short value_type; typedef unsigned short difference_type; typedef unsigned short distance_type; typedef unsigned short * pointer; typedef unsigned short& reference; }; template<> struct iterator_traits<int> { // get traits from integer type typedef _Int_iterator_tag iterator_category; typedef int value_type; typedef int difference_type; typedef int distance_type; typedef int * pointer; typedef int& reference; }; /* * Copyright (c) 1992-2005 by P.J. Plauger. ALL RIGHTS RESERVED. * Consult your license regarding permissions and restrictions. */ /* * This file is derived from software bearing the following * restrictions: * * Copyright (c) 1994 * Hewlett-Packard Company * ... */ So that's HP and P.J. Plauger who agree with me :-) BTW, did I read right that the boost numeric_traits difference_type definition for unsigned int would be intmax_t which again is int64 although the standard says unsigned integers are a modulo space? I know my arguments rely on "observed behavior", i.e. on the fact that integers behave like a modulo space on Intel processors which MSVC is kind of limited to. The only other argument I can make is that counting_iterator should in my view be an iterator interface to the integral types with their well-known limitations. I can iterate from SHORT_MIN to SHORT_MAX *and* I can compute the differences just as well as I can with shorts themselves. If my implementation handles overflows fine, I don't have any problems. If it doesn't, why should counting_iterator try to be "betterer" than ptrdiff_t? Regards Sebastian -- Sebastian Theophil · stheophil@think-cell.com Software Engineer think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · fax +49-30-666473-19 Geschäftsführer: Dr. Markus Hannebauer, Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229

AMDG Sebastian Theophil wrote:
If differences are represented as ints then
INT_MAX - (-INT_MIN) < INT_MAX - 0
If the difference is e.g. an int64 then as expected
INT_MAX - (-INT_MIN) > INT_MAX - 0
If anyone relies on either behavior, changing the difference type may be a problem.
Well, no one *should* rely on either behavior. It's undefined.
My argument was precisely that on the architectures I'm aware of you can do the latter even when difference_type is short because shorts behave like a modulo space although they aren't required to do so. On the other hand I can iterate over all memory addresses I have although I couldn't allocate as much memory for myself. Representing the difference between two pointers would be just as important.
There are no guarantees about what pointer arithmetic does outside of a contiguous block of memory.
So your argument is ptrdiff_t shouldn't be int in the MSVC library implementation because that's a bad choice on a 32 bit system?
int is okay. If I try to create an array with a size larger than an int can hold the msvc 9.0 says error C2148: total size of array must not exceed 0x7fffffff bytes
I know my arguments rely on "observed behavior", i.e. on the fact that integers behave like a modulo space on Intel processors which MSVC is kind of limited to. The only other argument I can make is that counting_iterator should in my view be an iterator interface to the integral types with their well-known limitations.
If you want the behavior of integers, dereference the iterators and work with integers.
I can iterate from SHORT_MIN to SHORT_MAX *and* I can compute the differences just as well as I can with shorts themselves.
Incidentally, short - short -> int
If my implementation handles overflows fine, I don't have any problems. If it doesn't, why should counting_iterator try to be "betterer" than ptrdiff_t?
I don't think that it is better in practice. In Christ, Steven Watanabe

on Thu Mar 20 2008, "Sebastian Theophil" <stheophil-AT-think-cell.com> wrote:
char* c0=(char*)0; char* c1=(char*)UIntToPtr(std::numeric_limits<size_t>::max()); ptrdiff_t d=c1-c0; char* c2=d+c0;
d overflows and in practice is -1 as well. Thus c2==c1.
Yes, but in practice no such array can exist, so it's not a problem. On the other hand, someone *can* reasonably make two counting_iterator<short>s and iterate from SHORT_MIN to SHORT_MAX. So in that case, representing the difference correctly could be important.
My argument was precisely that on the architectures I'm aware of you can do the latter even when difference_type is short because shorts behave like a modulo space although they aren't required to do so.
Yes, you can do the iteration, regardless of the difference_type. However, you can't accurately measure the distance between two such iterators unless the difference_type can represent numbers outside the range of SHRT_MIN..SHRT_MAX. On the other hand, with a difference_type of ptrdiff_t you *can* accurately measure the distance between any two pointers that can be legally subtracted or compared in C++.
On the other hand I can iterate over all memory addresses I have although I couldn't allocate as much memory for myself.
Not legally in C++. Iterating off the end of an array of objects is undefined behavior.
Representing the difference between two pointers would be just as important.
The same holds for
int n0=INT_MIN; int n1=INT_MAX; int dn=n1-n0; int n2=dn+n0;
dn is -1 and n2==n1.
Yes, that's because you specified int! The whole point of numeric_traits<int>::difference_type is to do better than that.
My point is actually that INT_MAX - (-INT_MIN) is defined to be int
No, it's not defined to be anything. In fact, it's undefined behavior because, if I remember the standard language correctly, "the result is outside the representable range of the result type," i.e. an overflow or underflow.
because all operands are ints
It's that way because C does it that way.
and in this case I think the standard is with me. Please correct me if there's a subtlety I missed.
You missed the fact that it's undefined behavior to even mention -INT_MIN on a 2s compliment machine. Try throwing this program at the comeau online compiler: #include <climits> unsigned x = -INT_MIN; It's being extra nice and warning you instead of launching a missile or overwriting your hard drive.
Of course, the overflow behavior is implementation dependent and it
doesn't have to work that nicely. However, my argument is that this fact didn't cause anybody on the standards committee to change the definition for ptrdiff_t.
That's because it's up to QOI. In other words, an implementor interested in a high-quality implementation will ensure, if possible, that ptrdiff_t works for practical situations on his platform. That's exactly what we tried to do with counting_iterator.
So your argument is ptrdiff_t shouldn't be int in the MSVC library implementation because that's a bad choice on a 32 bit system?
No, I'm not. ptrdiff_t is a perfectly good choice for representing any distance between pointers that can be measured in C++ without invoking undefined behavior, as long as the system's dynamic allocator won't give you a block of memory whose size is outside the representable range of ptrdiff_t and as long as the same goes for the implementation limits for declaring arrays in the compiler.
In both cases it can return iterator_traits<T>::difference_type. std::iterator_traits<int>::difference_type is specialized to be int.
I would be shocked if you could demonstrate that std::iterator_traits<int>::difference_type is specialized to be int in the standard, or indeed any implementation of C++.
I don't find any specialization for iterator_traits<int> in the standard but of course I checked my STL headers and found this directly in <xutility>
template<> struct iterator_traits<short> { // get traits from integer type typedef _Int_iterator_tag iterator_category; typedef short value_type; typedef short difference_type; typedef short distance_type; typedef short * pointer; typedef short& reference; };
template<> struct iterator_traits<unsigned short> { // get traits from integer type typedef _Int_iterator_tag iterator_category; typedef unsigned short value_type; typedef unsigned short difference_type; typedef unsigned short distance_type; typedef unsigned short * pointer; typedef unsigned short& reference; };
template<> struct iterator_traits<int> { // get traits from integer type typedef _Int_iterator_tag iterator_category; typedef int value_type; typedef int difference_type; typedef int distance_type; typedef int * pointer; typedef int& reference; };
Wow, I'm impressed. Not sure why they'd do that, but there you go.
/* * Copyright (c) 1992-2005 by P.J. Plauger. ALL RIGHTS RESERVED. * Consult your license regarding permissions and restrictions. */ /* * This file is derived from software bearing the following * restrictions: * * Copyright (c) 1994 * Hewlett-Packard Company * ... */
So that's HP and P.J. Plauger who agree with me :-)
No, it's just Bill Plaugher, unless you can show that the code was present in the HP implementation.
BTW, did I read right that the boost numeric_traits difference_type definition for unsigned int would be intmax_t which again is int64 although the standard says unsigned integers are a modulo space?
Probably. The intended behavior of numeric_traits' difference_type is that it's a type that's best able to represent the integer difference between any two numbers of a given type. It's not supposed to account for modulo arithmetic. If you want that behavior for unsigned int, it's trivial to get it, so why would we need numeric_traits for that?
I know my arguments rely on "observed behavior", i.e. on the fact that integers behave like a modulo space on Intel processors which MSVC is kind of limited to.
What is the modulus for int? How do you account for the fact that INT_MIN != -INT_MAX?
The only other argument I can make is that counting_iterator should in my view be an iterator interface to the integral types with their well-known limitations.
Why?
I can iterate from SHORT_MIN to SHORT_MAX *and* I can compute the differences just as well as I can with shorts themselves.
Not so, as Steven pointed out, because of integral promotion.
If my implementation handles overflows fine, I don't have any problems.
Seriously? Suppose you adapt a counting_iterator with a transform iterator that applies a monotonically increasing function. Now binary search in a range of these adapted iterators for the place where the function's result crosses zero. What do you think will happen if, when measuring the distance from the beginning to the end of a valid range, binary_search sees a negative number?
If it doesn't, why should counting_iterator try to be "betterer" than ptrdiff_t?
Why not try to do as well as we can? -- Dave Abrahams Boost Consulting http://boost-consulting.com
participants (3)
-
David Abrahams
-
Sebastian Theophil
-
Steven Watanabe