Iterator Range and operator==

Hi, What do you expect this code to do? Is b true or false? And why? Is this expected behaviour? #include <boost/range/iterator_range.hpp> #include <string> int main() { std::string s = "Olaf"; boost::iterator_range<std::string::iterator> r(s); bool a = r == s; bool b = r == "Olaf"; assert(a); assert(b); return 0; } -- Olaf

Hi,
What do you expect this code to do? Is b true or false? And why? Is this expected behaviour?
#include <boost/range/iterator_range.hpp> #include <string>
int main() { std::string s = "Olaf"; boost::iterator_range<std::string::iterator> r(s); bool a = r == s; bool b = r == "Olaf"; assert(a); assert(b); return 0; }
-- Olaf
I would expect both a and b to false since one side is an iterator range, and the other a string. However, I would also expect s == "Olaf"; to be false because one is std::string and the other a string constant. It all depends on how you define equality. Do both sides need to be the same type, do they only have to have same content, does the content need to be at the same memory location? It all depends on the interpretation of equality.

On Friday 20 April 2012 14:15:50 Olaf van der Spek wrote:
Hi,
What do you expect this code to do? Is b true or false? And why? Is this expected behaviour?
#include <boost/range/iterator_range.hpp> #include <string>
int main() { std::string s = "Olaf"; boost::iterator_range<std::string::iterator> r(s); bool a = r == s; bool b = r == "Olaf"; assert(a); assert(b); return 0; }
I would prefer it to not compile. It is not clear what b should be initialized with because the "Olaf" argument can be interpreted as either an array of 5 characters or a string of 4 characters. If you wrap "Olaf" with as_literal, b should be initialized with true (because ranges are compired element-wise).

On 20/04/12 14:15, Olaf van der Spek wrote:
Hi,
What do you expect this code to do? Is b true or false? And why? Is this expected behaviour?
#include<boost/range/iterator_range.hpp> #include<string>
int main() { std::string s = "Olaf"; boost::iterator_range<std::string::iterator> r(s); bool a = r == s;
a is true.
bool b = r == "Olaf";
b is false, "Olaf" is one character longer than r.

On Fri, Apr 20, 2012 at 3:19 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 20/04/12 14:15, Olaf van der Spek wrote:
Hi,
What do you expect this code to do? Is b true or false? And why? Is this expected behaviour?
#include<boost/range/iterator_range.hpp> #include<string>
int main() { std::string s = "Olaf"; boost::iterator_range<std::string::iterator> r(s); bool a = r == s;
a is true.
bool b = r == "Olaf";
b is false, "Olaf" is one character longer than r.
It is current behaviour. But is it expected behaviour? -- Olaf

on Sat Apr 21 2012, Olaf van der Spek <ml-AT-vdspek.org> wrote:
On Fri, Apr 20, 2012 at 3:19 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 20/04/12 14:15, Olaf van der Spek wrote:
Hi,
What do you expect this code to do? Is b true or false? And why?
Is this expected behaviour?
#include<boost/range/iterator_range.hpp> #include<string>
int main() { std::string s = "Olaf"; boost::iterator_range<std::string::iterator> r(s); bool a = r == s;
a is true.
bool b = r == "Olaf";
b is false, "Olaf" is one character longer than r.
It is current behaviour. But is it expected behaviour?
Yes. Until the language gives us a way to distinguish string literals from other arbitrary arrays of char, it's the best we can do. The alternative is to have generic code suddenly stop working when T==char. Regards, -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Sat, Apr 21, 2012 at 11:33 AM, Dave Abrahams <dave@boostpro.com> wrote:
It is current behaviour. But is it expected behaviour?
Yes.
Based on what? Knowledge of this issue?
Until the language gives us a way to distinguish string literals from other arbitrary arrays of char, it's the best we can do. The alternative is to have generic code suddenly stop working when T==char.
What about requiring the use of as_array or as_literal in all cases? -- Olaf

on Sat Apr 21 2012, Olaf van der Spek <ml-AT-vdspek.org> wrote:
On Sat, Apr 21, 2012 at 11:33 AM, Dave Abrahams <dave@boostpro.com> wrote:
It is current behaviour. But is it expected behaviour?
Yes.
Based on what? Knowledge of this issue?
Yes. The range library originally treated char* as null-terminated, and there was an outcry.
Until the language gives us a way to distinguish string literals from other arbitrary arrays of char, it's the best we can do. The alternative is to have generic code suddenly stop working when T==char.
What about requiring the use of as_array or as_literal in all cases?
You mean, so that raw built-in arrays would cease to be ranges at all? I guess that's a possiblity. Nobody thought about that, IIRC. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Sat, Apr 21, 2012 at 4:07 PM, Dave Abrahams <dave@boostpro.com> wrote:
on Sat Apr 21 2012, Olaf van der Spek <ml-AT-vdspek.org> wrote:
On Sat, Apr 21, 2012 at 11:33 AM, Dave Abrahams <dave@boostpro.com> wrote:
It is current behaviour. But is it expected behaviour?
Yes.
Based on what? Knowledge of this issue?
Yes. The range library originally treated char* as null-terminated, and there was an outcry.
I know, but what do you think the average C++ dev is expecting? IMO it's understandable (once you know about the issue), but not expected.
Until the language gives us a way to distinguish string literals from other arbitrary arrays of char, it's the best we can do. The alternative is to have generic code suddenly stop working when T==char.
What about requiring the use of as_array or as_literal in all cases?
You mean, so that raw built-in arrays would cease to be ranges at all? I guess that's a possiblity. Nobody thought about that, IIRC.
Yes. If necessary just for char[] to minimize breakage. -- Olaf

I know, but what do you think the average C++ dev is expecting? IMO it's understandable (once you know about the issue), but not expected.
If there were a solution whereby the choice of the array handling where zero-termination should be handled as the end were readily available without causing overhead for the existing specified supported cases we would remove the surprise. However, since the Range documentation clearly specifies that arrays are directly supported and literal strings are not I do not feel that one should be surprised. The use-case that was provided directly explicitly constructed a range from a char[]. No implicit conversion occurred. The use-case presented therefore explicitly constructed a range from a char[]. The length of the sequence was the length of the array. This is consistent with the behaviour for all arrays. One of the main reasons that the char* with zero-termination was rejected was because of the O(N) string length calculation causing the end() function to be O(N). This made specifying the complexity of range algorithms difficult. It was not a matter of unexpected semantics AFAICT.
Until the language gives us a way to distinguish string literals from other arbitrary arrays of char, it's the best we can do. The alternative is to have generic code suddenly stop working when T==char.
What about requiring the use of as_array or as_literal in all cases?
You mean, so that raw built-in arrays would cease to be ranges at all? I guess that's a possiblity. Nobody thought about that, IIRC.
I did consider doing this with the RangeEx upgrade. I rejected the idea on the basis that the as_array idiom did not always fully optimize away on lesser compilers and the syntactic mess was unpleasant. I appreciate that many believe that native arrays should not appear in modern C++ code, but the guarantee that the array produces a pointer type from begin() and end() can be used to provide optimal implementations for contiguous pointer ranges. Without the further development of Concepts, or numerous specializations for concrete array types the use of native arrays is extremely useful in high performance protocol handling code. Therefore from my perspective it would be necessary to specify and implement the additional trait information to avoid a performance regression before this suggestion could seriously be considered. A performance regression upon the common use-case of byte ranges is out of the question.
Yes. If necessary just for char[] to minimize breakage.
In summary, I believe that the surprise is small and infrequently occurs. There are however a large number of people using many char arrays with Boost.Range. The surprise when using the library in a manner that is documented as unsupported is insufficiently important when contrasted with the impact it would have on valid code.
-- Olaf
Olaf, has provided considerable useful feedback that I am acting upon. I don't wish this clear rejection of this specific idea to put you or anyone else off of providing feedback either positive or negative. However I want to ensure users of Boost.Range that use the library for performance-critical work as I do are assured that a performance regression will not occur. There is always the possibility that I have missed a design alternative, or that I have under-estimated the inconvenience of this particular use-case. If others have a strong view or solutions that are free of runtime performance degradation then I will of course consider them. Neil Groves

On Sat, Apr 21, 2012 at 10:01 PM, Neil Groves <neil@grovescomputing.com> wrote:
I know, but what do you think the average C++ dev is expecting? IMO it's understandable (once you know about the issue), but not expected.
If there were a solution whereby the choice of the array handling where zero-termination should be handled as the end were readily available without causing overhead for the existing specified supported cases we would remove the surprise. However, since the Range documentation clearly specifies that arrays are directly supported and literal strings are not I do not feel that one should be surprised. The use-case that was provided directly explicitly constructed a range from a char[]. No implicit conversion occurred.
Hi Neil, Isn't the right-hand side of operator== implicitly converted from string literal (or array) to range? You didn't really answer the question "What do you think the average C++ dev is expecting?" (when just reading that code)
The use-case presented therefore explicitly constructed a range from a char[]. The length of the sequence was the length of the array. This is consistent with the behaviour for all arrays.
One of the main reasons that the char* with zero-termination was rejected was because of the O(N) string length calculation causing the end() function to be O(N). This made specifying the complexity of range algorithms difficult. It was not a matter of unexpected semantics AFAICT.
I don't understand. The source is an array, not a char*, so size calculation is not O(N). Besides, this cost is paid at construction time, not when calling end(), is it?
Therefore from my perspective it would be necessary to specify and implement the additional trait information to avoid a performance regression before this suggestion could seriously be considered. A performance regression upon the common use-case of byte ranges is out of the question.
Byte is unsigned char, isn't it? Or did you mean char ranges?
Yes. If necessary just for char[] to minimize breakage.
In summary, I believe that the surprise is small and infrequently occurs.
Depends on usage, IMO. In my case, string literals are used far more frequently than raw arrays.
Olaf, has provided considerable useful feedback that I am acting upon. I don't wish this clear rejection of this specific idea to put you or anyone else off of providing feedback either positive or negative. However I want to ensure users of Boost.Range that use the library for performance-critical work as I do are assured that a performance regression will not occur.
I don't mind some discussion. ;)
There is always the possibility that I have missed a design alternative, or that I have under-estimated the inconvenience of this particular use-case. If others have a strong view or solutions that are free of runtime performance degradation then I will of course consider them.
It's a shame string literals don't have a better / other type than char[]. -- Olaf

On Sat, Apr 21, 2012 at 10:32 PM, Olaf van der Spek <ml@vdspek.org> wrote:
On Sat, Apr 21, 2012 at 10:01 PM, Neil Groves <neil@grovescomputing.com> wrote:
I know, but what do you think the average C++ dev is expecting? IMO it's understandable (once you know about the issue), but not expected.
If there were a solution whereby the choice of the array handling where zero-termination should be handled as the end were readily available without causing overhead for the existing specified supported cases we would remove the surprise. However, since the Range documentation clearly specifies that arrays are directly supported and literal strings are not I do not feel that one should be surprised. The use-case that was provided directly explicitly constructed a range from a char[]. No implicit conversion occurred.
Hi Neil,
Isn't the right-hand side of operator== implicitly converted from string literal (or array) to range?
You didn't really answer the question "What do you think the average C++ dev is expecting?" (when just reading that code)
The use-case presented therefore explicitly constructed a range from a char[]. The length of the sequence was the length of the array. This is consistent with the behaviour for all arrays.
One of the main reasons that the char* with zero-termination was rejected was because of the O(N) string length calculation causing the end() function to be O(N). This made specifying the complexity of range algorithms difficult. It was not a matter of unexpected semantics AFAICT.
I don't understand. The source is an array, not a char*, so size calculation is not O(N). Besides, this cost is paid at construction time, not when calling end(), is it?
Therefore from my perspective it would be necessary to specify and implement the additional trait information to avoid a performance regression before this suggestion could seriously be considered. A performance regression upon the common use-case of byte ranges is out of the question.
Byte is unsigned char, isn't it? Or did you mean char ranges?
Yes. If necessary just for char[] to minimize breakage.
In summary, I believe that the surprise is small and infrequently occurs.
Depends on usage, IMO. In my case, string literals are used far more frequently than raw arrays.
Olaf, has provided considerable useful feedback that I am acting upon. I don't wish this clear rejection of this specific idea to put you or anyone else off of providing feedback either positive or negative. However I want to ensure users of Boost.Range that use the library for performance-critical work as I do are assured that a performance regression will not occur.
I don't mind some discussion. ;)
There is always the possibility that I have missed a design alternative, or that I have under-estimated the inconvenience of this particular use-case. If others have a strong view or solutions that are free of runtime performance degradation then I will of course consider them.
It's a shame string literals don't have a better / other type than char[].
Neil? -- Olaf

On Saturday 21 April 2012 21:01:06 Neil Groves wrote:
You mean, so that raw built-in arrays would cease to be ranges at all? I guess that's a possiblity. Nobody thought about that, IIRC.
I did consider doing this with the RangeEx upgrade. I rejected the idea on the basis that the as_array idiom did not always fully optimize away on lesser compilers and the syntactic mess was unpleasant.
Out of curiosity, what exactly was the runtime overhead of using as_array? IMHO, if a compiler is not able to inline it, it's not worth using. As for the syntactic sugar, the example Olaf provided in the beginning of this thread clearly would benefit from an explicit specification, whether a literal or an array should be used.

The current behaviour of Boost.Range is consistent with the bvehaviour of the C++11 range-based for loop, where for (char e : "hello") {} iterates 6 times, but for (char e : std::string("hello")) {} iterates 5 times. I think it's important to retain this consistency. (Whether the C++11 behaviour is desirable to begin with, is of course a different question). Regards, Nate

From: zeratul6@@@hotmail.com Date: Sun, Apr 12 :::::: +00<<
The current behaviour of Boost.Range is consistent with the bvehaviour of the C++ range-based for loop, where
for (char e : "hello") {}
iterates times, but
for (char e : std::string("hello")) {}
iterates times.
I think it's important to retain this consistency. (Whether the C++ behaviour is desirable to begin with, is of course a different question).
For a generic function that handles arrays: template < typename T, std::size_t N >int hash( const T (&x)[N] ); It should do all elements. An array of char or wchar_t (or the new C++11 character types) may not be astring, a conceptual whole; it could be a collection of independent objects, like any other element type.

On Fri, 20 Apr 2012 05:15:50 -0700, Olaf van der Spek <ml@vdspek.org> wrote:
Hi,
What do you expect this code to do? Is b true or false? And why? Is this expected behaviour?
[1] #include <boost/range/iterator_range.hpp> [2] #include <string> [3][4] int main() [5] { [6] std::string s = "Olaf"; [7] boost::iterator_range<std::string::iterator> r(s); [8] bool a = r == s; [9] bool b = r == "Olaf"; [10] assert(a); [11] assert(b); [12] return 0; [14] }
I would expect both lines 8 and 9 to cause compilation failure. To someone unfamiliar with the library, it's not /immediately/ clear what the intent of either line is. If one wants to treat "s" as a range for the purpose of comparison, then one should explicitly adapt it to such. Ditto for line 9. It just makes the maintaining code that much easier. In the case of line 9, this has the additional benefit of forcing newcomers to ask what does make_iterator_range("Olaf") mean?, leading to a documentation look up that will preempt any potential surprises associated with char arrays and the library. Another note on implicit conversions, if line 8 means implicitly adapt "s" to "r"'s type and then does a comparison, then the following: [*] bool c = (s == r); by symmetry should mean adapt "r" to "s"'s type and then do a comparison, which clearly isn't the current behaviour. Even if it was, then [*] and [8] may then give different results when unicode comes into play. Which is another reason in favor of explicit conversions for both [8] and [9]. Mostafa

From: mostafa_working_away@yahoo.com Date: Sat, 21 Apr 2012 17:42:26 -0700
On Fri, 20 Apr 2012 05:15:50 -0700, Olaf van der Spek <ml@vdspek.org> wrote:
What do you expect this code to do? Is b true or false? And why? Is this expected behaviour?
[1] #include <boost/range/iterator_range.hpp> [2] #include <string> [3][4] int main() [5] { [6] std::string s = "Olaf"; [7] boost::iterator_range<std::string::iterator> r(s); [8] bool a = r == s; [9] bool b = r == "Olaf"; [10] assert(a); [11] assert(b); [12] return 0; [14] }
I would expect both lines 8 and 9 to cause compilation failure. To someone unfamiliar with the library, it's not /immediately/ clear what the intent of either line is. If one wants to treat "s" as a range for the purpose of comparison, then one should explicitly adapt it to such. Ditto for line 9. It just makes the maintaining code that much easier. In the case of line 9, this has the additional benefit of forcing newcomers to ask what does make_iterator_range("Olaf") mean?, leading to a documentation look up that will preempt any potential surprises associated with char arrays and the library. [TRUNCATE] I'm guessing that Boost.Range handles the range traversal method that's acounter to the iterator traversal method. If so, I was thinking the samething Mostafa did when reading this thread: the comparisons should causecompiler errors because the types shouldn't be conceptually directlycompatible. Daryle W.

On 22/04/12 02:42, Mostafa wrote:
On Fri, 20 Apr 2012 05:15:50 -0700, Olaf van der Spek <ml@vdspek.org> wrote:
Hi,
What do you expect this code to do? Is b true or false? And why? Is this expected behaviour?
[1] #include <boost/range/iterator_range.hpp> [2] #include <string> [3][4] int main() [5] { [6] std::string s = "Olaf"; [7] boost::iterator_range<std::string::iterator> r(s); [8] bool a = r == s; [9] bool b = r == "Olaf"; [10] assert(a); [11] assert(b); [12] return 0; [14] }
I would expect both lines 8 and 9 to cause compilation failure.
There is absolutely no reason to, this code makes perfect sense. iterator_range has an overloaded operator== that allows it to compare it to any other range.
To someone unfamiliar with the library, it's not /immediately/ clear what the intent of either line is.
It is immediately clear once you know what types are ranges and what operator== on iterator_range does.
If one wants to treat "s" as a range
s is a std::string. It is already a range. There is no need to "treat it" as such.
Another note on implicit conversions, if line 8 means implicitly adapt "s" to "r"'s type and then does a comparison, then the following:
[*] bool c = (s == r);
by symmetry should mean adapt "r" to "s"'s type and then do a comparison, which clearly isn't the current behaviour. Even if it was, then [*] and [8] may then give different results when unicode comes into play. Which is another reason in favor of explicit conversions for both [8] and [9].
I have no idea what you're talking about. There is no conversion involved. s == r just does std::equal(begin(s), end(s), begin(r))
participants (9)
-
Andrey Semashev
-
Daryle Walker
-
Dave Abrahams
-
Mathias Gaunard
-
Mostafa
-
Nathan Ridge
-
Neil Groves
-
Olaf van der Spek
-
Thijs (M.A.) van den Berg