lifetime of ranges vs. iterators

Hello,

some adaptor ranges need certain constant information about their underlying ranges. For example, iterating over a forward "difference_range", the difference between two sorted ranges, requires access to the ends of the underlying ranges. These two additional iterators are constant: they do not change when the difference_iterator is incremented.

Now, where should this additional constant information be stored?

a) In the difference_range, with each iterator holding a reference to the difference_range? That requires the difference_range to stay alive as long as its iterators are alive, which may be difficult to ensure if the difference_range is a temporary.

b) In the iterators, so that the lifetime of adaptor iterators is independent of the adaptor range. Of course, one can use shared_ptrs and so on, but ultimately the iterators are responsible for keeping that information available. This initially seemed more natural to me, but implemented naively, by storing the additional iterators in the iterator itself, it leads to storage bloat: each difference_iterator needs 4 instead of 2 of its underlying iterators. When stacking difference iterators, the required storage grows exponentially: instead of 2^N, you need 4^N storage...

I feel that in order to solidify the range concept, one needs to make a choice whether a) or b) is "standard", because any user of ranges would need to know. Any thoughts on this?

Arno

--
Dr. Arno Schoedl · aschoedl@think-cell.com · Technical Director, think-cell Software GmbH · http://www.think-cell.com
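To make the trade-off concrete, here is a minimal sketch of the naive b) layout (hypothetical names, not real library code; equality comparison omitted):

// Option b), naive: every iterator carries both ends itself.
// Needs <iterator> for std::iterator_traits.
template< class ItA, class ItB >
class difference_iterator {
    ItA m_itA, m_endA; // position in A plus A's end
    ItB m_itB, m_endB; // position in B plus B's end -- 4 iterators in all
public:
    difference_iterator( ItA itA, ItA endA, ItB itB, ItB endB )
    :   m_itA( itA ), m_endA( endA ), m_itB( itB ), m_endB( endB )
    {   skip_common(); }

    typename std::iterator_traits<ItA>::reference operator*() const
    {   return *m_itA; }

    difference_iterator& operator++()
    {   ++m_itA; skip_common(); return *this; }

private:
    void skip_common() { // both ranges sorted; stop at elements of A not in B
        while( m_itA != m_endA && m_itB != m_endB ) {
            if( *m_itA < *m_itB ) break;        // in A only: valid position
            else if( *m_itB < *m_itA ) ++m_itB; // let B catch up
            else { ++m_itA; ++m_itB; }          // in both: skip
        }
    }
};

At stacking depth N, such an iterator holds 4^N base iterators, although only 2^N positions actually vary.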

On Fri, Aug 29, 2008 at 11:12 AM, Arno Schödl <aschoedl@think-cell.com> wrote:
Hello,
some adaptor ranges need certain constant information about their underlying ranges. For example, iterating over a forward "difference_range", the difference between two sorted ranges, requires access to the ends of the underlying ranges. These two additional iterators are constant: they do not change when the difference_iterator is incremented.
Now, where should this additional constant information be stored?
a) In the difference_range, and each iterator holds a reference to the difference_range? That requires the difference_range to be alive as long as its iterators are alive. This may be difficult to ensure if the difference_range is a temporary.
Option a) is the solution I prefer, due to the lower memory overhead. I am of the opinion that there is no need to provide a uniform lifetime arrangement, since it would often make optimal implementations impossible. The difference_range should be implemented with the requirement that it be held for as long as its iterators. The standard containers, whose iterator invalidation rules differ for same-named operations on different classes, are perhaps an appropriate analogy.
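A rough sketch of that data layout (hypothetical names; bodies omitted; the range must outlive its iterators):

// Option a): the ends live in the range, stored once.
template< class ItA, class ItB >
class difference_range {
    ItA m_beginA, m_endA;
    ItB m_beginB, m_endB;
public:
    class iterator {
        ItA m_itA;                       // current position in A
        ItB m_itB;                       // current position in B
        difference_range const* m_prng;  // back-reference; increment reads
                                         // m_prng->m_endA / m_prng->m_endB
    };
    iterator begin() const;
    iterator end() const;
};

An iterator is then two underlying iterators plus one pointer, however deeply such adaptors are stacked; the price is the lifetime coupling above.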
b) In the iterators, so the lifetime of adaptor iterators is independent of the adaptor range. Of course, one can use shared_ptrs and so on, but still the iterators are ultimately responsible to keep that information available.
This initially seemed more natural to me, but if implemented naively by storing the additional iterators in the iterator itself, it leads to storage bloat: each difference_iterator needs 4 instead of 2 of its underlying iterators. When stacking difference iterators, the difference in required storage grows exponentially: instead of 2^N, you need 4^N storage...
I feel that in order to solidify the range concept, one needs to make a choice whether a) or b) is "standard", because any user of ranges would need to know. Any thoughts on this?
I don't think this needs to be unified. I think there is a preference for range lifetimes to be as forgiving as possible, and for most ranges the range lifetime may be less than the underlying iterators. This is very convenient for the vast majority of cases. I also see that there are times when this is too expensive, and then a range implemented with an alternative arrangement is the logical choice. In practice it does not appear to measurably effect the number of defects in software.
I hope this helps. Best wishes, Neil Groves

on Fri Aug 29 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
<snip>
I feel that in order to solidify the range concept, one needs to make a choice whether a) or b) is "standard", because any user of ranges would need to know. Any thoughts on this?
Aren't standard containers supposed to satisfy the Range concepts? When the container is destroyed, its iterators are invalidated, right? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Aren't standard containers supposed to satisfy the Range concepts? When the container is destroyed, its iterators are invalidated, right?
Yes, containers satisfy Range concepts, but this question is more related to the opposite relationship. There is no need for a range destruction to invalidate the iterators, and this leads to simpler code, for example when creating a temporary range to pass to a range algorithm.
Neil Groves

on Fri Aug 29 2008, "Neil Groves" <neil-AT-grovescomputing.com> wrote:
Aren't standard containers supposed to satisfy the Range concepts? When the container is destroyed, its iterators are invalidated, right?
Yes, containers satisfy Range concepts, but this question is more related to the opposite relationship. There is no need for a range destruction to invalidate the iterators, and this leads to simpler code, for example when creating a temporary range to pass to a range algorithm.
So? No algorithm could possibly _rely_ on the fact that the iterators will be invalidated, so what does it matter? You can do whatever's most efficient. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

For iterators, the make_ helpers are great:

vector<int> vecnA;
vector<int> vecnB;
DoSomethingWithRangeThatStoresIterators(
    make_difference_iterator( vecnA.begin(), vecnA.end(), vecnB.begin(), vecnB.end() ),
    make_difference_iterator( vecnA.end(), vecnA.end(), vecnB.end(), vecnB.end() )
);

The equivalent with ranges, supposedly easier, would now look like this:

vector<int> vecnA;
vector<int> vecnB;
difference_range< int, int > diffrng( vecnA, vecnB ); // watch out with scope
DoSomethingWithRangeThatStoresIterators( diffrng );

as opposed to the much nicer:

vector<int> vecnA;
vector<int> vecnB;
DoSomethingWithRangeThatStoresIterators( make_difference_range( vecnA, vecnB ) );

If we drive this stacking to one or more levels, things pretty quickly become pretty ugly. Do we really want that?

Arno

On Fri, Aug 29, 2008 at 1:50 PM, Arno Schödl <aschoedl@think-cell.com> wrote:
<snip>
If we drive this stacking to one or more levels, things pretty quickly become pretty ugly. Do we really want that?
No, we don't want that at all. Apologies for the shameless plug, but you might want to look at the range adaptors that use the '|' operator in the Boost.RangeEx proposal / update in the vault at:

http://www.boostpro.com/vault/index.php?action=downloadfile&filename=range_ex.zip&directory=Algorithms

The range adaptors and the pipe operator allow infix composition of range adaptors for the very reason you suggest.

Neil Groves
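For a flavour of the syntax, a small example (using the filtered/transformed adaptor names that later shipped with Boost.Range; the names in the vault version may differ slightly):

#include <boost/range/adaptor/filtered.hpp>
#include <boost/range/adaptor/transformed.hpp>
#include <boost/range/algorithm/copy.hpp>
#include <iostream>
#include <iterator>
#include <vector>

bool is_even( int n ) { return n % 2 == 0; }
int square( int n ) { return n * n; }

int main()
{
    std::vector<int> vec;
    for( int i = 0; i != 10; ++i ) vec.push_back( i );

    // adaptors compose left to right; no named temporaries needed
    boost::copy( vec | boost::adaptors::filtered( &is_even )
                     | boost::adaptors::transformed( &square ),
                 std::ostream_iterator<int>( std::cout, " " ) );
}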

I have seen your | operator. It is o.k. for unary things, like rng | filter( predicate ), but for binary things, it is a bit weird:

rngA | difference( rngB )

I find the alternatives clearer: rngA - rngB would be nice (but requires concept checking), or difference( rngA, rngB ). But regardless of notation, doesn't this suffer from the same problem that these objects are temporaries?

Arno

On Fri, Aug 29, 2008 at 2:13 PM, Arno Schödl <aschoedl@think-cell.com> wrote:
I have seen your | operator. It is o.k. for unary things, like rng | filter( predicate ), but for binary things, it is a bit weird:
rngA | difference(rngB)
I find the alternatives clearer: rngA - rngB would be nice (but requires concept checking), or difference( rngA, rngB ).
Indeed, I can definitely see your point about binary operations.
But regardless of notation, doesn't this suffer from the same problem that these objects are temporaries?
Yes, it does not help at all with the 'difference' issue. Would you be able to supply code for the difference solution you have proposed? I would be happy to look at it; it would help me understand the problem more concretely. If you have specific suggestions for changes to range / range_ex, I would be happy to consider them. With my current level of non-understanding of the actual difference algorithm, I wonder if the result is not more appropriately a model of a Container rather than a Range? Neil

On Fri, Aug 29, 2008 at 3:13 PM, Arno Schödl <aschoedl@think-cell.com> wrote:
I have seen your | operator. It is o.k. for unary things, like rng | filter( predicate ), but for binary things, it is a bit weird:
rngA | difference(rngB)
I find the alternatives clearer: rngA - rngB would be nice (but requires concept checking), or difference( rngA, rngB ).
What about infix a la FC++? (rngA ^difference^ rngB) I love the syntax, but maybe this is too much operator abuse.
But regardless of notation, doesn't this suffer from the same problem that these objects are temporaries?
I'm still missing something: where is the problem in:

vector<int> vecnA;
vector<int> vecnB;
DoSomethingWithRangeThatStoresIterators( difference_range< int, int >( vecnA, vecnB ) );

This works, as long as the range doesn't escape from DSWRTSI, or a copy is made. [of course a make_difference_range would make the code much cleaner] -- gpd
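Such a helper is a two-liner; a sketch (hypothetical, and assuming a difference_range templated on the range types rather than on the element types):

template< class RngA, class RngB >
difference_range< RngA, RngB >
make_difference_range( RngA const& rngA, RngB const& rngB )
{
    return difference_range< RngA, RngB >( rngA, rngB );
}

// usage: the temporary lives until the end of the full expression
DoSomethingWithRangeThatStoresIterators( make_difference_range( vecnA, vecnB ) );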

Indeed, as long as the range does not escape, everything is fine. But here we are in trouble:

some_stacked_adaptor_range<...> rng( make_difference_range( vecnA, vecnB ) );

Again, notation makes no difference, but Neil's idea, which I fully endorse, namely easy stacking, makes the problem all the more apparent:

some_stacked_adaptor_range<...> rng( vecnA | difference( vecnB ) ); // wrong

What makes this really annoying is that there is an easy way out: wouldn't it be more natural to represent a forward range as a single iterator, with an extra .at_end() check? This would eliminate the problem, because a difference_range would only need two such iterators, not four. Stacking them naively would have 2^N storage growth, rather than 4^N, and we still don't need to worry about range lifetime. For other classes of iterators, the situation is similar: bidirectional ranges have at_end() and at_begin(), and random_access_ranges a size() and an operator[].

I know that overthrowing such an established concept as ranges is not terribly popular, but I don't see any other way to combine easy stacking with no worry about lifetimes.

Arno
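A sketch of such a self-contained forward range (hypothetical interface; assumes RngA and RngB are themselves self-contained ranges offering at_end(), operator* and operator++, and that RngA has a nested reference typedef):

// one sub-range object per input: 2 members instead of 4 iterators
template< class RngA, class RngB >
class difference_range {
    RngA m_rngA; // stored by value; each knows its own end
    RngB m_rngB;
public:
    difference_range( RngA const& rngA, RngB const& rngB )
    :   m_rngA( rngA ), m_rngB( rngB )
    {   skip_common(); }

    bool at_end() const { return m_rngA.at_end(); }
    typename RngA::reference operator*() const { return *m_rngA; }
    difference_range& operator++() { ++m_rngA; skip_common(); return *this; }

private:
    void skip_common() { // stop at elements of A not present in (sorted) B
        while( !m_rngA.at_end() && !m_rngB.at_end() ) {
            if( *m_rngA < *m_rngB ) break;
            else if( *m_rngB < *m_rngA ) ++m_rngB;
            else { ++m_rngA; ++m_rngB; }
        }
    }
};

At stacking depth N such an object holds 2^N underlying iterators, where the iterator-pair formulation holds 4^N.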

On Fri, Aug 29, 2008 at 4:12 PM, Arno Schödl <aschoedl@think-cell.com> wrote:
Indeed, as long as the range does not escape, everything is fine. But here we are in trouble:
some_stacked_adaptor_range<...> rng( make_difference_range( vecnA, vecnB ) );
Again, notation makes no difference, but Neil's idea, which I fully endorse, namely easy stacking, makes the problem all the more apparent:
some_stacked_adaptor_range<...> rng( vecnA | difference( vecnB ) ); // wrong
What makes this really annoying is that there is an easy way out: Wouldn't it be more natural to represent a forward range as a single iterator, with an extra .at_end() check? This would eliminate the problem, because a difference_range would only need two such iterators, not four. Stacking them naively would have 2^N storage growth, rather than 4^N, and we still don't need to worry about range lifetime.
Ok, now I understand. The lack of auto and decltype usually means that you use range adaptors in a strictly LIFO way, i.e. you usually never have named range adaptors, and iterator validity is never an issue, as the scope of the wrapping range is always a subset of the wrapped range. Anyway, the solution is simple: if you want your adaptor to survive the original wrapped range, you have to store a copy of the wrapped range in the adaptor. As an added benefit, the adaptor size will be the smallest possible for deep sequences of stacked adaptors. Of course this means that, to prevent expensive copies, you have to wrap real containers in a lightweight boost range view (sometimes explicitly specifying the adaptor template argument as a reference, instead of relying on a make_*_range, will work). BTW, this mimics the situation with boost.bind and lambda, where the default capture is by value and you have to use ref/cref to explicitly store by reference.
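A layout sketch of that policy (hypothetical names):

// a filtered_range that owns a copy of what it wraps
template< class Rng, class Pred >
class filtered_range {
    Rng  m_rng;   // stored by value: safe to outlive the wrapped temporary.
                  // For a real container, instantiate Rng as a cheap view
                  // (e.g. boost::iterator_range) or as a reference type.
    Pred m_pred;
public:
    filtered_range( Rng const& rng, Pred const& pred )
    :   m_rng( rng ), m_pred( pred ) {}
    // begin()/end() hand out iterators that refer into m_rng
};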
For other classes of iterators, the situation is similar: bidirectional ranges have at_end() and at_begin(), and random_access_ranges a size() and an operator[].
I know that overthrowing such an established concept as ranges is not terribly popular, but I don't see any other way to combine easy stacking with no worry about lifetimes.
In most cases (i.e. every time a default constructed iterator is not a valid end iterator), this means that an iterator must store the end point in addition to the initial point. Thus the iterator would actually be a range; you might as well make that explicit. P.S. Please don't top-post. -- gpd

on Fri Aug 29 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
Wouldn't it be more natural to represent a forward range as a single iterator, with an extra .at_end() check? This would eliminate the problem, because a difference_range would only need two such iterators, not four.
As part of http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1873.html one of the ideas we discussed extensively was allowing the end iterator of a range to be a different type from the begin iterator. That would allow a sort of "universal end iterator" to be used in situations such as this:

struct end
{
    // These definitions are only useful for infinite ranges
    template <class T> bool operator==(T const&) { return false; }
    template <class T> bool operator!=(T const&) { return true; }
};

Effectively the data representation of the begin iterator would be the same as what you're proposing, plus comparison operators against the universal end type above, but the end iterator type could be empty (and optimized away). That allows a generic description of algorithms that are optimally efficient on pairs of pointers, but could simplify many classes of iterator (ifstream_iterators, for example) where every iterator needs all the information required to detect whether it can advance anyway.
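To illustrate, an algorithm written against an iterator/sentinel pair of different types might look like this (a sketch; 'last' is kept on the left-hand side so that the member operators of the empty end type above suffice):

template< class Iter, class Sentinel, class T >
Iter find( Iter first, Sentinel last, T const& value )
{
    // works when Sentinel is Iter (plain pointer pairs, zero overhead)
    // and when Sentinel is the empty 'end' type above
    while( !( last == first ) && *first != value )
        ++first;
    return first;
}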
Stacking them naively would have 2^N storage growth, rather than 4^N, and we still don't need to worry about range lifetime.
I've been trying to say that generic algorithms *can't* worry about the range lifetime; they have to assume that the iterators they're passed will not be invalidated before they're destroyed.
For other classes of iterators, the situation is similar: bidirectional ranges have at_end() and at_begin(), and random_access_ranges a size() and an operator[].
Bleah (sorry). Iterators in general shouldn't be burdened with the requirement to know the limits of the sequence, and if you go back and review the class of problems that iterators were "designed" to address, I think you'll see why. Just take a look at the classic quicksort algorithm, for example, or any other divide-and-conquer algorithm. I put "designed" in quotes because concepts are discovered, not designed, but that's a whole different topic.
I know that overthrowing such an established concept as ranges is not terribly popular, but I don't see any other way to combine easy stacking with no worry about lifetimes.
I'm not sure this is as revolutionary as you think. Every other language I've looked at has a get_next()/at_end() iterator abstraction, which is a lot like what you're proposing. My advice: make sure you keep in mind the whole problem domain. You can't understand the reason that C++ iterators are unlike those in other languages without looking at the algorithms. Generic Programming -- it's about the algorithms, baby! -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Fri, Aug 29, 2008 at 4:48 PM, David Abrahams <dave@boostpro.com> wrote:
on Fri Aug 29 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
Wouldn't it be more natural to represent a forward range as a single iterator, with an extra .at_end() check? This would eliminate the problem, because a difference_range would only need two such iterators, not four.
As part of http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1873.html one of the ideas we discussed extensively was allowing the end iterator of a range to be a different type from the begin iterator.
<snip>
Effectively the data representation of the begin iterator would be the same as what you're proposing, plus comparison operators against the universal end type above, but the end iterator type could be empty (and optimized away). That allows a generic description of algorithms that are optimally efficient on pairs of pointers, but could simplify many classes of iterator (ifstream_iterators, for example) where every iterator needs all the information required to detect whether it can advance anyway.
Nice!
Stacking them naively would have 2^N storage growth, rather than 4^N, and we still don't need to worry about range lifetime.
I've been trying to say that generic algorithms *can't* worry about the range lifetime; they have to assume that the iterators they're passed will not be invalidated before they're destroyed.
I think that the OP is talking about a different problem: let's say you want to name the result of stacked applications of range adaptors:

template<typename Range>
void my_algo(Range& r)
{
    auto stacked_range = r | filtered(filter) | mapped(mapper);
    // use stacked_range
}

This may lead to iterator lifetime problems, not because r's iterators may be invalid, but because the iterators of the range returned by filtered(...) may become invalid once that temporary has been destroyed. There are many solutions: The easiest is splitting the expression apart, but that is ugly. Then you could use expression templates to make the whole pipe expression return a range only at the end. Finally, both filtered and mapped could guarantee that the iterators are not invalidated when the range is destroyed, but that may prevent some space optimizations (like storing the predicate inside the range instead of the iterator).
For other classes of iterators, the situation is similar: bidirectional ranges have at_end() and at_begin(), and random_access_ranges a size() and an operator[].
Bleah (sorry). Iterators in general shouldn't be burdened with the requirement to know the limits of the sequence, and if you go back and review the class of problems that iterators were "designed" to address, I think you'll see why. Just take a look at the classic quicksort algorithm, for example, or any other divide-and-conquer algorithm.
I put "designed" in quotes because concepts are discovered, not designed, but that's a whole different topic.
I know that overthrowing such an established concept as ranges is not terribly popular, but I don't see any other way to combine easy stacking with no worry about lifetimes.
I'm not sure this is as revolutionary as you think. Every other language I've looked at has a get_next()/at_end() iterator abstraction, which is a lot like what you're proposing. My advice: make sure you keep in mind the whole problem domain. You can't understand the reason that C++ iterators are unlike those in other languages without looking at the algorithms.
BTW, ranges are equivalent to GoF iterators: you just spell 'at_end' as 'empty(r)'; 'next()' as 'r = range(next(begin(r)), end(r))' (equivalently: 'r = rest(r)'); and '.get()' as '*begin(r)'. The nice thing about ranges is that you can take them apart into their iterator components when you need the extra flexibility. -- gpd

on Fri Aug 29 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
On Fri, Aug 29, 2008 at 4:48 PM, David Abrahams <dave@boostpro.com> wrote:
I've been trying to say that generic algorithms *can't* worry about the range lifetime; they have to assume that the iterators they're passed will not be invalidated before they're destroyed.
I think that the OP is talking about a different problem: let's say you want to name the result of stacked applications of range adaptors:
template<typename Range>
void my_algo(Range& r)
{
    auto stacked_range = r | filtered(filter) | mapped(mapper);
    // use stacked_range
}
This may lead to iterator lifetime problems, not because r's iterators may be invalid, but because the iterators of the range returned by filtered(...) may become invalid once that temporary has been destroyed. There are many solutions:
The easiest is splitting the expression apart, but that is ugly.
Then you could use expression templates to make the whole pipe expression return a range only at the end.
Finally, both filtered and mapped could guarantee that the iterators are not invalidated when the range is destroyed, but that may prevent some space optimizations (like storing the predicate inside the range instead of the iterator).
Or you could write the representation of the range in such a way that it didn't contain an exact representation of the iterators. When you assemble a range from two iterators, store the common information separately, and reassemble the complete iterators only when they are requested. When adapting iterators you'd do the same deconstruction/re-construction. You could tack on some runtime error-checking for good measure.
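For instance, a filter range could store the predicate and the underlying end once and manufacture complete iterators only on demand; a sketch using boost::filter_iterator (the filter_range class name is hypothetical):

#include <boost/iterator/filter_iterator.hpp>

template< class It, class Pred >
class filter_range {
    It   m_begin, m_end; // the common information, stored once
    Pred m_pred;
public:
    filter_range( It begin, It end, Pred pred )
    :   m_begin( begin ), m_end( end ), m_pred( pred ) {}

    typedef boost::filter_iterator< Pred, It > iterator;

    // reassemble complete, self-sufficient iterators only when requested
    iterator begin() const { return iterator( m_pred, m_begin, m_end ); }
    iterator end() const { return iterator( m_pred, m_end, m_end ); }
};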
I'm not sure this is as revolutionary as you think. Every other language I've looked at has a get_next()/at_end() iterator abstraction, which is a lot like what you're proposing. My advice: make sure you keep in mind the whole problem domain. You can't understand the reason that C++ iterators are unlike those in other languages without looking at the algorithms.
BTW, ranges are equivalent to GoF iterators:
Well, they're informationally equivalent...
you just spell 'at_end' as 'empty(r)'; 'next()' as 'r = range(next(begin(r)), end(r))' (equivalently: 'r = rest(r)'); and '.get()' as '*begin(r)';
...but that is so ugly as to make them non-equivalent abstractions in my book. And that's a good thing for ranges ;-)
The nice thing about ranges is that you can take them apart into their iterator components when you need the extra flexibility.
IMO ranges will be good for describing transformations at a high level, but algorithms, for the most part, will need that flexibility. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Sat, Aug 30, 2008 at 1:51 AM, David Abrahams <dave@boostpro.com> wrote:
I'm not sure this is as revolutionary as you think. Every other language I've looked at has a get_next()/at_end() iterator abstraction, which is a lot like what you're proposing. My advice: make sure you keep in mind the whole problem domain. You can't understand the reason that C++ iterators are unlike those in other languages without looking at the algorithms.
BTW, ranges are equivalent to GoF iterators:
Well, they're informationally equivalent...
you just spell 'at_end' as 'empty(r)'; 'next()' as 'r = range(next(begin(r)), end(r))' (equivalently: 'r = rest(r))'; and '.get()' as '*begin(r)';
...but that is so ugly as to make them non-equivalent abstractions in my book. And that's a good thing for ranges ;-)
when I need a simple for loop (and can't reach for BOOST_FOREACH), the syntax I sometimes use is:

for( range<container> r( c ); r; r = rest(r) ) { ... }

Which isn't that bad. You could even write the increment as ++r.
The nice thing about ranges is that you can take them apart into their iterator components when you need the extra flexibility.
IMO ranges will be good for describing transformations at a high level, but algorithms, for the most part, will need that flexibility.
I have had the same experience, so I agree 200%. -- gpd

On Sat, Aug 30, 2008 at 1:51 AM, David Abrahams <dave@boostpro.com> wrote:
<snip>
Or you could write the representation of the range in such a way that it didn't contain an exact representation of the iterators. When you assemble a range from two iterators, store the common information separately, and reassemble the complete iterators only when they are requested. When adapting iterators you'd do the same deconstruction/re-construction. You could tack on some runtime error-checking for good measure.
Hum, this would mean that every range needs to know how to construct and deconstruct iterators of other ranges, at least if it wants to guarantee that the minimum space possible is used. It seems a bit complex, and there would be a need to come up with some non-obvious protocol to do that (some sort of compile-time introspection on ranges). Is it really worth it? Are you advocating it, or was it just another solution to consider? -- gpd

I am following up on Giovanni's idea of storing ranges by value. To minimize the impact on existing code, and to make the frequent case of wrapping an adaptor range around another container like vector or set easy, we could shift the burden of specifying which ranges should be stored by reference and which by value onto the ranges themselves.

A template container_reference would designate a reference to the underlying container that holds the data. The data is assumed to remain valid, but any adaptors around it may go out of scope. Storage by reference would be the default, so everything works as it did so far, but any wrapped ranges may not go out of scope:

template< class T >
struct container_reference {
    typedef T& type;
};

Adaptor ranges could decide to store themselves by value:

template< class RngA, class RngB >
struct container_reference< difference_range<RngA, RngB> > {
    typedef difference_range<RngA, RngB> type;
};

Adaptors would use it to store a reference to the underlying data:

template< class RngA, class RngB >
class difference_range {
    typename container_reference< RngA >::type m_rngA;
    typename container_reference< RngB >::type m_rngB;

    difference_range( RngA const& rngA, RngB const& rngB )
    :   m_rngA( rngA ), m_rngB( rngB )
    { ... }
};

What do you think?

Arno
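Under this scheme, the earlier problem case becomes safe (hypothetical types; make_difference_range and some_stacked_adaptor_range as in the previous posts):

vector<int> vecnA;
vector<int> vecnB;

// vecnA/vecnB are stored by reference (the default) and own the data;
// the difference_range temporary is stored by value inside the outer
// adaptor because of its container_reference specialization, so nothing
// dangles:
some_stacked_adaptor_range< difference_range< vector<int>, vector<int> > >
    rng( make_difference_range( vecnA, vecnB ) );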

on Sat Aug 30 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
Hum, this would mean that every range needs to know how to construct and deconstruct iterators of other ranges, at least if it wants to guarantee that the minimum space possible is used.
It would be done with associated types and functions of the iterators. Few iterators need to store that kind of redundant information, but when you build an iterator type that *does*, it would be deemed "polite" to specialize an appropriate trait and/or define the corresponding overloads to allow it to be decomposed/re-composed. I've been itching to rework Boost.Iterator recently, and this could make a very nice enhancement.
It seems a bit complex, and there would be a need to come up with some non-obvious protocol to do that (some sort of compile-time introspection on ranges).
Not on ranges; just on iterators.
Is it really worth it? Are you advocating it, or was it just another solution to consider?
I think I may be advocating it :-) I think there may be some relationship between this and hierarchical iterator representation, so I'd like to explore it further from that angle as well. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

AMDG David Abrahams wrote:
on Sat Aug 30 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
Hum, this would mean that every range needs to know how to construct and deconstruct iterators of other ranges, at least if it wants to guarantee that the minimum space possible is used.
It would be done with associated types and functions of the iterators. Few iterators need to store that kind of redundant information, but when you build an iterator type that *does*, it would be deemed "polite" to specialize an appropriate trait and/or define the corresponding overloads to allow it to be decomposed/re-composed.
I've been itching to rework Boost.Iterator recently, and this could make a very nice enhancement.
How would this work:

concept IteratorWithEnd<typename Iterator> {
    Iterator get_end(Iterator);
    Iterator set_end(Iterator, Iterator);
}

Taking filter_iterator, for example, this would allow the space to be minimized, with nested filter_iterators. (untested)

template<IteratorWithEnd Iterator, class F>
class filter_iterator<Iterator, F> {
public:
    //...
    filter_iterator(Iterator begin, Iterator end, F f)
        : iter(set_end(begin, end)), f(f) {}
    filter_iterator& operator++() {
        ++iter;
        while(iter != get_end(iter) && !f(*iter))
            ++iter;
        return(*this);
    }
    friend Iterator get_end(const filter_iterator& self) {
        return(filter_iterator(get_end(self.iter), get_end(self.iter), self.f));
    }
    friend Iterator set_end(const filter_iterator& self, const filter_iterator& other) {
        return(filter_iterator(self.iter, get_end(other.iter), f));
    }
private:
    Iterator iter;
    F f;
};

In Christ,
Steven Watanabe

on Sat Aug 30 2008, Steven Watanabe <watanabesj-AT-gmail.com> wrote:
How would this work:
template<IteratorWithEnd Iterator, class F> class filter_iterator<Iterator, F> { public: //... filter_iterator(Iterator begin, Iterator end, F f) : iter(set_end(begin,end)), f(f) {}
filter_iterator& operator++() { ++iter; while(iter != get_end(iter) && !f(*iter)) ++iter; return *this; }
friend Iterator get_end(const filter_iterator& self) { return filter_iterator( get_end(self.iter), get_end(self.iter), self.f); }
friend Iterator set_end( const filter_iterator& self, const filter_iterator& other) { return filter_iterator(self.iter, get_end(other.iter), f); }
private: Iterator iter; F f; };
Hi Steven,

Say for example Iterator is strided_iterator<int>. Then the return type of set_end above is strided_iterator<int>, right? So how will its body typecheck? Seems like this needs a little more thought.

PS: I suggest you use more linebreaks, at least in your postings; code with linebreaks inserted by mailers is even harder to read than code with too few linebreaks ;-)

-- Dave Abrahams BoostPro Computing http://www.boostpro.com

AMDG David Abrahams wrote:
friend Iterator get_end(const filter_iterator& self) { return filter_iterator( get_end(self.iter), get_end(self.iter), self.f); }
friend Iterator set_end( const filter_iterator& self, const filter_iterator& other);
Hi Steven,
Say for example Iterator is strided_iterator<int>. Then the return type of set_end above is strided_iterator<int>, right? So how will its body typecheck? Seems like this needs a little more thought.
Oops. I intended the return type to be filter_iterator. In Christ, Steven Watanabe

on Sat Aug 30 2008, Steven Watanabe <watanabesj-AT-gmail.com> wrote:
AMDG
David Abrahams wrote:
friend Iterator get_end(const filter_iterator& self) { return filter_iterator( get_end(self.iter), get_end(self.iter), self.f); } friend Iterator set_end( const filter_iterator& self, const filter_iterator& other);
Hi Steven,
Say for example Iterator is strided_iterator<int>. Then the return type of set_end above is strided_iterator<int>, right? So how will its body typecheck? Seems like this needs a little more thought.
Oops. I intended the return type to be filter_iterator.
Okay then, I take it you meant something like:

template<IteratorWithEnd Iterator, class F>
class filter_iterator<Iterator, F> {
public:
    //...
    filter_iterator(Iterator begin, Iterator end, F f)
        : iter(set_end(begin, end)), f(f) {}
    filter_iterator& operator++() {
        ++iter;
        while(iter != get_end(iter) && !f(*iter))
            ++iter;
        return *this;
    }
    friend filter_iterator get_end(const filter_iterator& self) {
        return filter_iterator(get_end(self.iter), get_end(self.iter), self.f);
    }
    friend filter_iterator set_end(const filter_iterator& self, const filter_iterator& other) {
        return filter_iterator(self.iter, get_end(other.iter), f);
    }
private:
    Iterator iter;
    F f;
};

?? I'm still not seeing it. In set_end you have:

return filter_iterator(self.iter, get_end(other.iter), f);

Isn't the result of get_end going to be a filter_iterator? The 2nd argument to filter_iterator's ctor is Iterator, though. Maybe if you wrote out what you were trying to say in English, it would be easier for me to understand your goal here.

One thing that seems missing from your design in general is some way of getting a type that represents the common information that must be carried by any two iterators over the same range. If you don't provide a way to factor that information out, I don't see a way to avoid redundant storage. I was thinking of something more like this:

// Gimme a better name, please! The abstraction is a thing that
// is always used in pairs, where each instance contains some
// redundant information in common.
auto concept Factorable<typename T>
{
    typename common_data = void;
    typename unique_data = T;

    common_data common(T) {}
    unique_data unique(T x) { return x; }
    T reconstruct(common_data, unique_data x) { return x; }
};

template <class I, class F>
concept_map Factorable<filter_iterator<I,F> >
{
    typedef compressed_pair<I,F> common_data;
    typedef I unique_data;

    common_data common(filter_iterator<I,F> p)
    { return common_data(p.end(), p.func()); }

    unique_data unique(filter_iterator<I,F> p)
    { return p.base(); }

    filter_iterator<I,F> reconstruct(common_data c, unique_data u)
    { return filter_iterator<I,F>(c.first(), c.second(), u); }
};

-- Dave Abrahams BoostPro Computing http://www.boostpro.com

// Gimme a better name, please! The abstraction is a thing that // is always used in pairs, where each instance contains some // redundant information in common. auto concept Factorable<typename T> { typename common_data = void; typename unique_data = T;
common_data common(T) {} unique_data unique(T x) { return x; } T reconstruct(common_data, unique_data x) { return x; } };
How is the separation of common_data and unique_data different from a separation of ranges and iterators? If iterators of ranges can rely on their range to exist, this is where common data like functors, end iterators etc. can be stored.

Arno

on Sun Aug 31 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
// Gimme a better name, please! The abstraction is a thing that // is always used in pairs, where each instance contains some // redundant information in common. auto concept Factorable<typename T> { typename common_data = void; typename unique_data = T;
common_data common(T) {} unique_data unique(T x) { return x; } T reconstruct(common_data, unique_data x) { return x; } };
How is the separation of common_data and unique_data different from a separation of ranges and iterators? If iterators of ranges can rely on their range to exist, this is where common data like functors, end iterators etc. can be stored.
Naturally the point is to store the common data once when you need to store more than one iterator to the same sequence -- in a range, for example. If you're asking why I bother reconstituting the whole iterator out of its parts, instead of simply referring to the common_data stored in the range: an adapted iterator is a useful concept regardless of whether anyone is using a range abstraction. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

How is the separation of common_data and unique_data different from a separation of ranges and iterators? If iterators of ranges can rely on their range to exist, this is where common data like functors, end iterators etc. can be stored.
Naturally the point is to store the common data once when you need to store more than one iterator to the same sequence -- in a range, for example.
If you're asking why I bother reconstituting the whole iterator out of its parts, instead of simply referring to the common_data stored in the range: an adapted iterator is a useful concept regardless of whether anyone is using a range abstraction.
You could provide an adapted_iterator and also an adapted_range. If we subscribe to the rule that ranges must stay valid for their iterators to be valid, the adapted_range::iterator can use the common data stored in the range, while the adapted_iterator stores the common data itself. Both could even be derived from the same source code. Do you then still need a factored iterator? Or do you want to avoid people having to use the range abstraction? If you pass iterator pairs to algorithms instead of ranges, at least this parameter passing would have to pass the common data redundantly, even if, inside the algorithm (say, when many iterators have to be stored), the common data is stripped and stored separately.

Arno

on Sun Aug 31 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
<snip>
You could provide an adapted_iterator and also an adapted_range.
My point is that the adapted_range would then have semantically equivalent iterators with a different representation (and an extra indirection), which seems like a waste.
If we subscribe to the rule that ranges must stay valid for their iterators to be valid,
I don't. I do subscribe to the rule that generic code cannot afford to destroy an arbitrary range while its iterators are still in use.
the adapted_range::iterator can use the common data stored in the range, while the adapted_iterator stores the common data itself. Both could even be derived from the same source code.
Yeah, that's still a lot of coding effort.
Do you then still need a factored iterator?
You need to be able to take two adapted iterators and turn them into a range. Do you want that range to contain redundant data? I don't.
Or do you want to avoid people having to use the range abstraction?
I certainly don't want to force it on them.
If you pass iterator pairs to algorithms instead of ranges, at least this parameter passing would have to pass the common data redundantly, even if, inside the algorithm (say, when many iterators have to be stored), the common data is stripped and stored separately.
Of course that's understood. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

If we subscribe to the rule that ranges must stay valid for their iterators to be valid,
I don't. I do subscribe to the rule that generic code cannot afford to destroy an arbitrary range while its iterators are still in use.
Isn't that what I said? There may be iterators that work without their ranges, but in general they don't.
the adapted_range::iterator can use the common data stored in the range, while the adapted_iterator stores the common data itself. Both could even be derived from the same source code.
Yeah, that's still a lot of coding effort.
I think you could write it generically, a la iterator_facade/adaptor, so it is a one-time fixed cost.
Do you then still need a factored iterator?
You need to be able to take two adapted iterators and turn them into a range. Do you want that range to contain redundant data? I don't.
Or do you want to avoid people having to use the range abstraction?
I certainly don't want to force it on them.
Ok, now I understand. The debate is about primacy of ranges or iterators. You propose that iterators stay the primary vehicle and to convert them to/from ranges by stripping the common information. But that would mean that there is no "lean" iterator; all iterators would contain the redundant information. When stacking N difference_ranges, the size difference between "fat" and "lean" iterators is a factor of 2^N. Thus in fully generic code where you don't know anything about the stacking depth, even generating a single fat iterator carries a potentially exponential penalty. This fact makes me think that range is not merely a fancy name for a pair of iterators but a concept in its own right between containers and iterators. Generic algorithms must be written on the basis of ranges rather than iterators or take a significant performance hit.

Arno

on Mon Sep 01 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
If we subscribe to the rule that ranges must stay valid for their iterators to be valid,
I don't. I do subscribe to the rule that generic code cannot afford to destroy an arbitrary range while its iterators are still in use.
Isn't that what I said?
No, but it might be what you meant.
There may be iterators that work without their ranges, but in general they don't.
the adapted_range::iterator can use the common data stored in the range, while the adapted_iterator stores the common data itself. Both could even be derived from the same source code.
Yeah, that's still a lot of coding effort.
I think you could write it generically, a la iterator_facade/adaptor, so it is a one-time fixed cost.
I'm not sure if it's worth the effort.
Do you then still need a factored iterator?
You need to be able to take two adapted iterators and turn them into a range. Do you want that range to contain redundant data? I don't.
Or do you want to avoid people having to use the range abstraction?
I certainly don't want to force it on them.
Ok, now I understand. The debate is about primacy of ranges or iterators.
I'm not sure I agree.
You propose that iterators stay the primary vehicle
Let's say, I propose that they are first-class citizens that work without an underlying Range object. char* buffer=malloc(lotsabytes); How am I going to operate on that if there needs to be an underlying range object? There isn't one; the iterators stand alone. (Ranges can be first-class citizens too if you like).
and to convert them to/from ranges by stripping the common information.
As an optimization.
But that would mean that there is no "lean" iterator; all iterators would contain the redundant information.
I think int* is plenty lean. I don't know what you mean by "lean iterator," actually. You can store the common information in the iterator itself, or you'll have pointers or references to the common information, but either way there will be redundant bits in a pair of filter_iterators.
When stacking N difference_ranges, the size difference between "fat" and "lean" iterators is a factor of 2^N. Thus in fully generic code where you don't know anything about the stacking depth, even generating a single fat iterator carries a potentially exponential penalty.
You've lost me. Maybe you'd better spell out what you mean by "fat" and "lean," and by "generating a fat iterator."
This fact makes me think that range is not merely a fancy name for a pair of iterators but a concept in its own right between containers and iterators. Generic algorithms must be written on the basis of ranges rather than iterators or take a significant performance hit.
Have you actually measured anything yet? I think you're being at least very hasty. If you do something that slows down operation on pairs of pointers, you won't be doing anyone favors. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Mon, Sep 1, 2008 at 5:24 AM, David Abrahams <dave@boostpro.com> wrote:
<snip>
You could provide an adapted_iterator and also an adapted_range.
My point is that the adapted_range would then have semantically equivalent iterators with a different representation (and an extra indirection), which seems like a waste.
hum, which indirection? <...>
Do you then still need a factored iterator?
You need to be able to take two adapted iterators and turn them into a range. Do you want that range to contain redundant data? I don't.
Hey, maybe this is all we need! Let's have a metafunction that, given an iterator type, returns its preferred range type (the associated range). The default might be boost.range or even std::pair. The developer of an iterator adaptor can specialize the metafunction so that the associated range is the optimum representation for an iterator pair. Do you think this would be enough? It seems too simple to me, so probably I'm missing something... -- gpd
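A sketch of that metafunction (hypothetical names; filter_range as in Dave's factoring sketch above):

#include <boost/iterator/filter_iterator.hpp>
#include <utility>

// default associated range: just a pair of iterators
template< class It >
struct associated_range {
    typedef std::pair< It, It > type;
};

// specialization by the adaptor's author: a pair of filter_iterators is
// better stored as begin/end of the base sequence plus one predicate
template< class Pred, class It >
struct associated_range< boost::filter_iterator< Pred, It > > {
    typedef filter_range< It, Pred > type; // stores It, It, Pred once
};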

You need to be able to take two adapted iterators and turn them into a range. Do you want that range to contain redundant data? I don't.
Hey, maybe this is all we need! Let's have a metafunction that, given an iterator type, returns its preferred range type (the associated range). The default might be boost.range or even std::pair. The developer of an iterator adaptor can specialize the metafunction so that the associated range is the optimum representation for an iterator pair.
Do you think this would be enough? It seems too simple to me, so probably I'm missing something...
I wouldn't know what to do when my iterator_adaptor needs three iterators to the same range. I could store two efficiently as a range and retrieve them when needed by querying range::begin()/end(), but what do I do with the third? Dave's factored iterators go further in the same direction, basically dropping the requirement that the common data must be a range; it can be anything. Both proposals have the disadvantage that I need to wrap/unwrap the [common data (Dave)/range (Giovanni)] whenever I change one of the wrapped iterators. I'd rather store the common data explicitly, and have indirections inside the iterators that point to it. Correct me if I misunderstand one of the two proposals.

Arno

On Mon, Sep 1, 2008 at 1:56 PM, Arno Schödl <aschoedl@think-cell.com> wrote:
You need to be able to take two adapted iterators and turn them into a range. Do you want that range to contain redundant data? I don't.
Hei, maybe this is all that we need! Let's have a metafunction that given an iterator type returns its preferred range type (the associated range). The default might be boost.range or even std::pair. The developer of an iterator adaptor, can specialize the metafunction so that the associate range is the optimum representation for an iterator pair.
Do you think this would be enough? It seems too simple to me, so probably I'm missing something...
I wouldn't know what to do when my iterator_adaptor needs three iterators to the same range. I could store two efficiently as a range and retrieve them when needed by querying range::begin()/end(), but what do I do with the third? Dave's factored iterators go further in the same direction, basically dropping the requirement that the common data must be a range; it can be anything.
You are right. It wouldn't work with three iterators. But three iterators are a range of ranges. Maybe segmented iterators (or, better, segmented ranges) can come to the rescue? Need to think about it. Could you give a practical example of an iterator adapter needing three iterators?
Both proposals have the disadvantage that I need to wrap/unwrap the [common data (Dave)/range (Giovanni)] whenever I change one of the wrapped iterators. I'd rather store the common data explicitly, and have indirections inside the iterators that point to it.
You would still have to allow for storing the wrapped iterator in the wrapper if the wrapped iterator does not support refactoring. I do not think there is a way around it. Also, I have this (completely unfounded) feeling that an iterator that relies on accessing common data in the owning range might be harder to optimize. I think that ranges should be used for long-term storage, so if the iterator size itself is not optimal, it shouldn't be a big problem. Storing all data in a range and using an indirection in the iterator would be beneficial when that data would make the iterator expensive to copy (for example the shared pointer in shared_container_iterator: you could move the shared pointer into a shared_container_range and put a plain pointer in the iterator itself). -- gpd

Need to think about it. Could you give a practical example of an iterator adapter needing three iterators?
I never needed them in practice. But I would forward this to Dave. He likes the complete generality of factored iterators, I think. I think it is a cleaner concept than saying the common data must be representable as a range, but I cannot really back this up with practical examples. The wrapping/unwrapping (see below) troubles me in either case.
You would still have to allow for storing the wrapped iterator in the wrapper if the wrapped iterator does not support refactoring. I do not think there is a way around it.
But in my proposal (actually, yours), all iterators would be "lean", and each iterator would have access to its range, all the way up the iterator/range stack. So each iterator would itself only wrap other lean iterators. Storage bloat for stack size N would only be O(N) for the indirections, rather than O(2^N).
Also, I have this (completely unfounded) feeling that an iterator that relies on accessing common data in the owning range might be harder to optimize.
I think that ranges should be used for long term storage, so if the iterator size itself is not optimal shouldn't be a big problem.
Without any kind of storage optimization like your preferred-range templates or Dave's factored iterators, the storage bloat would be a _factor_ of 2^N, so ever handling completely unwrapped iterators just does not scale. So it comes down to:

a) with storage optimization, at every iterator access, unwrap/wrap the iterator out of its storage:

[Dave]
void increment() {
    m_itBase = decompose( ++compose( m_itBase, m_commondata ) ).first;
    // .second is common_data again
}

or

[Giovanni]
void increment() {
    m_rng = TRange( ++m_rng.begin(), m_rng.end() );
}

or

b) have the iterator carry an indirection to the outside data.

Given that the iterators coming out of a) are decomposed recursively by the next layer of iterators (that is, ItBase::operator++ inside the expression does the same thing again, all the way up), I would think that a) is actually harder to optimize than b), and the effect of no optimization would be much more dramatic.
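For symmetry, option b) in the same notation (a sketch; m_pcommon is a hypothetical pointer to the common data owned by the range):

void increment() {
    ++m_itBase;  // m_pcommon stays untouched; nothing to wrap/unwrap
}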

On Mon, Sep 1, 2008 at 4:08 PM, Arno Schödl <aschoedl@think-cell.com> wrote:
Need to think about it. Could you give a practical example of an iterator adapter needing three iterators?
I never needed them in practice. But I would forward this to Dave. He likes the complete generality of factored iterators, I think. I think it is a cleaner concept than saying the common data must be representable as a range, but I cannot really back this up with practical examples. The wrapping/unwrapping (see below) troubles me in either case.
Sure, full generality is always good, but if ranges cover 90% of the cases, maybe it is good enough.
You would still have to allow for storing the wrapped iterator in the wrapper if the wrapped iterator does not support refactoring. I do not think there is a way around it.
But in my proposal (actually, yours), all iterators would be "lean", and each iterator would have access to its range, all the way up the iterator/range stack. So each iterator would itself only wrap other lean iterators. Storage bloat for stack size N would only be O(N) for the indirections, rather than O(2^N).
You can't assume that all iterators are lean. In fact most won't. Unless you drop compatibility with existing iterators, which is a no-no. My proposal was aimed at making ranges as lean as possible, not necessarily iterators. Anyways, as long as all parts of the stack are 'lean' ranges, you should have only a 2x storage bloat when using iterators.
Also, I have this (completely unfounded) feeling that an iterator that relies on accessing common data in the owning range might be harder to optimize.
I think that ranges should be used for long term storage, so if the iterator size itself is not optimal shouldn't be a big problem.
Without any kind of storage optimization like your preferred-range templates or Dave's factored iterators, the storage bloat would be a _factor_ of 2^N, so ever handling completely unwrapped iterators just does not scale.
So it comes down to:
a) with storage optimization, at every iterator access, unwrap/wrap the iterator out of its storage:
[Dave]
void increment() {
    m_itBase = decompose( ++compose( m_itBase, m_commondata ) ).first;
    // .second is common_data again
}
or
[Giovanni]
void increment() {
    m_rng = TRange( ++m_rng.begin(), m_rng.end() );
}
(Make that '++' 'boost::next()'.) They are pretty equivalent. I think that my idea requires less scaffolding: you are probably going to write a specific wrapper range anyway, so it is a matter of specializing associated_range. On the other hand, once you have compose and decompose, you probably do not need a specific wrapper. So, I do not know. I would have to try. I do not think that the above looks that bad. In fact, I think that compilers should be able to optimize that wrapping and unwrapping pretty well (even more with move semantics). Here is a more or less complete (forward) filter_iterator and associated range (given appropriate definitions of front and rest) [note: more or less pseudocode]:

template<class Iter, class F>
struct filter_iterator : iterator_facade<blah, blah...> {
    filter_iterator( Iter b, Iter e, F f ) : m_range(b, e), m_f(f) {}

private:
    void increment() {
        while( !m_f(front(m_range)) )
            m_range = rest(m_range);
    }

    reference dereference() const {
        // problem: front(x) is defined as *begin(x);
        // what if the reference returned by begin(m_range) is
        // invalidated when the iterator goes out of scope? Yes, it
        // happens and I got bitten at least once. We need a trait.
        return front(m_range);
    }

    bool equal( filter_iterator const& rhs ) {
        return begin(rhs.m_range) == begin(m_range)
            && end(rhs.m_range) == end(m_range);
    }

    associated_range<Iter>::type m_range;
    F m_f; // use compressed pair here
};

template<class R, class F>
struct filter_range {
    typedef filter_iterator<range_iterator<R>::type, F> iterator;

    iterator begin() const { return iterator(begin(m_range), end(m_range), m_f); }
    iterator end() const { return iterator(end(m_range), end(m_range), m_f); }

    R m_range;
    F m_f; // use compressed pair
};

// specialization
template<class Iter, class F>
struct associated_range< filter_iterator<Iter, F> > {
    typedef filter_range<associated_range<Iter>::type, F> type;
};

This should handle nested filter iterators (assuming no further reductions) or other iterators with specialized nested_ranges quite nicely.
or
b) have the iterator carry an indirection to the outside data.
Given that the iterators coming out of a) are decomposed recursively by the next layer of iterators (that is, ItBase::operator++ inside the expression does the same thing again, all the way up), I would think that a) is actually harder to optimize than b),
Compilers are quite good at eliminating this kind of compile-time recursion, so I would say it doesn't matter.
and the effect of no optimization would be much more dramatic.
this is always the case with code with high abstraction (10x is what I usually see). -- gpd

Sure, full generality is always good, but if ranges cover 90% of the cases, maybe it is good enough.
I agree.
But in my proposal (actually, yours), all iterators would be "lean", and each iterator would have access to its range, all the way up the iterator/range stack. So each iterator would itself only wrap other lean iterators. Storage bloat for stack size N would only be O(N) for the indirections, rather than O(2^N).
You can't assume that all iterators are lean. In fact most won't. Unless you drop compatibility with existing iterators, which is a no-no.
Old iterators will have neither the indirection I propose, nor will they offer a non-trivial factorization or internally store preferred ranges. So none of the schemes discussed makes any difference to old iterators. But old iterators are often trivial, so the whole problem should not really apply to them.
My proposal was aimed at making ranges as lean as possible, not necessarily iterators. Anyways, as long as all parts of the stack are 'lean' ranges, you should have only a 2x storage bloat when using iterators.
The bloat is potentially much worse with your implementation below:
iterator begin() const{ return iterator(begin(m_range), end(m_range), m_f); }
This triggers recursive unwrapping all the way to the top, causing a factor of 2^N extra storage to be generated, which is then imploded again inside the iterator ctors. In my other post to Dave, I calculate it in detail. This exponential extra storage is the reason I am making all this fuss. Arno

on Mon Sep 01 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
Need to think about it. Could you give a practical example of an iterator adapter needing three iterators?
E.g., a bidirectional filtered_iterator needs to be prevented from stepping off both ends of the underlying data to avoid undefined behavior. On the other hand, you shouldn't have to pay for those endpoint comparisons all the time, which tells me that something may be wrong with the design. Iterators are supposed to be lightweight things. If we're seriously worried about the cost of building iterator adaptors, we might consider the idea that there's something wrong with the bigger picture. Maybe these endpoint comparisons need to be dealt with differently, somehow. The situation with deque iterators that led to segmented iterators is very similar, and it resulted in a system of abstractions that allows one to operate, once again, close to the machine model. I wouldn't know how to address this issue without some real-life use cases, though. This has all become a bit too abstract a discussion for me, and Generic Programming is supposed to be driven by concrete examples. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

I wouldn't know how to address this issue without some real-life use cases, though. This has all become a bit too abstract a discussion for me, and Generic Programming is supposed to be driven by concrete examples.
Isn't my problem concrete enough? I need a stackable difference_range without exponential storage bloat. How should I implement it? I am not sure that you appreciate the problem. Let's consider two implementations of a forward difference_iterator.

A) Implemented with iterators alone, it contains 4 iterators:

- the iterator to the range being traversed (mutable)
- the iterator to the range being removed (mutable)
- the end iterator to the range being traversed (constant, common)
- the end iterator to the range being removed (constant, common)

Total size: 4*sizeof(ptr)

B) Implemented with iterators + an indirection to the common data:

- the iterator to the range being traversed (mutable)
- the iterator to the range being removed (mutable)
- a reference to the range

Total size: 3*sizeof(ptr)

Now we stack these things, creating difference_range( difference_range, difference_range ) and so on. The iterators of the outermost difference_range then have these sizes:

A)
1st level: 4*sizeof(ptr)
2nd level: 4*(4*sizeof(ptr))
Nth level: (4^N)*sizeof(ptr)

B)
1st level: 3*sizeof(ptr)
2nd level: 2*(3*sizeof(ptr)) + 1*sizeof(ptr) = 7*sizeof(ptr)
3rd level: 2*(7*sizeof(ptr)) + 1*sizeof(ptr) = 15*sizeof(ptr)
4th level: 2*(15*sizeof(ptr)) + 1*sizeof(ptr) = 31*sizeof(ptr)
Nth level: (2^(N+1)-1)*sizeof(ptr)

Extra factor of storage required by A over B: 4^N / (2^(N+1)-1) >= 4^N / 2^(N+1) = 2^N / 2.

So any implementation that even temporarily expands iterators to their original size, like Giovanni's implementation, will need vastly more storage. IMO, relying on the compiler to fix this is hazardous. Non-optimized code should be a constant factor slower than optimized code, and need a constant factor more memory, not exponentially slower and exponentially more memory.
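Spelled out as code, the two layouts look roughly like this (all names hypothetical):

// A) self-contained: every level duplicates both end iterators
template<class It>
struct difference_iterator_A {
    It cur_a, cur_b;   // mutable positions
    It end_a, end_b;   // constant, copied into every single iterator
};

// B) indirection: the ends live once, in the range
template<class It>
struct difference_range_B;  // owns end_a and end_b

template<class It>
struct difference_iterator_B {
    It cur_a, cur_b;                        // mutable positions
    difference_range_B<It> const* common;   // one pointer to the shared ends
};

// Stacking A on itself quadruples the payload per level: 4^N iterators.
// Stacking B doubles it and adds one pointer per level: 2^(N+1)-1 words.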
I think you're being at least very hasty. If you do something that slows down operation on pairs of pointers, you won't be doing anyone favors.
My proposal is not slowing down operations on pairs of pointers at all. All I am asking for is that range adaptors guarantee that their input ranges stay valid, e.g., are stored inside the range adaptor. Naive iterators are stored as they are passed in, for example as iterator_range, so storage and access is as efficient as ever. Iterators don't _have_ to contain indirections, they merely _can_ contain indirections because I want to guarantee the existence of the underlying range. Current implementations in RangeEx don't guarantee that. Arno

AMDG Arno Schödl wrote:
So any implementation that even temporarily expands iterators to their original size, like Giovanni's implementation, will need vastly more storage. IMO, relying on the compiler to fix this is hazardous. Non-optimized code should be a constant factor slower than optimized code, and need a constant factor more memory, not exponentially slower and exponentially more memory.
If I understand Giovanni's implementation correctly, it only needs to expand one layer at a time. In Christ, Steven Watanabe

If I understand Giovanni's implementation correctly, it only needs to expand one layer at a time.
Let's expand it:

iterator begin() const { return iterator(begin(m_range), end(m_range), m_f); }

-->

return iterator(
    iterator( begin(subrange), end(subrange), m_f ),
    iterator( end(subrange), end(subrange), m_f ),
    m_f );

--> ...

The expansion will terminate when begin/end(sub...subrange) is the base iterator. Then the call is made to the filter_iterator ctor, which calls the associated_range<filter_iterator> ctor, which condenses everything back down. Arno

On Mon, Sep 1, 2008 at 8:55 PM, Arno Schödl <aschoedl@think-cell.com> wrote:
If I understand Giovanni's implementation correctly, it only needs to expand one layer at a time.
Let's expand it:
iterator begin() const{ return iterator(begin(m_range), end(m_range), m_f); }
-->
return iterator( iterator( begin(subrange), end(subrange), m_f ), iterator( end(subrange), end(subrange), m_f ), m_f);
-->
...
The expansion will terminate when begin/end(sub...subrange) is the base iterator. Then the call is made to the filter_iterator ctor, which calls the associated_range<filter_iterator> ctor, which condenses everything back down.
I see the problem. I think that for non-complex iterators, the optimizer can still manage to fix the problem. For more complex iterators (think of an any_iterator, whose constructor and copy operator happen to be fairly expensive), things get bad. A general fix is having the constructor of the filter iterator take a range parameter instead of two iterators, and use that range to directly initialize its contained range (note that the two ranges do not need to be of the same type). That should take care of the doubly recursive call in begin. It doesn't fix end, though, and we cannot pass a range because we obviously do not have one. We need another constructor that only takes a single iterator and initializes both ends of the subrange with that iterator. You would still have a (tail and static) recursive call to end(), but it would be O(N), which shouldn't be a problem. NRVO should take care of eliminating excessive copying. Arno, can you see other exponential expansions? [btw, for long chains of filter and map iterators it is not really a problem, because you can always transform them exactly to one each of map and filter iterator] -- gpd

I fixed filter_range::begin() and filter_range::end() as you said. iterator::equal was still exponential, but can be fixed by only comparing begin(). I think the exponential cases are now gone. Most things are still quadratic in the stack size, though... end(), for example, constructs a parameter of size N-1, which, to be constructed, must construct a parameter of size N-2, and so on. NRVO does not help, as far as I understand it. You would need "upward" NRVO, constructing parameters right into the respective members. Instead of the caller passing a pointer to the function telling it where it wants the return value constructed (NRVO), the caller would need to get a pointer from the callee telling it where the callee wants its parameters constructed. I don't think compilers do that, at least not reliably enough to build a whole paradigm on it :-(
[btw, for long chains of filter and map iterators it is not really a problem because you can always transform them exactly to one each of map and filter iterator]
That's why the problem occurred to me looking at difference_iterator.

------

struct both_end {};

template<class Iter, class F>
struct filter_iterator : iterator_facade<blah, blah...> {
    filter_iterator( associated_range<Iter>::type const& rng, F f )
      : m_range(rng), m_f(f) {}

    // if m_range takes e by value, e is expanded twice here,
    // but no more, so no problem
    filter_iterator( Iter e, F f, both_end )
      : m_range(e, e), m_f(f) {}

private:
    void increment() {
        // m_range = rest(m_range) is
        // associated_range<Iter>::type( next( m_range.begin() ), m_range.end() )
        // o.k., only recurses on m_range.begin()
        while( !m_f(front(m_range)) )
            m_range = rest(m_range);
    }

    reference dereference() const {
        // problem: front(x) is defined as *begin(x);
        // what if the reference returned by begin(m_range) is
        // invalidated when the iterator goes out of scope? Yes, it
        // happens and I got bitten at least once. We need a trait.
        return front(m_range);
    }

    bool equal( filter_iterator const& rhs ) {
        return begin(rhs.m_range) == begin(m_range);
        // BAD: && end(rhs.m_range) == end(m_range);
    }

    associated_range<Iter>::type m_range;
    F m_f; // use compressed pair here
};

template<class R, class F>
struct filter_range {
    typedef filter_iterator<range_iterator<R>::type, F> iterator;

    iterator begin() const { return iterator(m_range, m_f); }
    iterator end() const { return iterator(end(m_range), m_f, both_end()); }

    R m_range;
    F m_f; // use compressed pair
};

On Mon, Sep 1, 2008 at 11:35 PM, Arno Schödl <aschoedl@think-cell.com> wrote:
I fixed filter_range::begin() and filter_range::end() as you said. iterator::equal was still exponential, but can be fixed by only comparing begin(). I think the exponential cases are now gone.
it should be ok to compare only begin. If the two iterators have different ends, probably things would be a "little" messed up anyway.
Most things are still quadratic in the stack size, though... end() for example constructs a parameter of size N-1, which to be constructed, must construct a parameter of size N-2 and so on. NRVO does not help, as far as I understand it. You would need "upward" NRVO, constructing parameters right into the respective members. Instead of passing a pointer to the function where it wants the return value to be constructed (NRVO), the caller would need to get a pointer from the callee where the callee wants its parameters constructed. I don't think compilers do that, at least not reliably enough to build a whole paradigm on it :-(
Hum, I've played a little with some mockup iterators, and I can definitely see the N^2 stack growth, and with some expression-template-heavy program, I wouldn't be surprised if the growth would actually become noticeable (even if it weren't enough to blow the stack, it could still produce cache misses and such). I can't seem to come up with any clever scheme to reduce the growth (short of using pointers to the original range, of course). Enough for today. Maybe tomorrow I'll get some idea. -- gpd

AMDG Giovanni Piero Deretta wrote:
Hum, I've played a little with some mockup iterators, and I can definitely see the N^2 stack growth, and with some expression-template-heavy program, I wouldn't be surprised if the growth would actually become noticeable (even if it weren't enough to blow the stack, it could still produce cache misses and such). I can't seem to come up with any clever scheme to reduce the growth (short of using pointers to the original range, of course).
Enough for today. Maybe tomorrow I'll get some idea.
I've been playing around with this a bit. See attached. In Christ, Steven Watanabe

I've been playing around with this a bit. See attached.
In Christ, Steven Watanabe
iterator begin() { return iterator(boost::begin(m_range), boost::end(m_range), m_f); }
iterator end() { return iterator(boost::end(m_range), boost::end(m_range), m_f); }

This would blow up, I believe. It gets expanded on both begin and end, causing exponential storage growth. Arno

AMDG Arno Schödl wrote:
This would blow up, I believe. It gets expanded on both begin and end, causing exponential storage growth.
Ok. Right. Incidentally, the storage required is not exponential because it is not all needed at once, but I see your point. I like the idea of having the filter_iterator contain a copy of the range. In Christ, Steven Watanabe

This would blow up, I believe. It gets expanded on both begin and end, causing exponential storage growth.
Ok. Right. Incidentally, the storage required is not exponential because it is not all needed at once, but I see your point.
You are right, running time is exponential, but storage is not.

On Tue, Sep 2, 2008 at 5:34 AM, Steven Watanabe <watanabesj@gmail.com> wrote:
AMDG
Giovanni Piero Deretta wrote:
Hum, I've played a little with some mockup iterators, and I can definitely see the N^2 stack growth, and with some expression-template-heavy program, I wouldn't be surprised if the growth would actually become noticeable (even if it weren't enough to blow the stack, it could still produce cache misses and such). I can't seem to come up with any clever scheme to reduce the growth (short of using pointers to the original range, of course).
Enough for today. Maybe tomorrow I'll get some idea.
I've been playing around with this a bit. See attached.
I can't compile it because I lack a recent phoenix. What numbers are you getting? Also, isn't end() still O(N^2) on the stacking depth? -- gpd

Enough for today. Maybe tomorrow I'll get some idea.
Let me try:

First of all, associated_range has the semantics of "[efficiently (who wouldn't want that?)] holds a pair of iterators of a certain type". This is the same semantics as iterator_range. The existing iterator_ranges are the default implementation of associated_range, and the associated_ranges we want to implement are specialized iterator_ranges for certain types of iterators. The two types are the same. So let's use this iterator_range throughout, since we already have it.

To eliminate quadratic running times, we must eliminate any linear-space expansion of function parameters. To do this, let's introduce some helper functions that make an iterator_range look like an iterator. Since we just discovered the dual life of iterator_range as associated_range, I will add them to iterator_range:

template< class I >
class iterator_range {
    ...other iterator_range stuff...

    // regular copy ctor for begin iterator

    ////////////////////////////////////
    // associated_range interface

    // end iterator ctor
    iterator_range( iterator_range const& range, create_end_tag )
      : m_begin( range.m_end ), m_end( range.m_end ) {}

    // ctor from pair of ranges
    iterator_range( iterator_range const& b, iterator_range const& e )
      : m_begin( b.m_begin ), m_end( e.m_begin ) {}

    // The names are chosen to reflect the function in iterators;
    // they are meaningless when interpreted in range terms.

    void increment() { // could be called "transform_to_rest" or simply "rest"?
        ++m_begin;
    }

    // TODO: check traversal_category whether this is allowed.
    // Not needed by the forward filter_iterator below, but
    // would be used by a bidirectional filter_iterator.
    void decrement() { // but what do we call this then?
        --m_begin;
    }

    // The three following helpers are optional.
    // [actually I am not so sure, I have not tried to eliminate them]
    // They could be implemented via begin/end, but begin/end is still
    // O(N) because it needs to copy the whole stack. If the compiler
    // can optimize away trivial forwards, these functions are O(1):

    // TODO: do we have to do anything about constness?
    reference dereference() const { // could be called "front"
        return *m_begin;
    }

    bool equal( iterator_range const& range ) const {
        return m_begin == range.m_begin;
    }

    bool at_end() const {
        return m_begin == m_end;
    }

    // TODO: advance, distance_to if traversal_category allows.
    // I first thought we never need them, but my company has a concat_range
    // that probably needs them. Would have to look in detail...
};

Now we implement iterator_range on filter_iterator:

template< class I, class F >
class filter_iterator;

template< class I, class F >
class iterator_range< filter_iterator<I,F> > {
private:
    iterator_range<I> m_range;
    F m_f;

    void skip_filter() {
        while( !m_range.at_end() && !m_f( m_range.dereference() ) ) {
            m_range.increment();
        }
    }

public:
    ////////////////////////////////////
    // associated_range interface

    // end iterator ctor
    iterator_range( iterator_range const& range, create_end_tag )
      : m_range( range.m_range, create_end_tag() ), m_f( range.m_f ) {}

    // ctor from pair of ranges
    iterator_range( iterator_range const& b, iterator_range const& e )
      : m_range( b.m_range, e.m_range ), m_f( b.m_f ) {}

    // The names are chosen to reflect the function in iterators;
    // they are meaningless when interpreted in range terms.

    void increment() {
        m_range.increment();
        skip_filter();
    }

    // TODO: decrement to be bidirectional
    // TODO: do we have to do anything about constness?
    reference dereference() const {
        return m_range.dereference();
    }

    bool equal( iterator_range const& range ) const {
        return m_range.equal( range.m_range );
    }

    ////// end of associated_range interface /////

    // ctor from underlying range
    iterator_range( iterator_range<I> const& range, F f )
      : m_range( range ), m_f( f ) {
        skip_filter();
    }

    // ctor from underlying iterators
    iterator_range( I const& b, I const& e, F const& f )
      : m_range( b, e ), m_f( f ) {
        skip_filter();
    }

    // ctor from own iterators
    iterator_range( filter_iterator<I,F> const& b, filter_iterator<I,F> const& e )
      : m_range( b.m_range.m_range, e.m_range.m_range ), m_f( b.m_range.m_f ) {}

    // copy ctor
    iterator_range( iterator_range const& range )
      : m_range( range.m_range ), m_f( range.m_f ) {}

    filter_iterator<I,F> begin() const {
        return filter_iterator<I,F>( *this );
    }

    filter_iterator<I,F> end() const {
        return filter_iterator<I,F>( *this, create_end_tag() );
    }
};

The filter_iterator is reduced to a mere shell around the range:

template< class I, class F >
class filter_iterator {
    typedef iterator_range< filter_iterator<I,F> > myrange_t;

    myrange_t m_range;

    // copy ctor trivial

    // ctor from range
    filter_iterator( myrange_t const& range ) : m_range( range ) {}

    // ctor from end range
    filter_iterator( myrange_t const& range, create_end_tag )
      : m_range( range, create_end_tag() ) {}

    void increment() { m_range.increment(); }

    reference dereference() const { return m_range.dereference(); }

    bool equal( filter_iterator const& it ) const {
        return m_range.equal( it.m_range );
    }
};

If you are uncomfortable with the design because you don't see the connection to what you [Giovanni] proposed earlier: I believe that I really made few active choices in the design. I merely eliminated long parameter lists that give quadratic running times by pushing the parameters through the stack, through all the other functors, all the way into the base case.

On Tue, Sep 2, 2008 at 8:47 AM, Arno Schödl <aschoedl@think-cell.com> wrote:
Enough for today. Maybe tomorrow I'll get some idea.
Let me try:
First of all, associated_range has the semantics of "[efficiently (who wouldn't want that?)] holds a pair of iterators of a certain type". This is the same semantics as iterator_range. The existing iterator_ranges are the default implementation of associated_range, and the associated_ranges we want to implement are specialized iterator_ranges for certain types of iterators. The two types are the same. So let's use this iterator_range throughout, since we already have it.
The reason to use a trait is to make it easier to adapt existing ranges. Also, containers are ranges, but do not have your additional functions, so you would need to explicitly wrap them in iterator_ranges before wrapping them in another adaptor range. With associated_range, the wrapping comes implicitly for free.
To eliminate quadratic running times, we must eliminate any linear-space expansion of function parameters. To do this, let's introduce some helper functions that make an iterator_range look like an iterator. Since we just discovered the dual life of iterator_range as associated_range, I will add them to iterator_range:
I'll add some comments inline in the code.
template< class I > class iterator_range { ...other iterator_range stuff...
// regular copy ctor for begin iterator
//////////////////////////////////// // associated_range interface
// end iterator ctor iterator_range( iterator_range const& range, create_end_tag ) : m_begin( range.m_end ), m_end( range.m_end ) {}
// ctor from pair of ranges iterator_range( iterator_range const& b, iterator_range const& e ) : m_begin( b.m_begin ), m_end( e.m_begin ) {}
// the names are chosen to reflect the function in iterators // they are meaningless when interpreted in range terms void increment() { // could be called "transform_to_rest" or simply "rest"? ++m_begin; }
Increment is fine. Instead of advance you could have an overloaded increment that takes a size_t parameter for random-access ranges (BTW, boost.range has increment_{begin|end}). I think that these functions should all be friend functions, with reasonable defaults. The reason is the usual one: adaptability of existing ranges.
// TODO: check traversal_category whether this is allowed // Not needed by the forward filter_iterator below, but // would be used by bidirectional filter_iterator. void decrement() { // but what do we call this then? --m_begin; }
// The three following helpers are optional. // [actually I am not so sure, I have not tried to eliminate them] // They could be implemented via // begin/end, but begin/end is still O(N) because it needs to copy the whole stack. // If the compiler can optimize away trivial forwards, these functions // are O(1):
The fact that they are O(N) is not that important. The important thing is that the stack growth is not exponential.
// TODO: do we have to do anything about constness? reference dereference() const { // could be called "front" return *m_begin; }
good question. IMHO constness should be completely determined by the underlying container (if any).
bool equal( iterator_range const& range ) const { return m_begin==range.m_begin; }
bool at_end() const { return m_begin==m_end; }
what about naming this just 'empty'?
// TODO: advance, distance_to if traversal_category. // I first thought we never need them, but my company has a concat_range // that probably needs them. Would have to look in detail... };
Now we implement iterator_range on filter_iterator:
template< class I, class F > class filter_iterator;
template< class I, class F > class iterator_range< filter_iterator<I,F> > { private: iterator_range<I> m_range; F m_f;
void skip_filter() { while( !m_range.at_end() && !m_f( m_range.dereference() ) ) { m_range.increment(); } }
public: //////////////////////////////////// // associated_range interface
// end iterator ctor iterator_range( iterator_range const& range, create_end_tag ) : m_range( range.m_range, create_end_tag() ), m_f( range.m_f ) {}
Eh, here comes the problem. The assumption that the underlying range supports create_end_tag is a strong one. You can do it in your scheme, because all ranges are iterator_ranges, but I do not think it works well in practice. One (ugly) way out is making support for this tag part of the range concept, i.e., requiring that associated_range returns a range that supports it; this rules out std::pair as an associated range. Another solution is somehow detecting the support for create_end_tag. <snip>
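One hypothetical shape for such detection is an opt-in trait plus a dispatching factory (a sketch, not a worked-out SFINAE probe; create_end_tag and make_end follow the thread's naming):

#include <boost/type_traits/integral_constant.hpp>
#include <boost/range/end.hpp>

// range authors specialize this when their range has the
// (Rng const&, create_end_tag) constructor
template<class Rng>
struct has_create_end_ctor : boost::false_type {};

template<class Rng>
Rng make_end( Rng const& rng, boost::true_type )
{ return Rng( rng, create_end_tag() ); }

template<class Rng>
Rng make_end( Rng const& rng, boost::false_type )   // e.g. std::pair
{ return Rng( boost::end(rng), boost::end(rng) ); }

template<class Rng>
Rng make_end( Rng const& rng )
{ return make_end( rng, has_create_end_ctor<Rng>() ); }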
The filter_iterator is reduced to a mere shell around the range:
template< class I, class F > class filter_iterator<I,F> { typedef iterator_range< filter_iterator<I,F> > myrange_t;
myrange_t m_range;
// copy ctor trivial
// ctor from range filter_iterator( myrange_t const& range ) : m_range( range ) {}
// ctor from end range filter_iterator( myrange_t const& range, create_end_tag ) : m_range( range, create_end_tag() ) {}
void increment() { m_range.increment(); }
reference dereference() const { return m_range.dereference(); }
bool equal( filter_iterator const& it ) const { return m_range.equal( it.m_range ); } };
If you are uncomfortable with the design because you don't see the connection to what you [Giovanni] proposed earlier: I believe that I really made few active choices in the design. I merely eliminated long parameter lists that give quadratic running times by pushing the parameters through the stack, through all the other functors, all the way into the base case.
No no, I see the connection and I sort-of-like the solution. But I think that having a trait plus customization functions found via ADL is better. Also, there must be a better (i.e. less ugly) way than using the create_end_tag. -- gpd

First of all, associated_range has the semantics of "[efficiently (who wouldn't want that?)] holds a pair of iterators of a certain type". This is the same semantics as iterator_range. The existing iterator_ranges are the default implementation of associated_range, and the associated_ranges we want to implement are specialized iterator_ranges for certain types of iterators. The two types are the same. So let's use this iterator_range throughout, since we already have it.
The reason to use a trait is to make it easier to adapt existing ranges. Also, containers are ranges, but do not have your additional functions, so you would need to explicitly wrap them in iterator_ranges before wrapping them in another adaptor range. With associated_range, the wrapping comes implicitly for free.
We can also use a trait. Keep in mind though that from a client's standpoint, both classes are really equivalent. In fact, if you instantiate an iterator_range for one of the associated_range::iterators, you really want the associated_range because it is more efficient. Likewise, the corresponding sub_range should really derive from associated_range. In our company's library, sub_range is a trait so that sub_range( transform_range ).base() returns the base subrange. I see iterator_range along similar lines, with the semantics mattering more than the implementation. This issue is somewhat of a side-track, though.
I think that these functions should all be friend functions, with reasonable defaults. The reason is the usual one: adaptability of existing ranges.
Fine with me. I am not so familiar with the boost.range stuff.
// [actually I am not so sure, I have not tried to eliminate them] // They could be implemented via // begin/end, but begin/end is still O(N) because it needs to copy the whole stack. // If the compiler can optimize away trivial forwards, these functions // are O(1):
The fact that they are O(N) is not that important. The important thing is that the stack growth is not exponential.
The current implementation runs its recursion on the range, which I find quite straightforward. So I wouldn't want to eliminate them. Renaming or replacing with free functions is fine.
good question. IMHO constness should be completely determined by the underlying container (if any).
I think you are right. I did not check what iterator_range does, it probably does the right thing.
bool at_end() const { return m_begin==m_end; }
what about naming this just 'empty'?
Shame on me:-)
Eh, here comes the problem. The assumption that the underlying range supports create_end_tag is a strong one. You can do it in your scheme, because all ranges are iterator_ranges, but I do not think it works well in practice.
The other ctor that takes two ranges is just as special. If we can assume NRVO, it would be pretty easy to replace them with a free factory function template. That way, all the special iterator_range members I introduced could be turned into free functions. -Arno
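For instance, the end ctor could become a free factory along these lines (a sketch assuming NRVO; make_end_range is a hypothetical name):

#include <boost/range/end.hpp>

// default: build the degenerate [end, end) range, no tag ctor needed
template<class I>
iterator_range<I> make_end_range( iterator_range<I> const& rng )
{
    return iterator_range<I>( boost::end(rng), boost::end(rng) );
}

// adaptor authors overload make_end_range for their specialized
// iterator_ranges, e.g. forwarding to the create_end_tag ctor there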

on Mon Sep 01 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
I wouldn't know how to address this issue without some real-life use cases, though. This has all become a bit too abstract a discussion for me, and Generic Programming is supposed to be driven by concrete examples.
Isn't my problem concrete enough? I need a stackable difference_range without exponential storage bloat.
Yeah, but why? What algorithm are you trying to implement that leads you to think about stacking difference_range? I don't imagine that a stacked difference_range is a good way to do anything efficiently. For example, if you're trying to compute something like A - B - C - D, where A, B, C, and D are sorted ranges, and you're doing it without a heap, it *will* be needlessly inefficient. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Isn't my problem concrete enough? I need a stackable difference_range without exponential storage bloat.
Yeah, but why? What algorithm are you trying to implement that leads you to think about stacking difference_range? I don't imagine that a stacked difference_range is a good way to do anything efficiently.
For example, if you're trying to compute something like
A - B - C - D
where A,B,C, and D are sorted ranges, and you're doing it without a heap, it *will* be needlessly inefficient.
The RangeEx documentation writes something like "what algorithms are to containers, adaptor ranges are to ranges". Whenever you are applying one (non-permutating) algorithm to a container, and then another and then another, you can stack range adaptors. It's like lazy evaluation of a Haskell list. Looking through our code for adaptor ranges, I find:

unique_range
union_range
difference_range
filter_range
concat_range

These ranges are all actually used in our program. All these ranges need an end iterator, and any stacking of any combination of these ranges leads to the problems we are discussing here. I don't find this impractical. -Arno
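To illustrate the kind of stacking meant here (the make_* factory names, pred, and process are hypothetical, in the spirit of RangeEx adaptors):

#include <vector>

bool pred( int i );                              // hypothetical predicate
template<class Rng> void process( Rng const& );  // hypothetical consumer

void example( std::vector<int> const& a, std::vector<int> const& b )
{
    // unique(filter(A - B, pred)), evaluated lazily element by element,
    // with no intermediate container materialized
    process( make_unique_range(
                 make_filter_range(
                     make_difference_range( a, b ),  // A - B
                     &pred ) ) );
}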

on Tue Sep 02 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
The RangeEx documentation writes something like "what algorithms are to containers, adaptor ranges are to ranges". Whenever you are applying one (non-permutating) algorithm to a container, and then another and then another, you can stack range adaptors. It's like lazy evaluation of a Haskell list.
Sheesh; give me a little credit, please. I know what range adaptors do.
Looking through our code for adaptor ranges, I find:
unique_range
union_range
difference_range
filter_range
concat_range
These ranges are all actually used in our program. All these ranges need an end iterator, and any stacking of any combination of these ranges leads to the problems we are discussing here. I don't find this impractical.
You're missing my point. Are you indeed making such stacks? If so, what are you doing with them? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Tue, Sep 2, 2008 at 5:02 PM, David Abrahams <dave@boostpro.com> wrote:
on Tue Sep 02 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
The RangeEx documentation writes something like "what algorithms are to containers, adaptor ranges are to ranges". Whenever you are applying one (non-permutating) algorithm to a container, and then another and then another, you can stack range adaptors. It's like lazy evaluation of a Haskell list.
Sheesh; give me a little credit, please. I know what range adaptors do.
Looking through our code for adaptor ranges, I find:
unique_range
union_range
difference_range
filter_range
concat_range
These ranges are all actually used in our program. All these ranges need an end iterator, and any stacking of any combination of these ranges leads to the problems we are discussing here. I don't find this impractical.
You're missing my point. Are you indeed making such stacks? If so, what are you doing with them?
The deepest stack I have in my application is 4 adapters deep. The reasons I do not have deeper stacks are: 1) compile-time cost, 2) lack of auto and decltype for easy factorization (need to 'reify' the ranges at function boundaries), 3) lack of time to make additional lazy wrappers for standard algorithms. I expect that with a C++0x compiler, 1 [*] and 2 won't be a problem, while RangeEx should take care of 3. The usage is in a real-world, text-processing-heavy application. [*] mostly due to variadic templates and rvalue refs. -- gpd

on Tue Sep 02 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
You're missing my point. Are you indeed making such stacks? If so, what are you doing with them?
The deepest stack I have in my application is 4 adapters deep. The reasons I do not have deeper stacks are:
1) compile-time cost 2) lack of auto and decltype for easy factorization (need to 'reify' the ranges at function boundaries) 3) lack of time to make additional lazy wrappers for standard algorithms.
I expect that with a C++0x compiler 1 [*] and 2 won't be a problem, while RangeEx should take care of 3.
The usage is in a real world text processing heavy application.
Great, but what are you doing with these stacks? Normally, the answer would be something like "I'm applying the X algorithm to a range of elements that represent Y but have been adapted to look like Z" -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Tue, Sep 2, 2008 at 8:27 PM, David Abrahams <dave@boostpro.com> wrote:
on Tue Sep 02 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
You're missing my point. Are you indeed making such stacks? If so, what are you doing with them?
The deepest stack I have in my application is 4 adapters deep. The reasons I do not have deeper stacks are:
1) compile-time cost 2) lack of auto and decltype for easy factorization (need to 'reify' the ranges at function boundaries) 3) lack of time to make additional lazy wrappers for standard algorithms.
I expect that with a C++0x compiler 1 [*] and 2 won't be a problem, while RangeEx should take care of 3.
The usage is in a real world text processing heavy application.
Great, but what are you doing with these stacks? Normally, the answer would be something like "I'm applying the X algorithm to a range of elements that represent Y but have been adapted to look like Z"
Usually it is a matter of converting a text document to a point in a feature vector space as a preprocessing stage: most pipelines start with tokenization, normalization, filtering and hashing, with a 'take' operation at the end. The resulting range is usually sorted (which implies breaking the laziness), unique-ed and augmented with score information (another map). I very often need to compute set unions and intersections of pairs of these ranges. [I do not yet have a lazy adaptor for this (actually I do, but it is kind of experimental)]. Most of the uses of lazy ranges are in relatively non-performance-critical parts of the application, so I do not aim for absolute zero abstraction overhead. A nice thing of lazy ranges is that they are useful to control the peak memory usage of the application (in fact, if you are careful, you can do very little dynamic memory allocation). I'm interested in using dynamic iterators in the near future for code decoupling, and it would be a pity if these ranges couldn't fit in a small object optimization buffer. -- gpd

on Tue Sep 02 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
Great, but what are you doing with these stacks? Normally, the answer would be something like "I'm applying the X algorithm to a range of elements that represent Y but have been adapted to look like Z"
Usually it is a matter of converting a text document to a point in a feature vector space as a preprocessing stage:
e.g., for the purposes of search? Just trying to get a feel for what you're describing.
Most pipelines start with tokenization, normalization, filtering and hashing, with a 'take' operation at the end.
By 'take' you mean a destructive read?
The resulting range is usually sorted (which implies breaking the laziness),
Yep.
unique-ed and augmented with score information (another map).
I very often need to compute set unions and intersections of pairs of these ranges. [I do not yet have a lazy adaptor for this (actually I do, but it is kind of experimental)].
Okay.
Most of the uses of lazy ranges are in relatively non-performance-critical parts of the application, so I do not aim for absolute zero abstraction overhead.
Okay, so... is it worth breaking up the iterator abstraction for this purpose?
A nice thing of lazy ranges is that they are useful to control the peak memory usage of the application (in fact, if you are careful, you can do very little dynamic memory allocation)
Yes.
I'm interested in using dynamic iterators in the near future for code decoupling
You mean, like, any_iterator?
and it would be a pity if these ranges couldn't fit in a small object optimization buffer.
Do you know how big that target ("small object optimization buffer") is? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Wed, Sep 3, 2008 at 3:11 PM, David Abrahams <dave@boostpro.com> wrote:
on Tue Sep 02 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
Great, but what are you doing with these stacks? Normally, the answer would be something like "I'm applying the X algorithm to a range of elements that represent Y but have been adapted to look like Z"
Usually it is a matter of converting a text document to a point in a feature vector space as a preprocessing stage:
e.g., for the purposes of search? Just trying to get a feel for what you're describing.
Near-duplicate elimination and document clustering in general. Most clustering algorithms work on representations of documents as points in a very high-dimensional space.
Most pipelines start with tokenization, normalization, filtering and hashing, with a 'take' operation at the end.
By 'take' you mean a destructive read?
What do you mean by 'destructive'? 'take(range, n)' returns a range adaptor that iterates over the first 'n' elements of 'range'. This way I do not have to decide how much of the original document (which theoretically could be of unlimited length) I need until the end of the pipeline (most importantly, I want to set the final length *after* filtering). <snip>
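A rough sketch of 'take' for random-access ranges (hypothetical code; the real adaptor would use a counting iterator so that it also works on forward ranges of unknown length):

#include <cstddef>
#include <iterator>
#include <boost/range/begin.hpp>
#include <boost/range/end.hpp>
#include <boost/range/iterator.hpp>
#include <boost/range/iterator_range.hpp>

template<class Rng>
boost::iterator_range<typename boost::range_iterator<Rng>::type>
take( Rng& rng, std::size_t n )
{
    typename boost::range_iterator<Rng>::type b = boost::begin(rng);
    typename boost::range_iterator<Rng>::type e = boost::end(rng);
    std::size_t len = static_cast<std::size_t>( std::distance(b, e) );
    if( n < len )
        e = b + n;  // random access assumed for brevity
    return boost::make_iterator_range(b, e);
}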
Most of the uses of lazy ranges are in relatively non-performance-critical parts of the application, so I do not aim for absolute zero abstraction overhead.
Okay, so... is it worth breaking up the iterator abstraction for this purpose?
Do you think we are breaking the iterator abstraction? How? Or have I misunderstood the question? We are simply trying to come up with a simple and clean trick to keep iterator size under control.
A nice thing of lazy ranges is that they are useful to control the peak memory usage of the application (in fact, if you are careful, you can do very little dynamic memory allocation)
Yes.
I'm interested in using dynamic iterators in the near future for code decoupling
You mean, like, any_iterator?
Yes.
and it would be a pity if these ranges couldn't fit in a small object optimization buffer.
Do you know how big that target ("small object optimization buffer") is?
I'm going to implement my own any_iterator, so I'll make the buffer big enough for the average iterator size. Currently the iterators I use are often over 400 bytes, which is a bit too big :), thus the need to squeeze their size down as much as possible. A hundred bytes could be enough [1]. [1] 100 bytes for SBO might seem like a lot, but I've used a custom string with a very large SBO buffer (over 100 bytes), been happily copying it around, and performance was much better than with the standard std::string (which, admittedly, did do reference counting but no SBO). I guess that the size of the buffer depends a lot on what you are going to do with it. -- gpd
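As a sketch of what such a buffer might look like inside a hand-rolled any_iterator (hypothetical layout; alignment handling omitted):

#include <cstddef>
#include <new>

// type-erased storage: iterator implementations up to Size bytes
// live inline, larger ones go to the heap
template<std::size_t Size>
class sbo_storage {
    union {
        char m_buffer[Size];  // inline home for small iterator impls
        void* m_heap;         // otherwise a heap allocation
    };
    bool m_on_heap;

public:
    template<class Impl>
    void construct( Impl const& impl ) {
        if( sizeof(Impl) <= Size ) {
            new (m_buffer) Impl(impl);  // placement-new into the buffer
            m_on_heap = false;
        } else {
            m_heap = new Impl(impl);
            m_on_heap = true;
        }
    }
    // destruction and copying would dispatch through a vtable or
    // function pointers stored alongside; omitted here
};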

on Wed Sep 03 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
On Wed, Sep 3, 2008 at 3:11 PM, David Abrahams <dave@boostpro.com> wrote:
on Tue Sep 02 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
Great, but what are you doing with these stacks? Normally, the answer would be something like "I'm applying the X algorithm to a range of elements that represent Y but have been adapted to look like Z"
Usually it is a matter of converting a text document to a point in a feature vector space as a preprocessing stage:
e.g., for the purposes of search? Just trying to get a feel for what you're describing.
Near-duplicate elimination and document clustering in general. Most clustering algorithms work on representations of documents as points in a very high-dimensional space.
OK.
Most pipelines start with tokenization, normalization, filtering and hashing, with a 'take' operation at the end.
By 'take' you mean a destructive read?
What do you mean by 'destructive'?
Nevermind; it was a stab in the dark and you explain what you mean below.
'take(range, n)' returns a range adaptor that iterates over the first 'n' elements of 'range'. This way I do not have to decide how much of the original document (which theoretically could be of unlimited length) I need until the end of the pipeline (most importantly, I want to set the final length *after* filtering).
OK.
<snip>
Most of the uses of lazy ranges are in relatively non-performance-critical parts of the application, so I do not aim for absolute zero abstraction overhead.
Okay, so... is it worth breaking up the iterator abstraction for this purpose?
Do you think we are breaking the iterator abstraction? How? Or have I misunderstood the question?
Well, if we add "Factorable" or something like it, the iterator abstraction gets broken down into smaller bits for the purpose of storage, complicating some code [like a generic range, for example ;-)], and it imposes an extra coding duty on writers of high-quality iterator adaptors. I'm asking if it's worth it.
We are simply trying to come up with a simple and clean trick to keep iterator size under control.
Really? IIUC so far we're only discussing compressing the size of ranges, unless we're willing to pay for indirection inside of iterators (I'm not).
I'm interested in using dynamic iterators in the near future for code decoupling
You mean, like, any_iterator?
Yes.
and it would be a pity if these ranges couldn't fit in a small object optimization buffer.
Do you know how big that target ("small object optimization buffer") is?
I'm going to implement my own any_iterator, so I'll make the buffer big enough for the average iterator size.
Oh, that's not the small object optimization. The SOO is only done by compilers: it means passing small objects in registers instead of on the stack. http://softwarecommunity.intel.com/isn/Community/en-US/forums/thread/323704.... IIUC you're talking about the too-specifically-named "small string optimization." I don't understand why you would implement any_iterator yourself when there are so many extant implementations, but maybe it's none of my business.
Currently the iterators I use are often over 400 bytes, which is a bit too big :), thus the need to squeeze their size down as much as possible. A hundred bytes could be enough [1].
Again, have you measured?
[1] 100 bytes for SBO might seem like a lot, but I've used a custom string with a very large SBO buffer (over 100 bytes), been happily copying it around, and performance was much better than with the standard std::string (which, admittedly, did do reference counting but no SBO). I guess that the size of the buffer depends a lot on what you are going to do with it.
Yes indeed. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

We are simply trying to come up with a simple and clean trick to keep iterator size under control.
Really? IIUC so far we're only discussing compressing the size of ranges, unless we're willing to pay for indirection inside of iterators (I'm not).
No, we also had minimum size iterators. By "minimum size" I mean that each iterator contains the begin and end base (i.e., all the way up the stack) iterators + all functors involved, stored by value. This is really equivalent to a single iterator + boost::bind. Arno -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229
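For concreteness, a sketch of such a minimum-size, self-contained adaptor iterator (names are illustrative; the point is only that everything needed for iteration travels by value in the iterator itself):

    // current position, boundary and predicate are all stored in the
    // iterator, so no range object has to outlive it
    template <class It, class Pred>
    struct self_contained_filter_iterator {
        It current;
        It end;    // needed so the predicate search can stop
        Pred pred; // stored by value, as boost::bind would capture it

        void satisfy() // advance to the next element satisfying pred
        { while (current != end && !pred(*current)) ++current; }

        void increment() { ++current; satisfy(); }
    };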

on Wed Sep 03 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
We are simply trying to come up with a simple and clean trick to keep iterator size under control.
Really? IIUC so far we're only discussing compressing the size of ranges, unless we're willing to pay for indirection inside of iterators (I'm not).
No, we also had minimum size iterators. By "minimum size" I mean that each iterator contains the begin and end base (i.e., all the way up the stack) iterators + all functors involved, stored by value. This is really equivalent to a single iterator + boost::bind.
Please be specific. Do you have some way to avoid what happens in my description of stacking filter_iterator and strided_iterator? If so, what do you get and how is it done? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Wed, Sep 3, 2008 at 7:21 PM, David Abrahams <dave@boostpro.com> wrote:
on Wed Sep 03 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
Do you think we are breaking the iterator abstraction? How? Or I've misunderstood the question?
Well, if we add "Factorable" or something like it, the iterator abstraction gets broken down into smaller bits for the purpose of storage, complicating some code [like a generic range for example ;-)] and it imposes an extra coding duty on writers of high-quality iterator adaptors. I'm asking if it's worth it.
We are simply trying to come up with a simple and clean trick to keep iterator size under control.
Really? IIUC so far we're only discussing compressing the size of ranges, unless we're willing to pay for indirection inside of iterators (I'm not).
But iterators like filter_iterator that need to know iteration boundaries can store a range instead of an iterator, so they can benefit from the compression.
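For concreteness, a sketch of what that could look like; the Range interface with empty()/front()/pop_front() is an assumption made for the example, not an existing library's API:

    // the iterator holds one range value instead of a separate
    // (current, end) pair, so any compression of the range
    // representation shrinks the iterator too
    template <class Range, class Pred>
    struct range_based_filter_iterator {
        Range rest; // the not-yet-visited part [current, end)
        Pred pred;

        void satisfy()
        { while (!rest.empty() && !pred(rest.front())) rest.pop_front(); }

        void increment() { rest.pop_front(); satisfy(); }
    };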
I'm interested in using dynamic iterators in the near future for code decoupling
You mean, like, any_iterator?
Yes.
and it would be a pity if these ranges couldn't fit in a small object optimization buffer.
Do you know how big that target ("small object optimization buffer") is?
I'm going to implement my own any_iterator, so I'll make the buffer big enough for the average iterator size.
Oh, that's not the small object optimization. The SOO is only done by compilers: it means passing small objects in registers instead of on the stack. http://softwarecommunity.intel.com/isn/Community/en-US/forums/thread/323704.... IIUC you're talking about the too-specifically-named "small string optimization."
Ah, ok, I also saw it being called small buffer optimization. I confused object with buffer.
I don't understand why you would implement any_iterator yourself when there are so many extant implementations, but maybe it's none of my business.
Mostly for experimenting. I might end up using something third party in real code.
Currently the iterators I use are often over 400 bytes, which is a bit too big :), thus the need to squeeze their size down as much as possible. A hundred bytes could be enough [1].
Again, have you measured?
If you mean the size, yes, I've measured it. If you mean my assertion that 100 byte SBO may be optimal, then I won't let facts interfere with my opinions :) Now, honestly, I'm trying to rationalize the fear that I had for a while that complex iterator adaptor sequences really can grow large. For example, the current filter_iterator doubles its size at every stacking. If you couple it with a relatively heavy predicate, the stack usage might start to really matter. On heavily threaded applications, stack usage is a somewhat limited resource. -- gpd
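For concreteness, a heavily simplified sketch of a type-erased forward iterator with such an inline buffer (every name here is hypothetical; copying, destruction and the heap fallback for oversized iterators are omitted):

    #include <cstddef>
    #include <new>
    #include <boost/static_assert.hpp>

    template <std::size_t Size, class Value>
    class small_buffer_any_iterator {
        struct vtable {
            void (*increment)(void*);
            Value& (*dereference)(void*);
        };

        template <class It>
        struct impl {
            static void increment(void* p) { ++*static_cast<It*>(p); }
            static Value& dereference(void* p) { return **static_cast<It*>(p); }
        };

        vtable const* table;
        char buffer[Size]; // inline storage; alignment handling omitted

    public:
        template <class It>
        small_buffer_any_iterator(It it) {
            BOOST_STATIC_ASSERT(sizeof(It) <= Size); // a real one would heap-allocate instead
            new (buffer) It(it);
            static vtable const vt = { &impl<It>::increment, &impl<It>::dereference };
            table = &vt;
        }

        small_buffer_any_iterator& operator++() { table->increment(buffer); return *this; }
        Value& operator*() { return table->dereference(buffer); }
    };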

on Wed Sep 03 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
On Wed, Sep 3, 2008 at 7:21 PM, David Abrahams <dave@boostpro.com> wrote:
on Wed Sep 03 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
Do you think we are breaking the iterator abstraction? How? Or I've misunderstood the question?
Well, if we add "Factorable" or something like it, the iterator abstraction gets broken down into smaller bits for the purpose of storage, complicating some code [like a generic range for example ;-)] and it imposes an extra coding duty on writers of high-quality iterator adaptors. I'm asking if it's worth it.
We are simply trying to come up with a simple and clean trick to keep iterator size under control.
Really? IIUC so far we're only discussing compressing the size of ranges, unless we're willing to pay for indirection inside of iterators (I'm not).
But iterators like filter_iterator that need to know iteration boundaries can store a range instead of an iterator, so they can benefit from the compression.
Oh, I totally missed that; sorry! Let's see...

strided_iterator<int*,5> stores range<int*>
sizeof(range<int*>) == 2*sizeof(int*)
filter_iterator<strided_iterator<int*,5> > stores range<strided_iterator<int*,5> >
range<strided_iterator<int*,5> > is represented as
unique_data<strided_iterator<int*,5> >::type start; unique_data<strided_iterator<int*,5> >::type finish; common_data<strided_iterator<int*,5> >::type common;
unique_data<strided_iterator<int*,5> >::type == int* common_data<strided_iterator<int*,5> >::type == int*
so its size is 3*sizeof(int*)

?? Did I get that right?

If so, it seems like there's still redundant storage. finish and common will be representing essentially the same information. If I were to write a filter_strided_iterator adapter and apply it to int*, it would have the size of 2 int*s.
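The compression scheme being assumed here, spelled out as code for readers following along (unique_data/common_data are the names used above; the default definitions are a guess):

    // a range stores the per-endpoint ("unique") part of each of its
    // two iterators, plus exactly one copy of the shared ("common") part
    template <class It>
    struct unique_data { typedef It type; };        // default: the whole iterator

    struct no_common_data {};
    template <class It>
    struct common_data { typedef no_common_data type; }; // default: nothing shared

    template <class It>
    struct compressed_range {
        typename unique_data<It>::type start;
        typename unique_data<It>::type finish;
        typename common_data<It>::type common;
        // a real version would use the empty base optimization so an
        // empty 'common' does not cost a byte plus padding
    };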
I don't understand why you would implement any_iterator yourself when there are so many extant implementations, but maybe it's none of my business.
Mostly for experimenting. I might end up using something third party in real code.
Currently the iterators I use are often over 400 bytes, which is a bit too big :), thus the need to squeeze their size down as much as possible. A hundred bytes could be enough [1].
Again, have you measured?
If you mean the size, yes, I've measured it. If you mean my assertion that 100 byte SBO may be optimal, then I won't let facts interfere with my opinions :)
I meant measured that a type-erased 100 byte iterator improves performance over a bald 400-byte iterator.
Now, honestly, I'm trying to rationalize the fear that I had for a while that complex iterator adaptor sequences really can grow large. For example, the current filter_iterator doubles its size at every stacking. If you couple it with a relatively heavy predicate, the stack usage might start to really matter. On heavily threaded applications, stack usage is a somewhat limited resource.
And I'm trying to say that the point of GP is to raise coding to the highest possible level of abstraction *without loss of efficiency*. If we can't achieve "relative efficiency" as described in http://www.stepanovpapers.com/BYTE_com.htm, I don't think we're doing the job right. So far, unless I'm mistaken, we're still missing the mark by a fairly wide margin. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Thu, Sep 4, 2008 at 1:36 AM, David Abrahams <dave@boostpro.com> wrote:
on Wed Sep 03 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
On Wed, Sep 3, 2008 at 7:21 PM, David Abrahams <dave@boostpro.com> wrote:
on Wed Sep 03 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
Do you think we are breaking the iterator abstraction? How? Or I've misunderstood the question?
Well, if we add "Factorable" or something like it, the iterator abstraction gets broken down into smaller bits for the purpose of storage, complicating some code [like a generic range for example ;-)] and it imposes an extra coding duty on writers of high-quality iterator adaptors. I'm asking if it's worth it.
We are simply trying to come up with a simple and clean trick to keep iterator size under control.
Really? IIUC so far we're only discussing compressing the size of ranges, unless we're willing to pay for indirection inside of iterators (I'm not).
But iterators like filter_iterator that need to know iteration boundaries can store a range instead of an iterator, so they can benefit from the compression.
Oh, I totally missed that; sorry! Let's see...
strided_iterator<int*,5> stores range<int*>
sizeof(range<int*>) == 2*sizeof(int*)
filter_iterator<strided_iterator<int*,5> > stores range<strided_iterator<int*,5> >
range<strided_iterator<int*,5> > is represented as
unique_data<strided_iterator<int*,5> >::type start; unique_data<strided_iterator<int*,5> >::type finish; common_data<strided_iterator<int*,5> >::type common;
unique_data<strided_iterator<int*,5> >::type == int* common_data<strided_iterator<int*,5> >::type == int*
so its size is 3*sizeof(int*)
?? Did I get that right?
If so, it seems like there's still redundant storage. finish and common will be representing essentially the same information. If I were to write a filter_strided_iterator adapter and apply it to int*, it would have the size of 2 int*s.
What information is contained in common that is not also contained in finish (I've never implemented a strided iterator, I'm trying to understand)? And why can you do without it in filter_strided_iterator?
I don't understand why you would implement any_iterator yourself when there are so many extant implementations, but maybe it's none of my business.
Mostly for experimenting. I might end up using something third party in real code.
Currently the iterators I use are often over 400 bytes, which is a bit too big :), thus the need to squeeze their size down as much as possible. A hundred bytes could be enough [1].
Again, have you measured?
If you mean the size, yes, I've measured it. If you mean my assertion that 100 byte SBO may be optimal, then I won't let facts interfere with my opinions :)
I meant measured that a type-erased 100 byte iterator improves performance over a bald 400-byte iterator.
Eh? I never claimed it did! Barring heavy optimizations of indirect calls, an any_iterator will always be much slower than a plain iterator. Any_iterator is not about performance, but is useful as a compilation firewall and for ease of use. You want to make the buffer as small as possible, so the smaller your iterators are, the more likely they are to fit in the buffer.
Now, honestly, I'm trying to rationalize the fear that I had for a while that complex iterator adaptor sequences really can grow large. For example, the current filter_iterator doubles its size at every stacking. If you couple it with a relatively heavy predicate, the stack usage might start to really matter. On heavily threaded applications, stack usage is a somewhat limited resource.
And I'm trying to say that the point of GP is to raise coding to the highest possible level of abstraction *without loss of efficiency*. If we can't achieve "relative efficiency" as described in http://www.stepanovpapers.com/BYTE_com.htm, I don't think we're doing the job right. So far, unless I'm mistaken, we're still missing the mark by a fairly wide margin.
Hum, I've entered the discussion with the explicit aim of reducing the size of iterator adaptors and ranges while trying not to make them slower than they are right now. Making them faster was not the aim. (For the record, I too think that the exception approach is a bit of a dead end.) I think your point is: do not bother with size expansion, because by the time the size gets big enough to be a problem, performance will already have reached the floor. My counterpoint is that every layer adds more or less a constant overhead, while the size expansion grows quadratically; also, in situations where performance is not a problem, you might still care about the size. -- gpd

on Wed Sep 03 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
But iterators like filter_iterator that need to know iteration boundaries, can store a range instead of an iterator, so they can benefit from the compression.
Oh, I totally missed that; sorry! Let's see...
strided_iterator<int*,5> stores range<int*>
sizeof(range<int*>) == 2*sizeof(int*)
filter_iterator<strided_iterator<int*,5> > stores range<strided_iterator<int*,5> >
range<strided_iterator<int*,5> > is represented as
unique_data<strided_iterator<int*,5> >::type start; unique_data<strided_iterator<int*,5> >::type finish; common_data<strided_iterator<int*,5> >::type common;
unique_data<strided_iterator<int*,5> >::type == int* common_data<strided_iterator<int*,5> >::type == int*
so its size is 3*sizeof(int*)
?? Did I get that right?
If so, it seems like there's still redundant storage. finish and common will be representing essentially the same information. If I were to write a filter_strided_iterator adapter and apply it to int*, it would have the size of 2 int*s.
What information is contained in common that is not also contained in finish (I've never implemented a strided iterator, I'm trying to understand)?
My point is that there is essentially the same information in common and finish, but that you can't eliminate it because the abstractions don't allow it. A strided_iterator implementation and test are attached.
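The attachment is not reproduced here; going by the description later in the thread, its increment apparently clamps to the stored, unstrided end, along these lines (a reconstruction, not the attached code):

    // pos and finish are the underlying random access iterators stored
    // by the strided iterator; N is the stride
    void increment() {
        if (finish - pos > N) pos += N; // a full stride still fits
        else pos = finish;              // clamp: fewer than N elements left
    }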
And why can you do without it in filter_strided_iterator?
Well, let me try to write that one, and we'll see. OK, done and attached. Please keep in mind: 1. This was an unbelievably hasty job; surely I made a mistake somewhere. 2. No effort is made to optimize away storage for the predicate.
Currently the iterators I use are often over 400 bytes, which is a bit too big :), thus the need to squeeze their size down as much as possible. A hundred bytes could be enough [1].
Again, have you measured?
If you mean the size, yes, I've measured it. If you mean my assertion that 100 byte SBO may be optimal, then I won't let facts interfere with my opinions :)
I meant measured that a type-erased 100 byte iterator improves performance over a bald 400-byte iterator.
Eh? I never claimed it did! Barring heavy optimizations of indirect calls, an any_iterator will always be much slower than a plain iterator. Any_iterator is not about performance, but is useful as a compilation firewall and for ease of use.
You want to make the buffer as small as possible, so the smaller your iterators are, the more likely they are to fit in the buffer.
Sure, I agree.
Now, honestly, I'm trying to rationalize the fear that I had for a while that complex iterator adaptor sequences really can grow large. For example, the current filter_iterator doubles its size at every stacking. If you couple it with a relatively heavy predicate, the stack usage might start to really matter. On heavily threaded applications, stack usage is a somewhat limited resource.
And I'm trying to say that the point of GP is to raise coding to the highest possible level of abstraction *without loss of efficiency*. If we can't achieve "relative efficiency" as described in http://www.stepanovpapers.com/BYTE_com.htm, I don't think we're doing the job right. So far, unless I'm mistaken, we're still missing the mark by a fairly wide margin.
Hum, I've entered the discussion with the explicit aim of reducing the size of iterator adaptors and ranges while trying not to make them slower than they are right now. Making them faster was not the aim.
Space efficiency counts as much as speed efficiency. I don't think we're really achieving either.
(for the record, I think too that the exception approach is a bit of a dead end).
I think your point is: do not bother with size expansion, because by the time the size gets big enough to be a problem, performance will already have reached the floor.
More like, "the overall performance cost associated with the wasted space may well be swamped by other factors." Although I'm not *really* saying "don't bother." I'm saying, on the one hand, it doesn't look like we have any evidence that a real-world app will perform better because of these efforts; on the other hand, if you *are* going to optimize space, don't aim so low. If the optimization is going to matter, it's going to matter that you did it right.
My counterpoint is that every layer adds more or less a constant overhead, while the size expansion grows quadratically; also, in situations where performance is not a problem, you might still care about the size.
You may have a point. It's just that the characteristic of this growth is not one that would normally cause the program to blow up in the field because the customer's problem size is larger than anything you tested: even though it's O(N^2), there's nothing the customer can do to increase N. Furthermore, one doesn't tend to create vast numbers of iterator instances. So it doesn't seem so urgent to optimize it without some data showing it will make an important difference. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

2008/9/4 David Abrahams <dave@boostpro.com>:
But iterators like filter_iterator that need to know iteration boundaries can store a range instead of an iterator, so they can benefit from the compression.
Oh, I totally missed that; sorry! Let's see...
strided_iterator<int*,5> stores range<int*>
sizeof(range<int*>) == 2*sizeof(int*)
filter_iterator<strided_iterator<int*,5> > stores range<strided_iterator<int*,5> >
range<strided_iterator<int*,5> > is represented as
unique_data<strided_iterator<int*,5> >::type start; unique_data<strided_iterator<int*,5> >::type finish; common_data<strided_iterator<int*,5> >::type common;
unique_data<strided_iterator<int*,5> >::type == int* common_data<strided_iterator<int*,5> >::type == int*
so its size is 3*sizeof(int*)
?? Did I get that right?
If so, it seems like there's still redundant storage. finish and common will be representing essentially the same information. If I were to write a filter_strided_iterator adapter and apply it to int*, it would have the size of 2 int*s.
What information is contained in common that is not also contained in finish (I've never implemented a strided iterator, I'm trying to understand)?
My point is that there is essentially the same information in common and finish, but that you can't eliminate it because the abstractions don't allow it. A strided_iterator implementation and test are attached.
Hum, I have been thinking about this for a while, and the more I think about it, the more it seems to me that that strided_iterator implementation is not optimal. Again, I've never needed one, so I'm not sure about the use case. I'll try to reconstruct the problems from the hints you left in this thread:

Basically a strided iterator is an adaptor that visits every Nth element in a range, where N is a compile-time constant. From your implementation I gather that the underlying range must be a random access range. A straightforward implementation is a thin wrapper around a random access iterator: you just replace operators ++ and -- with +=N (with similar changes for operators + and -).

The problem is that when you want to visit ranges whose size is not an even multiple of N, the above scheme wouldn't work, because before comparison with the endpoint, you would have already moved the iterator beyond that end point. If the endpoint is one past the end of the base range, you have UB.

The solution used in your implementation is to actually store the unstrided endpoint and position the internal iterator at that end point if the next increment would go beyond the boundary. This requires a conditional branch in increment, and makes the semantics of the iterator a bit surprising: you have to know the boundaries of the iteration when you construct the begin iterator, and two iterators over the same underlying sequence with the same stride and phase, but constructed with different end positions, will actually be iterating over different ranges (one iterator can't reach the end position of the other iterator).

Now, the obvious solution is not to store the end position but instead to use a base+offset representation, which is only combined at dereference time. The member functions become:

void increment() { // similar thing for decrement
    offset += N;
}
void advance(distance_type n) { offset += n * N; }
reference dereference() const { return base[offset]; }
distance_type distance_to(strided_iterator rhs) const {
    return ((base - rhs.base) + (offset - rhs.offset)) / N; // I might have got the sign wrong
}

[ in fact, searching the archives, it seems you are aware of this implementation, so I guess there are some problems I'm missing if you didn't use it]

Assuming sizeof(size_t) == sizeof(void*), the size of this iterator is the same as the size of your implementation. Now the base and the range size are the only information you need to reconstruct begin and end:

begin.base = range.base; begin.offset = 0;
end.base = range.base; end.offset = range.length;

Thus sizeof(strided_range) can be sizeof(void*)*2, which is optimal. This doesn't of course prove that storing a range would lead to an optimal representation for all iterator adaptors, but it is at least enough to guarantee an optimal (in terms of space) filter_iterator<strided_iterator>. Am I missing something? -- gpd
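A self-contained version of that base+offset scheme might look as follows (assumptions: a random access base iterator and a compile-time stride N; the helper names are invented):

    #include <cstddef>
    #include <iterator>

    template <class It, int N>
    struct strided_iterator {
        It base;               // start of the range; never moved
        std::ptrdiff_t offset; // combined with base only on dereference

        strided_iterator(It b, std::ptrdiff_t o) : base(b), offset(o) {}

        typename std::iterator_traits<It>::reference
        operator*() const { return base[offset]; }

        strided_iterator& operator++() { offset += N; return *this; }
        strided_iterator& operator--() { offset -= N; return *this; }

        friend bool operator==(strided_iterator const& a, strided_iterator const& b)
        { return a.offset == b.offset; }
        friend bool operator!=(strided_iterator const& a, strided_iterator const& b)
        { return a.offset != b.offset; }
    };

    // begin gets offset 0; end gets the length rounded up to a multiple
    // of N, so repeated ++ from begin compares equal to end even when
    // the length is not an even multiple of the stride
    template <int N, class It>
    strided_iterator<It, N> strided_begin(It base)
    { return strided_iterator<It, N>(base, 0); }

    template <int N, class It>
    strided_iterator<It, N> strided_end(It base, std::ptrdiff_t length)
    { return strided_iterator<It, N>(base, (length + N - 1) / N * N); }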

void increment() { // similar thing for decrement
    offset += N;
}
void advance(distance_type n) { offset += n * N; }
reference dereference() const { return base[offset]; }
distance_type distance_to(strided_iterator rhs) const {
    return ((base - rhs.base) + (offset - rhs.offset)) / N; // I might have got the sign wrong
}
In the case of a base forward_iterator, we can do without any space overhead with the same trick used for the filter_iterator. We implement a strided_range and then derive the strided_iterator from it. The strided_range stores its underlying random_access range, and adapted_range interface::increment/advance can check the underlying range for empty. It would be like:

increment() {
    for( difference_type i=0; i<N && !base.empty(); ++i ) {
        base.increment(); // the adapted_range::increment, which increments begin; base is a range
    }
}

-- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229

on Fri Sep 05 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
In the case of a base forward_iterator, we can do without any space overhead with the same trick used for the filter_iterator. We implement a strided_range and then derive the strided_iterator from it. The strided_range stores its underlying random_access range, and adapted_range interface::increment/advance can check the underlying range for empty. It would be like:
increment() {
    for( difference_type i=0; i<N && !base.empty(); ++i ) {
        base.increment(); // the adapted_range::increment, which increments begin; base is a range
    }
}
I don't think you're fully appreciating what a huge impact such a complex implementation of increment() would have on performance. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

increment() {
    for( difference_type i=0; i<N && !base.empty(); ++i ) {
        base.increment(); // the adapted_range::increment, which increments begin; base is a range
    }
}
I don't think you're fully appreciating what a huge impact such a complex implementation of increment() would have on performance.
Sorry, I meant this as the implementation for a forward range. You don't want to use increment for random_access ranges, obviously.

Thinking about it some more, I believe the associated_range abstraction is not quite the right one yet. It works for forward_ranges, which is why I did not notice earlier, but not for bidirectional and random_access_ranges.

The correct abstraction is bounded_iterator, which is an iterator + the knowledge of whatever bounds it needs. The extra functionality offered over iterators is to test overrun of begin/end in increment/decrement/advance. increment/decrement may have a bool return value indicating whether they succeeded or not, and advance may return the difference_type that actually got executed until begin/end was reached.

The bounded_forward_iterator is a bit similar to a range. Its base iterator implementation must store the current iterator + the end. This is essentially what Giovanni introduced as associated_range. The bounded_bidirectional_iterator and bounded_random_access_iterator base implementation must store begin, current and end base iterators.

All iterators needing the underlying iterator bounds can then run on top of bounded_iterators, and implement the bounded_iterator interface themselves. I think this eliminates all space overhead of any adaptors we have been mentioning so far. In particular this applies to strided_iterators of all categories.

The corresponding adaptor ranges would return bounded iterators from their begin()/end() functions. They store their base range internally instead of two iterator adaptors, to avoid any space overhead in ranges.

Arno -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229
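A sketch of that interface; the signatures are one reading of the proposal, not a settled design:

    #include <iterator>

    // the forward flavour: stores current + end, and its operations
    // report bound overrun instead of silently walking past the end
    template <class It>
    struct bounded_forward_iterator {
        It current;
        It end;

        bool increment() { // returns false instead of overrunning 'end'
            if (current == end) return false;
            ++current;
            return true;
        }

        typename std::iterator_traits<It>::reference
        dereference() const { return *current; }
    };

    // a bidirectional/random access flavour would also store 'begin',
    // and advance(n) would return the distance actually travelled
    // before a bound was hit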

on Sat Sep 06 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
increment() {
    for( difference_type i=0; i<N && !base.empty(); ++i ) {
        base.increment(); // the adapted_range::increment, which increments begin; base is a range
    }
}
I don't think you're fully appreciating what a huge impact such a complex implementation of increment() would have on performance.
Sorry, I meant this as the implementation for a forward range. You don't want to use increment for random_access ranges, obviously.
I don't see what the traversal category has to do with anything.
Thinking about it some more, I believe the associated_range abstraction is not quite the right one yet. It works for forward_ranges, which is why I did not notice earlier, but not for bidirectional and random_access_ranges.
The correct abstraction is bounded_iterator, which is an iterator + the knowledge of whatever bounds it needs.
Only if you don't care about eliminating other redundancies. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

increment() {
    for( difference_type i=0; i<N && !base.empty(); ++i ) {
        base.increment(); // the adapted_range::increment, which increments begin; base is a range
    }
}
I don't think you're fully appreciating what a huge impact such a complex implementation of increment() would have on performance.
Sorry, I meant this as the implementation for a forward range. You don't want to use increment for random_access ranges, obviously.
I don't see what the traversal category has to do with anything.
A strided_iterator should use advance on random_access_ranges rather than increment.
Thinking about it some more, I believe the associated_range abstraction is not quite the right one yet. It works for forward_ranges, which is why I did not notice earlier, but not for bidirectional and random_access_ranges.
The correct abstraction is bounded_iterator, which is an iterator + the knowledge of whatever bounds it needs.
Only if you don't care about eliminating other redundancies.
What are you referring to specifically? If you are referring to storing the predicate or stride length, this is taken care of because the iterator adaptor stack stores this information only once:

template<class I, class F> struct bounded_iterator< filter_iterator<I,F> > {
    typedef filter_iterator<I,F> type;
};

template<class I, class F> struct filter_iterator {
    typename bounded_iterator< I >::type m_base;
    F m_f;
    ...
};

Ranges would store their base ranges rather than a pair of iterators, and generate iterators on demand, so they store predicate/stride_length/... only once as well. Arno -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229

on Sun Sep 07 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
Thinking about it some more, I believe the associated_range abstraction is not quite the right one yet. It works for forward_ranges, which is why I did not notice earlier, but not for bidirectional and random_access_ranges.
The correct abstraction is bounded_iterator, which is an iterator + the knowledge of whatever bounds it needs.
Only if you don't care about eliminating other redundancies.
What are you referring to specifically? If you are referring to storing the predicate or stride length, this is taken care of because the iterator adaptor stack stores this information only once:
I'm talking about "vertical" redundancies between layers of the stack. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

What are you referring to specifically? If you are referring to storing the predicate or stride length, this is taken care of because the iterator adaptor stack stores this information only once:
I'm talking about "vertical" redundancies between layers of the stack.
Could you give a specific example? Each bounded_iterator stores the information pertinent to its own operation (predicate, stride length, transform functor, whatever) + its base bounded iterator. The base bounded_iterator stores either two (for forward iterators: current and end) or three (for bidirectional and random_access: begin, current and end) base iterators.

The only redundancy that I still see is that you may start with a random_access iterator at the base, but somewhere in the stack the traversal category is weakened to forward. Or the user of the iterator only uses forward traversal, although the iterator type would also support bidirectional traversal. Then the base bounded_iterator stores a begin that is never used. I am not sure this is worth optimizing for, but you could put the desired traversal category into the bounded_iterator template interface:

bounded_iterator< I, traversal_category >

bounded_iterator< char*, forward_traversal > would then be a forward bounded_iterator running on char*, of size 2x sizeof(char*). It only stores current char* and end char*, instead of begin char*, current char* and end char*.

Arno -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229
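A sketch of such a category-parameterized storage selection (the traversal tags are Boost.Iterator's; the rest is illustrative):

    #include <boost/iterator/iterator_categories.hpp>

    template <class It, class Traversal>
    struct bounded_iterator_storage; // primary template intentionally undefined

    template <class It> // forward: current + end
    struct bounded_iterator_storage<It, boost::forward_traversal_tag> {
        It current, end;
    };

    template <class It> // bidirectional (and stronger): begin + current + end
    struct bounded_iterator_storage<It, boost::bidirectional_traversal_tag> {
        It begin, current, end;
    };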

on Fri Sep 05 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
2008/9/4 David Abrahams <dave@boostpro.com>:
But iterators like filter_iterator that need to know iteration boundaries can store a range instead of an iterator, so they can benefit from the compression.
Oh, I totally missed that; sorry! Let's see...
strided_iterator<int*,5> stores range<int*>
sizeof(range<int*>) == 2*sizeof(int*)
filter_iterator<strided_iterator<int*,5> > stores range<strided_iterator<int*,5> >
range<strided_iterator<int*,5> > is represented as
unique_data<strided_iterator<int*,5> >::type start; unique_data<strided_iterator<int*,5> >::type finish; common_data<strided_iterator<int*,5> >::type common;
unique_data<strided_iterator<int*,5> >::type == int* common_data<strided_iterator<int*,5> >::type == int*
so its size is 3*sizeof(int*)
?? Did I get that right?
If so, it seems like there's still redundant storage. finish and common will be representing essentially the same information. If I were to write a filter_strided_iterator adapter and apply it to int*, it would have the size of 2 int*s.
What information is contained in common that is not also contained in finish (I've never implemented a strided iterator, I'm trying to understand)?
My point is that there is essentially the same information in common and finish, but that you can't eliminate it because the abstractions don't allow it. A strided_iterator implementation and test are attached.
Hum, I have been thinking about this for a while, and the more I think about it, the more it seems to me that that strided_iterator implementation is not optimal. Again, I've never needed one, so I'm not sure about the use case. I'll try to reconstruct the problems from the hints you left in this thread:
Basically a strided iterator is an adaptor that visits every Nth element in a range, where N is a compile-time constant.
Well, whether to make it a compile-time constant or a runtime one is a design decision you can make either way, but that's basically it. You might use it to traverse the diagonal of a matrix, for example.
From your implementation I gather that the underlying range must be a random access range. A straightforward implementation is a thin wrapper around a random access iterator: you just replace operators ++ and -- with +=N (with similar changes for operators + and -).
The problem is that when you want to visit ranges whose size is not an even multiple of N, the above scheme wouldn't work, because before comparison with the endpoint, you would have already moved the iterator beyond that end point. If the endpoint is one past the end of the base range, you have UB.
The solution used in your implementation is to actually store the unstrided endpoint and position the internal iterator at that end point if the next increment would go beyond the boundary. This requires a conditional branch in increment, and makes the semantics of the iterator a bit surprising: you have to know the boundaries of the iteration when you construct the begin iterator, and two iterators over the same underlying sequence with the same stride and phase, but constructed with different end positions, will actually be iterating over different ranges (one iterator can't reach the end position of the other iterator).
Now, the obvious solution is not to store the end position but instead to use a base+offset representation, which is only combined at dereference time.
You're completely right about that, and there are even additional reasons that you didn't mention, as I realized a couple days ago. I didn't want to bring it up, though, because it doesn't change the fundamental reality. Just replace strided_iterator by another layer of filter_iterator.
[ in fact, searching the archives, it seems you are aware of this implementation, so I guess there are some problems I'm missing if you didn't use it]
No, it's just that I've forgotten more in my life than I ever knew. Remember, I forgot that you only need to store the endpoint of the underlying range (and not the beginning) if you implement filter_iterator correctly.
Assuming sizeof(size_t) == sizeof(void*), the size of this iterator is the same as the size of your implementation.
Now the base and the range size are the only information you need to reconstruct begin and end
begin.base = range.base; begin.offset = 0;
end.base = range.base; end.offset = range.length;
Thus sizeof(strided_range) can be sizeof(void*)*2 which is optimal.
Yes, but that doesn't change anything fundamental about the issue I'm trying to point at.
This doesn't of course prove that storing a range would lead to an optimal representation for all iterator adaptors, but it is at least enough to guarantee an optimal (in terms of space) filter_iterator<strided_iterator>.
Am I missing something?
You may be missing that the issue I'm describing applies more generally than in the single case of my filter_iterator<broken_strided_iterator<int*> > example. It fails to optimize any case where the outer iterator and inner iterator are storing essentially the same information. Another thing to look at: this sort of thing happens with permutation_iterator as well. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Fri, Sep 5, 2008 at 3:38 PM, David Abrahams <dave@boostpro.com> wrote:
on Fri Sep 05 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
2008/9/4 David Abrahams <dave@boostpro.com>:
But iterators like filter_iterator that need to know iteration boundaries can store a range instead of an iterator, so they can benefit from the compression.
Oh, I totally missed that; sorry! Let's see...
strided_iterator<int*,5> stores range<int*>
sizeof(range<int*>) == 2*sizeof(int*)
filter_iterator<strided_iterator<int*,5> > stores range<strided_iterator<int*,5> >
range<strided_iterator<int*,5> > is represented as
unique_data<strided_iterator<int*,5> >::type start; unique_data<strided_iterator<int*,5> >::type finish; common_data<strided_iterator<int*,5> >::type common;
unique_data<strided_iterator<int*,5> >::type == int* common_data<strided_iterator<int*,5> >::type == int*
so its size is 3*sizeof(int*)
?? Did I get that right?
If so, it seems like there's still redundant storage. finish and common will be representing essentially the same information. If I were to write a filter_strided_iterator adapter and apply it to int*, it would have the size of 2 int*s.
What information is contained in common that is not also contained in finish (I've never implemented a strided iterator, I'm trying to understand)?
My point is that there is essentially the same information in common and finish, but that you can't eliminate it because the abstractions don't allow it. A strided_iterator implementation and test are attached.
Hum, I have been thinking about this for a while, and the more I think about it, the more it seems to me that that strided_iterator implementation is not optimal. Again, I've never needed one, so I'm not sure about the use case. I'll try to reconstruct the problems from the hints you left in this thread:
Basically a strided iterator is an adaptor that visits every Nth element in a range, where N is a compile-time constant.
Well, whether to make it a compile-time constant or a runtime one is a design decision you can make either way, but that's basically it. You might use it to traverse the diagonal of a matrix, for example.
Ok.
From your implementation I gather that the underlying range must be a random access range. A straightforward implementation is a thin wrapper around a random access iterator: you just replace operators ++ and -- with +=N (with similar changes for operators + and -).
The problem is that when you want to visit ranges whose size is not an even multiple of N, the above scheme wouldn't work, because before comparison with the endpoint, you would have already moved the iterator beyond that end point. If the endpoint is one past the end of the base range, you have UB.
The solution used in your implementation is to actually store the unstrided endpoint and position the internal iterator at that end point if the next increment would go beyond the boundary. This requires a conditional branch in increment, and makes the semantics of the iterator a bit surprising: you have to know the boundaries of the iteration when you construct the begin iterator, and two iterators over the same underlying sequence with the same stride and phase, but constructed with different end positions, will actually be iterating over different ranges (one iterator can't reach the end position of the other iterator).
Now, the obvious solution is not to store the end position but instead to use a base+offset representation, which is only combined at dereference time.
You're completely right about that, and there are even additional reasons that you didn't mention, as I realized a couple days ago. I didn't want to bring it up, though, because it doesn't change the fundamental reality. Just replace strided_iterator by another layer of filter_iterator.
Hum, but as long as an iterator contains only a range, you can always encapsulate it with no space overhead: a filter_iterator<filter_iterator<T* > > has sizeof(T*)*2 (assuming a stateless predicate), which is optimal.
[ in fact, searching the archives, it seems you are aware of this implementation, so I guess there are some problems I'm missing if you didn't use it]
No, it's just that I've forgotten more in my life than I ever knew.
:)
Remember, I forgot that you only need to store the endpoint of the underlying range (and not the beginning) if you implement filter_iterator correctly.
Assuming sizeof(size_t) == sizeof(void*), the size of this iterator is the same as the size of your implementation.
Now the base and the range size are the only information you need to reconstruct begin and end
begin.base = range.base; begin.offset = 0;
end.base = range.base; end.offset = range.length;
Thus sizeof(strided_range) can be sizeof(void*)*2 which is optimal.
Yes, but that doesn't change anything fundamental about the issue I'm trying to point at.
Well, it changes the fact that you need to provide another counterexample, because a filter_iterator<strided_iterator> has no redundant information.
This doesn't of course prove that storing a range would lead to an optimal representation for all iterator adaptors, but it is at least enough to guarantee an optimal (in terms of space) filter_iterator<strided_iterator>.
Am I missing something?
You may be missing that the issue I'm describing applies more generally than in the single case of my filter_iterator<broken_strided_iterator<int*> > example. It fails to optimize any case where the outer iterator and inner iterator are storing essentially the same information.
Yes, I must be missing it, because, as I see it, as long as the information can be expressed as a range, the outer iterator needs only a single instance of a range of the underlying iterator. If that range representation is optimal, no information is duplicated. The optimization breaks down when you need more than two iterators to the same range, but I would like to see an example of such a beast (yeah, I lack imagination :) ). And hey, I'm sure that beast exists, but if the optimization covers 90% of the cases for a little cost (you need to specialize associated_range and implement the optimal range for your iterators), I think it is a win. Associated_range might actually be useful beyond iterator nesting. And the optimization is optional: if associated_range is not specialized, or the associated_range doesn't guarantee optimal compression, everything will still work, but redundant information will be stored. -- gpd

on Fri Sep 05 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
You're completely right about that, and there are even additional reasons that you didn't mention, as I realized a couple days ago. I didn't want to bring it up, though, because it doesn't change the fundamental reality. Just replace strided_iterator by another layer of filter_iterator.
Hum, but as long as an iterator contains only a range, you can always encapsulate it with no space overhead: a filter_iterator<filter_iterator<T* > > has sizeof(T*)*2 (assuming a stateless predicate), which is optimal.
First of all, there are other kinds of data that are redundant across the pair of iterators in a range, like the predicate! But let's ignore that for a moment. The range compression formula is:

sizeof(range<X>) == 2*sizeof(X) - sizeof(common_data<X>::type)

so when X is filter_iterator<Y>:

sizeof(X) is 2*sizeof(Y)
common_data<X>::type == Y

so:

sizeof(range<X>) == 2*(2*sizeof(Y)) - sizeof(Y) == 3*sizeof(Y)

So if sizeof(filter_iterator<Y>) == sizeof(range<Y>):

sizeof(filter_iterator<Y>) == sizeof(range<Y>) == 2*sizeof(Y) - sizeof(common_data<Y>::type)

and when Y is filter_iterator<Z>, X is filter_iterator<filter_iterator<Z> >:

sizeof(Y) is 2*sizeof(Z)
common_data<Y>::type == Z

so:

sizeof(range<Y>) == 2*(2*sizeof(Z)) - sizeof(Z) == 3*sizeof(Z)
sizeof(filter_iterator<Y>) == 3*sizeof(Z)
sizeof(range<X>) == 3*sizeof(Y) == 3*2*sizeof(Z) == 6*sizeof(Z)

What am I missing? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Fri, Sep 5, 2008 at 6:55 PM, David Abrahams <dave@boostpro.com> wrote:
on Fri Sep 05 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
You're completely right about that, and there are even additional reasons that you didn't mention, as I realized a couple days ago. I didn't want to bring it up, though, because it doesn't change the fundamental reality. Just replace strided_iterator by another layer of filter_iterator.
Hum, but as long as an iterator contains only a range, you can always encapsulate it with no space overhead: a filter_iterator<filter_iterator<T* > > has sizeof(T*)*2 (assuming a stateless predicate), which is optimal.
First of all, there are other kinds of data that are redundant across the pair of iterators in a range, like the predicate!
That's not a problem: a filter_range would store the predicate only once.
But let's ignore that for a moment. The range compression formula is:
sizeof(range<X>) == 2*sizeof(X) - sizeof(common_data<X>::type)
so when X is filter_iterator<Y>:
sizeof(X) is 2*sizeof(Y) common_data<X>::type == Y
Oh! Now I see where our impedance mismatch is! You want to store three pointers in filter_range because you correctly want this transformation to be lossless:

pair<filter_iterator, filter_iterator> a;

filter_range f(a.begin, a.end)

assert(f.begin() == a); assert(f.end() == b);

i.e. the hidden common knowledge of a.begin and a.end of the effective end of the underlying range (let's call it real_end) is preserved.

I was naively thinking that, by transforming the pair into a full-featured range, I could completely forget about real_end, and only encode a.begin and a.end. In practice the whole filter_range information would be contained in a.begin and a.end; real_end would be completely redundant. The consequence of this is that the range could only shrink and never widen.

Now, in general an iterator adaptor cannot rely on the fact that it can advance the underlying range 'begin' beyond its 'end', so real_end is effectively completely unnecessary. Unfortunately the outside world knows better and can use .base() to extract the underlying iterator, and would expect that real_end be preserved.

Now I see two solutions:

- declare that the filter_range is not semantically equivalent to the original iterator pairs, but make the loss of information explicit. In practice this would mean that the range could only shrink, but it would make it possible to have space-efficient abstractions.

- just declare defeat, and concede that you are right :)

-- gpd
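The first of those solutions, made concrete as a sketch (names are illustrative; a design possibility, not an existing implementation):

    // a filter_range that deliberately forgets real_end and treats its
    // own stored end as the boundary: it can only shrink, and base()
    // of its end iterator no longer recovers the original underlying end
    template <class It, class Pred>
    struct shrinking_filter_range {
        It first, last; // both already positioned on accepted elements
        Pred pred;      // stored once for the whole range

        // iterators are regenerated on demand and bounded by 'last'
        // itself, so no third "real_end" iterator is ever stored
    };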

on Fri Sep 05 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
Oh! Now I see where our impedance mismatch is! You want to store three pointers in filter_range because you correctly want this transformation to be lossless:
pair<filter_iterator, filter_iterator> a;
filter_range f(a.begin, a.end)
I think you mean

filter_range f(a.first, a.second)

And I'll assume from what you write below that this is not adding an additional layer of filtering; it is just storing a.first and a.second (compressed).
assert(f.begin() == a); assert(f.end() == b);
I think you mean

assert(f.begin() == a.first); assert(f.end() == a.second);
i.e. the hidden common knowledge of a.begin and a.end of the effective end of the underlying range (let's call it real_end) is preserved.
I wasn't motivated by that, actually. Frankly, I don't see why the invariant shouldn't hold even if you store only two pointers. The underlying end iterator would have been advanced to the nearest element satisfying the predicate, or real_end, by the time you try to construct the filter_range. A similar case is:

assert( make_filter_range(a, a+len/2).end() == make_filter_range(a+len/2,a+len).begin() )

I don't think even our existing filter_iterator implementation supports that, though ;-). It's not unreasonable to request that all range-limited iterators traversing the same logical range be initialized with the same real_end.
I was naively thinking that, by transforming the pair into a full-featured range, I could completely forget about real_end, and only encode a.begin and a.end. In practice the whole filter_range information would be contained in a.begin and a.end; real_end would be completely redundant. The consequence of this is that the range could only shrink and never widen.
I think that's a great insight... for filter_range's iterator redundancy. It won't help with other redundancies, I'm pretty sure. However, they're less likely to occur across levels of stacking than is the sort of range limitation in filter_iterator.
Now, in general an iterator adaptor cannot rely on the fact that it can advance the underlying range 'begin' beyond its 'end', so real_end is effectively completely unnecessary. Unfortunately the outside world knows better and can use .base() to extract the underlying iterator, and would expect that real_end be preserved.
Well, now, the underlying iterator is not real_end anyway. base() doesn't help you get it.
Now I see two solutions:
- declare that the filter_range is not semantically equivalent to the original iterator pairs, but make the loss of information explicit. In practice this would mean that the range could only shrink, but it would make it possible to have space-efficient abstractions.
- just declare defeat, and concede that you are right :)
Not so fast. You may be able to have your cake and eat it too. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

The RangeEx documentation writes something like "what algorithms are to containers, adaptor ranges are to ranges". Whenever you are applying one (non-permuting) algorithm to a container, and then another and then another, you can stack range adaptors. It's like lazy evaluation of a Haskell list.
Sheesh; give me a little credit, please. I know what range adaptors do.
So why are you asking these questions then? With smart compilers, generic programming is supposed to produce code that is pretty close to handcrafted code. That is what people expect, and that is what generic programming is sold on. The current range adaptors were far from it. And even if stacks don't exceed size 3 or 4 in practice, which is all I have in my code, why produce code that is a factor of 8 or 16 slower or more storage-bloated than necessary? As we have seen, the reason was not that the problem is intractable with generic programming. It just took a little thought to do it right. -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229

Arno, On Tue, Sep 2, 2008 at 4:38 PM, Arno Schödl <aschoedl@think-cell.com> wrote:
The RangeEx documentation writes something like "what algorithms are to containers, adaptor ranges are to ranges". Whenever you are applying one (non-permuting) algorithm to a container, and then another and then another, you can stack range adaptors. It's like lazy evaluation of a Haskell list.
Sheesh; give me a little credit, please. I know what range adaptors do.
So why are you asking these questions then? With smart compilers, generic programming is supposed to produce code that is pretty close to handcrafted code. That is what people expect, and that is what generic programming is sold on. The current range adaptors were far from it. And even if stacks don't exceed size 3 or 4 in practice, which is all I have in my code, why produce code that is a factor of 8 or 16 slower or more storage-bloated than necessary? As we have seen, the reason was not that the problem is intractable with generic programming. It just took a little thought to do it right.
Please do not get frustrated. We all want to improve Range / RangeEx. I have to apologise for not offering to modify the code sooner. I have been watching, and learning. Once I have understood the design alternatives and have fully comprehended the optimum solution I will implement the suggested improvements. It looks like I will need to coordinate with the Iterator library developers for the optimum solution. So far RangeEx has not made it into Boost, and thus asking for changes in other libraries before now seemed unlikely to succeed. While I accept that the adaptors are possibly not the optimum solution, they are at least implementations of the functionality. I would gladly accept superior contributions ;-). I am also happy to spend more time improving them myself. I have to apologise for slow response times; my full-time job simply doesn't allow me enough time to contribute to this discussion as much as I probably ought to. You have made some very valid technical points, let's focus on these please.
Best wishes, Neil Groves

optimum solution, they are at least implementations of the functionality. I would gladly accept superior contributions ;-). I am also happy to spend more time improving them myself. I have to apologise for slow response times; my full-time job simply doesn't allow me enough time to contribute to this discussion as much as I probably ought to. You have made some very valid technical points, let's focus on these please.
My last post was not intended to put pressure on anyone to change anything, and I am very grateful for having Boost. But Dave evidently thinks I am wasting everyone's time, and I beg to differ. If you agree that the general direction, self-contained iterators that use funky ranges to represent themselves efficiently, is right, I would bit by bit change my implementations of adaptor ranges and share them with you. I have not yet contributed to Boost, so probably my code is non-conformant all over the place, but it is a start. Arno -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229

On Tue, Sep 2, 2008 at 5:32 PM, Arno Schödl <aschoedl@think-cell.com> wrote:
optimum solution, they are at least implementations of the functionality. I would gladly accept superior contributions ;-). I am also happy to spend more time improving them myself. I have to apologise for slow response times; my full-time job simply doesn't allow me enough time to contribute to this discussion as much as I probably ought to. You have made some very valid technical points, let's focus on these please.
My last post was not intended to put pressure on anyone to change anything, and I am very grateful for having Boost. But Dave evidently thinks I am wasting everyone's time, and I beg to differ.
If you agree that the general direction, self-contained iterators that use funky ranges to represent themselves efficiently, is right, I would bit by bit change my implementations of adaptor ranges and share them with you. I have not yet contributed to Boost, so probably my code is non-conformant all over the place, but it is a start.
I would really like to see some of the changes that you make, and that would help me apply a consistent set of improvements to other ranges and adaptors that could benefit. Once things are a bit more concrete we can measure and compare the storage and performance benefits. There will be very little argument with objective measurements. We can work together to make them correct and compliant, with the intention of incorporating the changes into RangeEx.
Arno
I hope this helps, Neil Groves

on Tue Sep 02 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
optimum solution, they are at least implementations of the functionality. I would gladly accept superior contributions ;-). I am also happy to spend more time improving them myself. I have to apologise for slow response times; my full-time job simply doesn't allow me enough time to contribute to this discussion as much as I probably ought to. You have made some very valid technical points, let's focus on these please.
My last post was not intended to put pressure on anyone to change anything, and I am very grateful for having Boost. But Dave evidently thinks I am wasting everyone's time,
I don't think I said anything of the sort. I think you're asking interesting questions that might lead to some important work. However, it seems to me that you already have a general answer in mind, and it might be better to back up and look at the real problems you want to solve before moving on to the answer. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

on Tue Sep 02 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
The RangeEx documentation says something like "what algorithms are to containers, adaptor ranges are to ranges". Whenever you are applying one (non-permutating) algorithm to a container, and then another and then another, you can stack range adaptors. It's like lazy evaluation of a Haskell list.
Sheesh; give me a little credit, please. I know what range adaptors do.
So why are you asking these questions then?
I'm trying to find out what the specific example applications are. I have a suspicion that range adapters or single-level iterator adaptors may not be the best way to address these applications and may even sacrifice significant performance, but I can't tell until I know what the applications are. Just so, single-level iterators (or ordinary range adapters, which don't add anything conceptually to iterators) are not the best way to address iteration over hierarchical data structures, which is why we have segmented iterators.
With smart compilers, generic programming is supposed to produce code that is pretty close to handcrafted code. That is what people expect, and that is what generic programming is sold on.
Generic Programming is supposed to be driven by concrete implementations and real use cases. http://www.stlport.org/resources/StepanovUSA.html: It is as if mathematicians would start with axioms. You do not start with axioms - you start with proofs. Only when you have found a bunch of related proofs, can you come up with axioms. You end with axioms. The same thing is true in programming: you have to start with interesting algorithms. Only when you understand them well, can you come up with an interface that will let them work. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

I'm trying to find out what the specific example applications are. I have a suspicion that range adapters or single-level iterator adaptors may not be the best way to address these applications and may even sacrifice significant performance, but I can't tell until I know what the applications are. Just so, single-level iterators (or ordinary range adapters, which don't add anything conceptually to iterators) are not the best way to address iteration over hierarchical data structures, which is why we have segmented iterators.
This is a typical one, copied straight from our code:

    grdlns->InsertGridlines(
        vecgvepsSource,
        make_first_range( vecpairanchorsrcgrdln ),
        GridlineInsertionPositions(
            make_transform_range(
                make_transform_range(
                    make_first_range( vecpairanchorsrcgrdln ),
                    boost::mem_fn( &_vector< Ref<CGridlineAnchor> >::begin )
                ),
                boost::mem_fn( &CGridlineAnchor::Binding )
            ),
            vecgrdlnSnapped ),
        vecgrdlnSnapped );

This is a case in point: I did not write this code, and in the absence of specific guidelines against it, the programmer decided to stack transform_range. In this case, performance-wise, this is no better or worse than using a single transform_range with boost::bind. Likewise, people may innocently write

    rng | filtered( funcA ) | filtered( funcB ) | filtered( funcC ) | filtered( funcD ) | filtered( funcE )

blissfully unaware of any performance penalty.
However, it seems to me that you already have a general answer in mind, and it might be better to back up and look at the real problems you want to solve before moving on to the answer.
Would it be my turn now to ask for some credit ;-) ? Arno -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229
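One way around the stacked layers in the filtered pipeline quoted above is to conjoin the predicates into a single filter, so there is one end check per element regardless of how many predicates are combined. A minimal sketch, with hypothetical int predicates standing in for funcA and funcB rather than code from the thread:

    #include <iostream>
    #include <vector>

    // Hypothetical stand-ins for the funcA/funcB predicates of the example.
    bool funcA(int x) { return x % 2 == 0; }
    bool funcB(int x) { return x > 0; }

    // One filter layer whose predicate is the conjunction of the others.
    struct conjoined_pred {
        bool operator()(int x) const { return funcA(x) && funcB(x); }
    };

    int main() {
        std::vector<int> v;
        for (int i = -5; i <= 5; ++i) v.push_back(i);
        conjoined_pred pred;
        for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
            if (pred(*it)) std::cout << *it << ' ';
        std::cout << '\n'; // prints: 2 4
    }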

on Tue Sep 02 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
I'm trying to find out what the specific example applications are. I have a suspicion that range adapters or single-level iterator adaptors may not be the best way to address these applications and may even sacrifice significant performance, but I can't tell until I know what the applications are. Just so, single-level iterators (or ordinary range adapters, which don't add anything conceptually to iterators) are not the best way to address iteration over hierarchical data structures, which is why we have segmented iterators.
This is a typical one, copied straight from our code:
grdlns->InsertGridlines(
    vecgvepsSource,
    make_first_range( vecpairanchorsrcgrdln ),
    GridlineInsertionPositions(
        make_transform_range(
            make_transform_range(
                make_first_range( vecpairanchorsrcgrdln ),
                boost::mem_fn( &_vector< Ref<CGridlineAnchor> >::begin )
            ),
            boost::mem_fn( &CGridlineAnchor::Binding )
        ),
        vecgrdlnSnapped ),
    vecgrdlnSnapped );
The first problem is that I can't tell what that does. So it doesn't really begin to answer the question.
This is a case in point: I did not write this code, and in the absence of specific guidelines against it, the programmer decided to stack transform_range. In this case, performance-wise, this is no better or worse than using a single transform_range with boost::bind. Likewise, people may innocently write
rng | filtered( funcA ) | filtered( funcB ) | filtered( funcC ) | filtered( funcD ) | filtered( funcE )
blissfully unaware of any performance penalty.
Yes, but:

1. _Do_ they write that?

2. If they do that, I believe they are still paying an enormous unnecessary runtime penalty even with your optimization, because of the endpoint checks in each layer of the composite iterator or range. The only way IIUC to avoid such a penalty is to compress the filter adapters into one, essentially by using expression templates.

3. I am not yet convinced that you are suffering a significant performance penalty due to the wasted space in real applications. Have you measured it? Unless you're actually *storing* lots of these stacked iterators or passing them around by value in your inner loops, it seems unlikely to make much of a difference.

4. I can understand wanting to improve these components so that they can be efficient under more conditions, and avoiding the redundant storage would undoubtedly be a real improvement. However, I think there's a deeper inefficiency in stacking all those filter iterators together, and it's neither a range library's job, nor, I think, is it possible in this case, to protect users from "blissful unawareness" of performance implications.
However, it seems to me that you already have a general answer in mind, and it might be better to back up and look at the real problems you want to solve before moving on to the answer.
Would it be my turn now to ask for some credit ;-) ?
Take as much as you like, but -- sorry -- I'm not yet convinced we're solving a real problem here, and if it _is_ real, I am not quite convinced the general approach of squeezing out redundant storage would address all of the problem. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

rng | filtered( funcA ) | filtered( funcB ) | filtered( funcC ) | filtered( funcD ) | filtered( funcE )
1. _Do_ they write that?
Why not? They did for transform_range, and the notation is getting easier...
2. If they do that, I believe they are still paying an enormous unnecessary runtime penalty even with your optimization because of the endpoint checks in each layer of the composite iterator or range. The only way IIUC to avoid such a penalty is to compress the filter adapters into one, essentially by using expression templates.
See answer to 4.
3. I am not yet convinced that you are suffering a significant performance penalty due to the wasted space in real applications. Have you measured it? Unless you're actually *storing* lots of these stacked iterators or passing them around by value in your inner loops, it seems unlikely to make much of a difference.
Giovanni has seen the quadratic stack expansions. If you are worried about performance of pointers, why not worry about the performance of wrappers around pointers?
4. I can understand wanting to improve these components so that they can be efficient under more conditions, and avoiding the redundant storage would undoubtedly be a real improvement. However, I think there's a deeper inefficiency in stacking all those filter iterators together, and it's neither a range library's job, nor, I think, is it possible in this case, to protect users from "blissful unawareness" of performance implications.
The design we came up with addresses the general problem of stacking range_adaptors that must store their end, like difference_range/filter_range/union_range/[other algorithms that I cannot think of now], in any combination. If you have thought up expression templates that cover all of these in a unified fashion with better performance than stacked iterators, I'd be very happy and try to implement that instead.
I'm not yet convinced we're solving a real problem here,
I have enough of that stuff in my code that I am convinced we do.
and if it _is_ real, I am not quite convinced the general approach of squeezing out redundant storage would address all of the problem.
It does not; you already mentioned the superfluous end checks. As I said, if you have the whole solution, let me know. Arno -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229

on Wed Sep 03 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
rng | filtered( funcA ) | filtered( funcB ) | filtered( funcC ) | filtered( funcD ) | filtered( funcE )
1. _Do_ they write that?
Why not? They did for transform_range, and the notation is getting easier...
Yes, I never contested that they *could* write it. I'm still asking you the same question: what real life computations demand a solution to this problem?
2. If they do that, I believe they are still paying an enormous unnecessary runtime penalty even with your optimization because of the endpoint checks in each layer of the composite iterator or range. The only way IIUC to avoid such a penalty is to compress the filter adapters into one, essentially by using expression templates.
See answer to 4.
3. I am not yet convinced that you are suffering a significant performance penalty due to the wasted space in real applications. Have you measured it? Unless you're actually *storing* lots of these stacked iterators or passing them around by value in your inner loops, it seems unlikely to make much of a difference.
Giovanni has seen the quadratic stack expansions.
Sure, but that does not necessarily amount to a significant performance penalty.
If you are worried about performance of pointers, why not worry about the performance of wrappers around pointers?
I actually am a bit worried about it. However, as I am trying to tell you, I don't believe that what we're talking about here addresses the whole underlying problem. You're talking about eliminating redundancy across pairs of iterators, but AFAICT not the redundancy through the vertical "stack."
4. I can understand wanting to improve these components so that they can be efficient under more conditions, and avoiding the redundant storage would undoubtedly be a real improvement. However, I think there's a deeper inefficiency in stacking all those filter iterators together, and it's neither a range library's job, nor, I think, is it possible in this case, to protect users from "blissful unawareness" of performance implications.
The design we came up with addresses the general problem of stacking range_adaptors that must store their end, like difference_range/filter_range/union_range/[other algorithms that I cannot think of now], in any combination.
OK, let me try to spell this out. Let's suppose for a moment that you build a filter_iterator<strided_iterator<int*,5>, F>. A strided_iterator is one that visits every Nth element. Unless you know that there are an even multiple of N elements, it needs to store an endpoint, because otherwise the iterator might stride right past the last stored location. Thus

    sizeof(strided_iterator<int*,5>) == 2*sizeof(int*)

and its increment operator looks like:

    if (pos <= end - N)
        pos += N;
    else
        pos = end;

Now as you know, a filter_iterator also needs to store an endpoint. So assuming F is empty and EBO is used to eliminate its storage, then

    sizeof(filter_iterator<strided_iterator<int*,5>, F>)
        == 2*sizeof(strided_iterator<int*,5>)
        == 2*2*sizeof(int*)

and the filter iterator's increment looks like:

    do {
        ++pos2;
    } while (pos2 != end2 && !f(*pos2));

Now if we expand the incrementation of pos2, it's

    do {
        if (pos <= end - N)
            pos += N;
        else
            pos = end;
    } while (pos2 != end2 && !f(*pos2));

Whereas it should only be:

    do {
        if (pos <= end - N)
            pos += N;
        else {
            pos = end;
            break;
        }
    } while (!f(*pos2));

No approach that only addresses the space wasted across pairs of iterators can help with these issues, so I think a more fundamental look at the problem is called for. And again, I think the problem is very much like what happens with segmented iterators, so it's worth looking there for inspiration. It may be that, as with segmented iterators, a more general form for the algorithms is called for.
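To make the contrast concrete, here is a compilable sketch of the hand-fused increment for this example, with a hypothetical predicate standing in for F; the stride's end handling and the filter's end handling share a single code path, which the naive stacking cannot achieve:

    #include <cstddef>

    const std::ptrdiff_t N = 5;

    bool pred(int x) { return x % 3 == 0; } // hypothetical filter predicate

    // Hand-fused increment: one end check serves both the stride and the filter.
    void fused_increment(const int*& pos, const int* end) {
        do {
            if (pos <= end - N) {
                pos += N;   // stride to the next candidate
            } else {
                pos = end;  // past the last full stride: stop, skip the predicate
                break;
            }
        } while (!pred(*pos));
    }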
If you have thought up expression templates that cover all of these in a unified fashion with better performance than stacked iterators, I'd be very happy and try to implement that instead.
Again, I'm not ready for answers yet. I'm only just now discovering the questions.
I'm not yet convinced we're solving a real problem here,
I have enough of that stuff in my code that I am convinced we do.
That fact alone does not make it a real problem in my book.
and if it _is_ real, I am not quite convinced the general approach of squeezing out redundant storage would address all of the problem.
It does not; you already mentioned the superfluous end checks. As I said, if you have the whole solution, let me know.
I'm not there yet, but I know enough about the problem now that I wouldn't "buy" a partial solution. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

I'm not there yet, but I know enough about the problem now that I wouldn't "buy" a partial solution.
Before I start playing around with the problem, is my throwing increment_and_dereference optimization fair game? I believe I can't do without. Arno -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229

I'm not there yet, but I know enough about the problem now that I wouldn't "buy" a partial solution.
Before I start playing around with the problem, is my throwing increment_and_dereference optimization fair game? I believe I can't do without.
Since dereference does not need to end-check, throwing increment() is probably the basic building block. Fair game? Arno -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229

on Wed Sep 03 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
I'm not there yet, but I know enough about the problem now that I wouldn't "buy" a partial solution.
Before I start playing around with the problem, is my throwing increment_and_dereference optimization fair game? I believe I can't do without.
Since dereference does not need to end-check, throwing increment() is probably the basic building block. Fair game?
Sorry, I really don't know what you're asking. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

I'm not there yet, but I know enough about the problem now that I wouldn't "buy" a partial solution.
Before I start playing around with the problem, is my throwing increment_and_dereference optimization fair game? I believe I can't do without.
Since dereference does not need to end-check, throwing increment() is probably the basic building block. Fair game?
Sorry, I really don't know what you're asking.
A basic multiple-filter loop is this:

    filter_increment {
        for(;;) {
            ++i;
            if( empty()
                || predA( dereference() )
                && predB( dereference() )
                && predC( dereference() )
                && predD( dereference() ) )
                break;
            // <--- towards base_range *** function stack *** towards outermost filter_iterator ---->
        }
    }

If empty() evaluates to true, all pred* are skipped. This "skipping" in recursion needs exceptions. Exceptions are the only means C++ gives us to "jump" in a recursive environment. At least in Win32, exceptions are slow, probably too slow to be practical. I heard that table-based exceptions are much faster, but I have no practical experience. Before spending time working on the problem you posed, I want to make sure that my solution does not get trashed because I use exceptions. If we want to avoid them, I would say that there is no way to get optimal performance with stacked iterators, and I would not even try. Arno -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229

on Wed Sep 03 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
I'm not there yet, but I know enough about the problem now that I wouldn't "buy" a partial solution.
Before I start playing around with the problem, is my throwing increment_and_dereference optimization fair game? I believe I can't do without.
Since dereference does not need to end-check, throwing increment() is probably the basic building block. Fair game?
Sorry, I really don't know what you're asking.
A basic multiple-filter loop is this:
filter_increment {
    for(;;) {
        ++i;
        if( empty()
            || predA( dereference() )
            && predB( dereference() )
            && predC( dereference() )
            && predD( dereference() ) )
            break;
        // <--- towards base_range *** function stack *** towards outermost filter_iterator ---->
    }
}
If empty() evaluates to true, all pred* are skipped.
Uh-huh...
This "skipping" in recursion needs exceptions. Exceptions are the only means C++ gives us to "jump" in a recursive environment.
I don't know what recursion you are referring to.
At least in Win32, exceptions are slow, probably too slow to be practical. I heard that table-based exceptions are much faster,
Only for the non-exceptional path.
but I have no practical experience.
My advice: forget exceptions. They're totally inappropriate in this context IMO.
Before spending time working on the problem you posed, I want to make sure that my solution does not get trashed because I use exceptions.
Trashed, I don't know about that. However, I'm not interested in any such approach.
If we want to avoid them, I would say that there is no way to get optimal performance with stacked iterators, and I would not even try.
Again, I encourage you to -- at least temporarily -- give up the assumption that these problems should be solved with stacked iterators when the costs we're discussing actually matter. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

This "skipping" in recursion needs exceptions. Exceptions are the only means C++ gives us to "jump" in a recursive environment.
I don't know what recursion you are referring to.
The stack of adaptors calling each other.
Only for the non-exceptional path.
Too bad.
but I have no practical experience.
My advice: forget exceptions. They're totally inappropriate in this context IMO.
IMO, you need some sort of jump to solve the repeated end-checking problem. If the base iterator end check is nested into functions, the only way to transport this information out is via exceptions. If you don't like exceptions, the "I am at the end of the underlying sequence" information must be generated at the outermost level of the function tree. If it is generated anywhere else, there is simply no way to transport the information out without checking the return value of a function, which is the extra check we want to avoid. Do we agree? Arno -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229

on Wed Sep 03 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
This "skipping" in recursion needs exceptions. Exceptions are the only means C++ gives us to "jump" in a recursive environment.
I don't know what recursion you are referring to.
The stack of adaptors calling each other.
That's not recursion, FWIW.
Only for the non-exceptional path.
Too bad.
but I have no practical experience.
My advice: forget exceptions. They're totally inappropriate in this context IMO.
IMO, you need some sort of jump to solve the repeated end-checking problem.
I think if you hand-code an iterator that is semantically equivalent to the one that results from one of your stackings, you will see that no such "jump" is required.
If the base iterator end check is nested into functions, the only way to transport this information out is via exceptions. If you don't like exceptions,
Me, not like exceptions? I hope that's a joke!
the "I am at the end of the underlying sequence" information must be generated at the outermost level of the function tree. If it is generated anywhere else, there is simply no way to transport the information out without checking the return value of a function, which is the extra check we want to avoid. Do we agree?
I think the optimizer can remove some redundant tests if the functions are inlined. But, that said, I am becoming more and more convinced as this discussion goes on that compressing stacked iterator adaptors is barking up the wrong tree if you care about efficiency. The difference between the code generated in the abstracted case and the code you'd write by hand is simply too great. No, I don't have a picture of a better solution yet. But it's an interesting problem. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

AMDG David Abrahams wrote:
I think the optimizer can remove some redundant tests if the functions are inlined. But, that said, I am becoming more and more convinced as this discussion goes on that compressing stacked iterator adaptors is barking up the wrong tree if you care about efficiency. The difference between the code generated in the abstracted case and the code you'd write by hand is simply too great.
No, I don't have a picture of a better solution yet. But it's an interesting problem.
Here are some measurements for my current version of filter_iterator. In brief:

4 function objects containing two integers each: 3.39 seconds
4 stateless function objects: 2.813 seconds
4 stateless function objects combined using phoenix's && rather than stacked iterators: 2.563 seconds
1 stateless function object: 2.015 seconds
raw for loop using vector iterators: 2.516 seconds
raw for loop using pointers: 1.703 seconds

These were all doing the same calculation over the same underlying vector. In Christ, Steven Watanabe
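Steven's test program was an attachment that this archive does not reproduce, so the numbers above are all we have. For orientation only, a micro-benchmark of roughly this shape (the workload and all names below are guesses, not Steven's code) would time the "1 stateless function object" baseline:

    #include <cstddef>
    #include <ctime>
    #include <iostream>
    #include <vector>

    struct is_even { bool operator()(int x) const { return x % 2 == 0; } };

    int main() {
        std::vector<int> v(1000000);
        for (std::size_t i = 0; i < v.size(); ++i) v[i] = int(i);

        std::clock_t t0 = std::clock();
        long long sum = 0;
        for (int rep = 0; rep < 100; ++rep)   // repeat to get measurable times
            for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
                if (is_even()(*it)) sum += *it;   // single stateless predicate
        std::clock_t t1 = std::clock();
        std::cout << double(t1 - t0) / CLOCKS_PER_SEC
                  << " seconds (checksum " << sum << ")\n";
    }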

Steven Watanabe wrote:
AMDG
David Abrahams wrote:
I think the optimizer can remove some redundant tests if the functions are inlined. But, that said, I am becoming more and more convinced as this discussion goes on that compressing stacked iterator adaptors is barking up the wrong tree if you care about efficiency. The difference between the code generated in the abstracted case and the code you'd write by hand is simply too great.
No, I don't have a picture of a better solution yet. But it's an interesting problem.
Here are some measurements for my current version of filter_iterator
In brief:
4 function objects containing two integers each: 3.39 seconds
4 stateless function objects: 2.813 seconds
4 stateless function objects combined using phoenix's && rather than stacked iterators: 2.563 seconds
1 stateless function object: 2.015 seconds
raw for loop using vector iterators: 2.516 seconds
raw for loop using pointers: 1.703 seconds
These were all doing the same calculation over the same underlying vector.
I ran your attached test on VC9 SP1, default console app, release mode, with _SECURE_SCL=0, on my Core 2 Quad Q6600 2.4GHz with 4GB of RAM:

50.702 seconds
19.984 seconds
48.453 seconds
13.343 seconds
4.313 seconds
2.953 seconds
Press any key to continue . . .

Are these all truly doing the same work? Thanks, Michael Marcin

Michael Marcin wrote:
I ran your attached test on VC9 SP1, default console app, release mode, with _SECURE_SCL=0, on my Core 2 Quad Q6600 2.4GHz with 4GB of RAM:

50.702 seconds
19.984 seconds
48.453 seconds
13.343 seconds
4.313 seconds
2.953 seconds
Press any key to continue . . .
Are these all truly doing the same work?
I looked at the generated code to see why the timings were so drastically different, and I realized the compiler wasn't inlining a lot of trivial functions. I ran it again after I flipped inline function expansion from default to Any Suitable:

9.297 seconds
5.812 seconds
8.75 seconds
4.688 seconds
2.922 seconds
3.016 seconds
Press any key to continue . . .

Thanks, Michael Marcin

AMDG Arno Schödl wrote:
Before spending time working on the problem you posed, I want to make sure that my solution does not get trashed because I use exceptions. If we want to avoid them, I would say that there is no way to get optimal performance with stacked iterators, and I would not even try.
It ought to be possible to propagate a bool up the stack indicating whether the increment just reached the end. In principle, if the compiler inlines all the functions, then it ought to be able to avoid any extra boolean test. In Christ, Steven Watanabe
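A minimal sketch of the propagated-bool scheme Steven describes, with hypothetical names (Arno works this idea out in more detail below); after inlining, the two "at end" returns collapse into a single exit from the enclosing loop:

    // Base layer over raw pointers: advance, and report in the same call
    // whether a current element still exists.
    inline bool base_try_increment(const int*& pos, const int* end) {
        if (pos == end) return true; // already exhausted
        ++pos;
        return pos == end;           // true: we just stepped onto the end
    }

    // A filter layer merely propagates the bool; it adds no end check of its own.
    template<class Pred>
    bool filter_try_increment(const int*& pos, const int* end, Pred pred) {
        for (;;) {
            if (base_try_increment(pos, end)) return true; // propagate "at end"
            if (pred(*pos)) return false;                  // found the next match
        }
    }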

It ought to be possible to propagate a bool up the stack indicating whether the increment just reached the end.
In principle, if the compiler inlines all the functions, then it ought to be able to avoid any extra boolean test.
I thought about what can be done in principle, without restricting myself to iterators, to optimize the kind of nested functions on sequences we are discussing. I want to start with the end() check problem in increment. Let's consider a toy example:

    ... F2( T2( F1( T1( R ) ) ) )

Handcrafted code would look like this. I wrote it with gotos to avoid pre-judgement:

    increment() {
    next:
        if( base.empty() ) return;
        base.increment();
        T1::reference t1=T1( *base );
        if( !F1( t1 ) ) goto next;
        T2::reference t2=T2( t1 );
        if( !F2( t2 ) ) goto next;
        ...
    };

Since we will use function templates to generate this code, we must now organize the code into the blocks we want to create. We basically have two choices, first the obvious one:

    increment() {
        do {
            do {
                {
                    if( base.empty() ) return; // one path out
                    base.increment(); // dropping out the bottom is the other path out
                }
                T1::reference t1=T1( *base );
            } while( !F1( t1 ) );
            T2::reference t2=T2( t1 );
        } while( !F2( t2 ) );
    };

With templates, we can only create classes and their functions, not arbitrary code blocks. With functions in C++, without exceptions, each function can only have a single return path. It cannot "tell" the calling function anything but its return value and out parameters. To act on that, the caller must check them, which is the extra cost. If we don't allow exceptions, this nesting is out, because we cannot implement the "return". It is pretty obvious that there is only one substantial alternative, head recursion instead of tail recursion:

    increment() {
        while( !base.empty() ) {
            base.increment();
            {
                T1::reference t1=T1( *base );
                if( F1( t1 ) ) {
                    T2::reference t2=T2( t1 );
                    if( F2( t2 ) ) {
                        // same problem, two exits out of this block
                        return;
                    }
                }
            }
        }
    };

It first looks like we have the same problem, and strictly speaking, we have. But since there is no more code in the outer blocks after the return, we can cut the overhead down to a single check for the whole chain:

    increment() {
        bool bOk;
        do {
            if( base.empty() ) return; // one path out
            base.increment();
            {
                T1::reference t1=T1( *base );
                if( bOk=F1( t1 ) ) {
                    T2::reference t2=T2( t1 );
                    bOk=F2( t2 ); // we save a check here, but if any of the blocks further out skip, we pay a single check
                }
            }
        } while( !bOk ); // check is overhead
    };

For our special case ... F2( T2( F1( T1( R ) ) ) ) we can now reduce the overhead to a single test. I think this is what you, Steven, had in mind.

What kind of function chains could we apply this optimization to? You can see that the basic structure is to generate an element and then have a function that checks, only on the basis of this element and internal state (which in this picture would be static variables in the respective block), whether to let this element pass. In other words, we can handle a chain of functions that transforms each input element of one type into zero or one output elements of another type. transform, filter and strided are like that. difference and union are not, because they have two inputs. unique is almost like that, but needs to have access to the last input element that it let pass. It may be worthwhile to store this element internally, but let's not get hung up on details.

An expression template to generate this code would, at the leaves of the function tree, separate out chains of the type of functions we can support and then generate head recursion for them. It would only work starting at the leaves. For everything downstream (towards the root of the tree) from a non-compliant function like difference_range, the optimization could not be applied, and we would need to generate normal tail recursion like we now do when stacking iterators. Is it worthwhile? I am asking for your opinion.

The alternative, IMO, would be to stick to the standard tail recursion, and make it as easy as possible for the compiler to optimize the extra checks away. Instead of an increment that throws, it would return whether it reached the end. The base implementation would look like this:

    class associated_range {
        ...
        bool try_increment() {
            if( empty() ) return true;   // nothing left
            increment();
            return empty();              // report whether we just stepped past the last element
        }
    };

We would call it with

    #define increment_or_exit( range ) { if( range.try_increment() ) return true; }

Hopefully, if the compiler encounters this in each function in the chain, it would drop the check. filter_range would then look like this:

    template< class Base > class filter_range {
        bool try_increment() {
            for(;;) {
                increment_or_exit( base );
                typename Base::reference t=base.dereference();
                if( pred( t ) ) return false;
            }
        }
        void increment() { try_increment(); }
        typename Base::reference dereference() const { return base.dereference(); }
    };

If we ever get super-fast exceptions, this implementation can easily be turned into one that takes advantage of them. Arno -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229
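A possible consumer loop for the try_increment() interface sketched above; this is an illustration under the assumption that the range is constructed positioned on its first valid element, so only the single propagated bool is tested per step:

    template<class Rng, class Fn>
    void visit_all(Rng& rng, Fn fn) {
        if (rng.empty()) return;             // nothing to visit
        for (;;) {
            fn(rng.dereference());           // current element is valid here
            if (rng.try_increment()) break;  // one end check per step, at the top level
        }
    }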

on Wed Sep 03 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
I'm not there yet, but I know enough about the problem now that I wouldn't "buy" a partial solution.
Before I start playing around with the problem, is my throwing increment_and_dereference optimization fair game? I believe I can't do without.
Fair game, sure, but AFAICT it gains us nothing. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

As with these end checks, what if dereference threw exceptions? End checks gone, all the way up the stack... Arno -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229

AMDG Arno Schödl wrote:
As with these end checks, what if dereference threw exceptions? End checks gone, all the way up the stack...
Unfortunately not. This simply moves the check into the dereference operation. Also, remember that a filter_iterator dereferences the iterator in its operator++, so the number of comparisons is unchanged. In Christ, Steven Watanabe

As with these end checks, what if dereference threw exceptions? End checks gone, all the way up the stack...
Unfortunately not.
This simply moves the check into the dereference operation. Also, remember that a filter_iterator dereferences the iterator in its operator++, so the number of comparisons is unchanged.
class filter_range {
    reference increment_and_dereference() throw( at_end ) {
        // If reference is value_type&, this method has
        // no local variables with dtor, so it needs no exception frame.
        // Building one would negate any performance gain.
        for(;;) {
            reference t=base.increment_and_dereference();
            if( pred(t) ) return t;
        }
    }
    void increment() {
        try { increment_and_dereference(); } catch( at_end& ) {}
    }
    reference dereference() const { return base.dereference(); }
};

class base_range {
    ...
    reference increment_and_dereference() throw( at_end ) {
        increment();                  // precondition: !empty()
        if( empty() ) throw at_end(); // we just stepped past the last element
        return dereference();
    }
};

I have not thought about whether other adaptor_ranges (difference, union, ...) could also use the increment_and_dereference idiom, so that mixed-type stacks (which are the really interesting ones) could be built. Arno -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229
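For comparison, a consumer loop under this throwing scheme has no per-step end test at all; the at_end exception is the sole exit. A minimal sketch, assuming the range is not empty and starts on a valid element:

    struct at_end {}; // the exception type from the sketch above

    template<class Rng, class Fn>
    void visit_all_throwing(Rng& rng, Fn fn) {
        try {
            fn(rng.dereference());  // first element; precondition: range not empty
            for (;;)
                fn(rng.increment_and_dereference()); // no end check in the loop body
        } catch (at_end&) {
            // sequence exhausted; fall through
        }
    }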

AMDG Arno Schödl wrote:
reference t=base.increment_and_dereference();
I have not thought about whether other adaptor_ranges (difference, union, ...) could also use the increment_and_dereference idiom, so that mixed-type stacks (which are the really interesting ones) could be built.
Oh. The important optimization is that the increment is combined with the test; the dereference is not as important. Also, the performance of throwing an exception is probably unacceptable for small ranges. In Christ, Steven Watanabe

on Wed Sep 03 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
As with these end checks, what if dereference threw exceptions? End checks gone, all the way up the stack...
The problem happens way before dereference: it's the undefined behavior that results from even moving the underlying iterator past the end of the underlying sequence. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

AMDG David Abrahams wrote:
on Wed Sep 03 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
As with these end checks, what if dereference threw exceptions? End checks gone, all the way up the stack...
The problem happens way before dereference: it's the undefined behavior that results from even moving the underlying iterator past the end of the underlying sequence.
Using Arno's suggestion, dereferencing the end iterator ought to throw, which happens before we increment off the end, right? It sounds like he's proposing making iterators work the way they do in Python. In Christ, Steven Watanabe

on Wed Sep 03 2008, Steven Watanabe <watanabesj-AT-gmail.com> wrote:
AMDG
David Abrahams wrote:
on Wed Sep 03 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
As with these end checks, what if dereference threw exceptions? End checks gone, all the way up the stack...
The problem happens way before dereference: it's the undefined behavior that results from even moving the underlying iterator past the end of the underlying sequence.
Using Arno's suggestion, dereferencing the end iterator ought to throw, which happens before we increment off the end, right?
The *underlying* iterator? How are you going to get that to throw if it's a pointer?
It sounds like he's proposing making iterators work the way they do in Python.
Sorta, yeah. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

AMDG David Abrahams wrote:
Using Arno's suggestion, dereferencing the end iterator ought to throw, which happens before we increment off the end, right?
The *underlying* iterator? How are you going to get that to throw if it's a pointer?
You would have to wrap it. Any solution using iterator adapters is going to have to rely on detecting whether an iterator knows its own end. In Christ, Steven Watanabe

Using Arno's suggestion, dereferencing the end iterator ought to throw, which happens before we increment off the end, right?
The *underlying* iterator? How are you going to get that to throw if it's a pointer?
You don't need to. The underlying iterator can check explicitly. Only the adaptors must pass this information up with zero overhead, e.g., by doing nothing. Arno -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229

on Wed Sep 03 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
Using Arno's suggestion, dereferencing the end iterator ought to throw, which happens before we increment off the end, right?
The *underlying* iterator? How are you going to get that to throw if it's a pointer?
You don't need to. The underlying iterator can check explicitly.
A pointer can check explicitly? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

You don't need to. The underlying iterator can check explicitly.
A pointer can check explicitly?
C'mon, a wrapper can; Steven said it already. Arno -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229

AMDG David Abrahams wrote:
on Mon Sep 01 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
Need to think about it. Could you give a practical example of an iterator adapter needing three iterators?
E.g., a bidirectional filtered_iterator needs to be prevented from stepping off both ends of the underlying data to avoid undefined behavior.
Do we really have to worry about stepping off the beginning? I mean, you aren't allowed to decrement the iterator at the beginning of a range, anyway, and if you decrement any other iterator, the predicate is guaranteed to stop it before it goes off the beginning.
On the other hand, you shouldn't have to pay for those endpoint comparisons all the time, which tells me that something may be wrong with the design.
Hmm. Iterating over a range using a filtered_iterator on an underlying range of size n for which the predicate is true on m elements results in m + n + 2 comparisons of the underlying iterator and n calls to the predicate. A hand-coded loop

    for(; begin != end; ++begin) {
        if(f(*begin)) {
            // do stuff
        }
    }

uses only n + 1 iterator comparisons. I'm not sure how often this is significant, but the only way I can see to avoid it is to combine the iterator increment and comparison into a single step. In Christ, Steven Watanabe
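One shape such a combined step could take, sketched with hypothetical names: next() advances and reports in a single call whether a matching element remains, so the caller performs no separate end comparison of its own:

    template<class It, class Pred>
    struct filtered_cursor {
        It pos, end;
        Pred pred;
        // Advance to the next element satisfying pred.
        // Returns false exactly when the underlying range is exhausted,
        // so increment and end test are one operation.
        bool next() {
            while (++pos != end)
                if (pred(*pos)) return true;
            return false;
        }
    };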

On Mon, Sep 1, 2008 at 8:37 PM, Steven Watanabe <watanabesj@gmail.com> wrote:
AMDG
David Abrahams wrote:
on Mon Sep 01 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
Need to think about it. Could you give a practical example of an iterator adapter needing three iterators?
E.g., a bidirectional filtered_iterator needs to be prevented from stepping off both ends of the underlying data to avoid undefined behavior.
Do we really have to worry about stepping off the beginning? I mean, you aren't allowed to decrement the iterator at the beginning of a range, anyway, and if you decrement any other iterator, the predicate is guaranteed to stop it before it goes off the beginning.
That's right! It can only be one of three cases: either the range is empty and you can't move at all; or you are at the beginning and you can't go back anyway; or there exists at least one element before the current one for which the predicate is true. It's so obvious that I never thought of it. <ashamed/> ... and in fact boost::filter_iterator only holds the current and end iterators.
On the other hand, you shouldn't have to pay for those endpoint comparisons all the time, which tells me that something may be wrong with the design.
Hmm. Iterating over a range using a filtered_iterator on an underlying range of size n for which the predicate is true on m elements results in m + n + 2 comparisons of the underlying iterator and n calls to the predicate.
A hand-coded loop
for(; begin != end; ++begin) {
    if(f(*begin)) {
        // do stuff
    }
}
uses only n + 1 iterator comparisons.
I'm not sure how often this is significant, but the only way I can see to avoid it is to combine the iterator increment and comparison into a single step.
for filter_iterator at least, I guess that the badly predictable 'if' will dominate over everything else. -- gpd
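Giovanni's three-case argument is what licenses the two-iterator layout: decrement needs no stored begin. A sketch of the usual decrement body under that precondition:

    // Precondition: a valid filtered predecessor exists (guaranteed by the
    // three-case argument above), so the walk cannot underrun the sequence.
    template<class It, class Pred>
    void filtered_decrement(It& pos, Pred pred) {
        do {
            --pos;
        } while (!pred(*pos));
    }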

on Mon Sep 01 2008, "Giovanni Piero Deretta" <gpderetta-AT-gmail.com> wrote:
You could provide an adapted_iterator and also an adapted_range.
My point is that the adapted_range would then have semantically equivalent iterators with a different representation (and an extra indirection), which seems like a waste.
hum, which indirection?
The iterators have to refer to the common data somehow.
<...>
Do you then still need a factored iterator?
You need to be able to take two adapted iterators and turn them into a range. Do you want that range to contain redundant data? I don't.
Hey, maybe this is all that we need! Let's have a metafunction that, given an iterator type, returns its preferred range type (the associated range). The default might be boost.range or even std::pair. The developer of an iterator adaptor can specialize the metafunction so that the associated range is the optimum representation for an iterator pair.
Do you think this would be enough? It seems too simple to me, so probably I'm missing something...
It is quite attractive, I must say. I'll give it some thought. -- Dave Abrahams BoostPro Computing http://www.boostpro.com
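A minimal sketch of the metafunction Giovanni proposes; the default shown and all names here are assumptions, not an agreed interface:

    #include <utility>

    // Maps an iterator type to its preferred ("associated") range type.
    template<class It>
    struct associated_range {
        typedef std::pair<It, It> type; // default: just store both iterators
    };

    // An adaptor author can specialize it to a compressed representation,
    // e.g. (hypothetically) for an iterator whose end is implicit:
    //
    // template<class It>
    // struct associated_range< my_counted_iterator<It> > {
    //     typedef my_counted_range<It> type; // one iterator plus a count
    // };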

Giovanni Piero Deretta skrev:
Stacking them naively would have 2^N storage growth, rather than 4^N, and we still don't need to worry about range lifetime. I've been trying to say that generic algorithms *can't* worry about the range lifetime; they have to assume that the iterators they're passed will not be invalidated before they're destroyed.
I think that the OP is talking about a different problem: let say you want to name the result of stacked applications of range adaptors:
template<typename Range>
void my_algo(Range& r) {
    auto stacked_range = r | filtered(filter) | mapped(mapper);
    // use stacked_range
}
This may lead to iterator lifetime problems, not because r's iterators may be invalid, but because the iterators of the range returned by filtered(...) may become invalid once that temporary has been destructed.
I don't see how this happens. The iterators are stored by value in each new range that is generated. -Thorsten

This may lead to iterator lifetime problems, not because r's iterators may be invalid, but because the iterators of the range returned by filtered(...) may become invalid once that temporary has been destructed.
I don't see how this happens. The iterators are stored by value in each new range that is generated.
Correct, but in the case of difference_range, storing iterators by value leads to 2^N storage bloat when stacking N difference_iterators. You can avoid this with factorable iterators (Dave) or by storing ranges by value and avoiding copying containers either by explicitly wrapping containers as "by ref" (Giovanni) or by range trait (me). -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229
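The growth Arno describes can be made visible with sizeof. A toy layout for illustration only, not the real difference_range:

    #include <iostream>

    // Naive layout: each level stores both of its underlying iterators by value.
    template<class It>
    struct naive_difference_iterator {
        It first;   // position in the first sorted sequence
        It second;  // position in the second sorted sequence
    };

    int main() {
        typedef naive_difference_iterator<int*> d1; // 2 pointers
        typedef naive_difference_iterator<d1>   d2; // 4 pointers
        typedef naive_difference_iterator<d2>   d3; // 8 pointers: 2^N growth
        std::cout << sizeof(d1) << ' ' << sizeof(d2) << ' ' << sizeof(d3) << '\n';
    }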

On Mon, Sep 1, 2008 at 10:18 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Giovanni Piero Deretta skrev:
Stacking them naively would have 2^N storage growth, rather than 4^N, and we still don't need to worry about range lifetime.
I've been trying to say that generic algorithms *can't* worry about the range lifetime; they have to assume that the iterators they're passed will not be invalidated before they're destroyed.
I think that the OP is talking about a different problem: let say you want to name the result of stacked applications of range adaptors:
template<typename Range>
void my_algo(Range& r) {
    auto stacked_range = r | filtered(filter) | mapped(mapper);
    // use stacked_range
}
This may lead to iterator lifetime problems, not because r's iterators may be invalid, but because the iterators of the range returned by filtered(...) may become invalid once that temporary has been destructed.
I don't see how this happens. The iterators are stored by value in each new range that is generated.
But it doesn't need to be so. And that's exactly what all the discussion is about :) -- gpd

David says:
I've been trying to say that generic algorithms *can't* worry about the range lifetime; they have to assume that the iterators they're passed will not be invalidated before they're destroyed. ... Bleah (sorry). Iterators in general shouldn't be burdened with the requirement to know the limits of the sequence, and if you go back and ... My advice: make sure you keep in mind the whole problem domain. You can't understand the reason that C++ iterators are unlike those in other languages without looking at the algorithms.
What you are saying about iterators is all true, and they work wonderfully. The problem that you acknowledged, which is exactly the problem that I have when designing difference_range, is that a _pair_ of iterators often has redundant information, which in the case of stacking difference_ranges leads either to storage bloat or to storage lifetime issues because we need a place to store this redundant information. The universal end iterator fixes this problem for forward ranges, but not for bidirectional or random access ranges. Giovanni says:
Anyways, the solution is simple: if you want your adaptor to survive the original wrapped range, you have to store a copy of the wrapped range in the adaptor. As an added benefit, the adaptor size will be the smallest possible for deep sequences of stacked adaptors.
This is a good solution, but RangeEx is not designed that way, and it is a big departure from regular containers. I am happy with it if we decide that this is the way to push for the next C++ standard :-) Arno -- Dr. Arno Schoedl · aschoedl@think-cell.com Technical Director think-cell Software GmbH · Invalidenstr. 34 · 10115 Berlin, Germany http://www.think-cell.com · phone +49-30-666473-10 · toll-free (US) +1-800-891-8091 Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl · Amtsgericht Charlottenburg, HRB 85229
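The "universal end iterator" Arno mentions can be sketched as a stateless sentinel that any adaptor which knows its own end compares against; the names here are illustrative only:

    struct universal_end_t {};

    template<class It, class Pred>
    struct filter_it {
        It pos, end;    // the adaptor itself knows where its sequence ends
        Pred pred;
        bool at_end() const { return pos == end; }
        // increment/dereference as usual ...
    };

    // The comparison is one-sided: a forward range's end() can hand out this
    // empty sentinel instead of duplicating the adaptor's stored state.
    template<class It, class Pred>
    bool operator==(const filter_it<It, Pred>& it, universal_end_t) {
        return it.at_end();
    }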
participants (7)
-
Arno Schödl
-
David Abrahams
-
Giovanni Piero Deretta
-
Michael Marcin
-
Neil Groves
-
Steven Watanabe
-
Thorsten Ottosen