
on Wed Sep 03 2008, Arno Schödl <aschoedl-AT-think-cell.com> wrote:
rng | filtered( funcA ) | filtered( funcB ) | filtered( funcC ) | filtered( funcD ) | filtered( funcE )
1. _Do_ they write that?
Why not? They did for transform_range, and the notation is getting easier...
Yes, I never contested that they *could* write it. I'm still asking you the same question: what real life computations demand a solution to this problem?
2. If they do that, I believe they are still paying an enormous unnecessary runtime penalty even with your optimization because of the endpoint checks in each layer of the composite iterator or range. The only way IIUC to avoid such a penalty is to compress the filter adapters into one, essentially by using expression templates.
See answer to 4.
3. I am not yet convinced that you are suffering a significant performance penalty due to the wasted space in real applications. Have you measured it? Unless you're actually *storing* lots of these stacked iterators or passing them around by value in your inner loops, it seems unlikely to make much of a difference.
Giovanni has seen the quadratic stack expansions.
Sure, but that does not necessarily amount to a significant performance penalty.
If you are worried about performance of pointers, why not worry about the performance of wrappers around pointers?
I actually am a bit worried about it. However, as I am trying to tell you, I don't believe that what we're talking about here addresses the whole underlying problem. You're talking about eliminating redundancy across pairs of iterators, but AFAICT not the redundancy through the vertical "stack."
4. I can understand wanting to improve these components so that they can be efficient under more conditions, and avoiding the redundant storage would undoubtedly be a real improvement. However, I think there's a deeper inefficiency in stacking all those filter iterators together, and it's neither a range library's job, nor I think is it possible in this case, to protect users from "blissful unawareness" of performance implications.
The design we came up with addresses the general problem of stacking range_adaptors that must store their end, like difference_range/filter_range/union_range/[other algorithms that I cannot think of now], in any combination.
OK, let me try to spell this out. Let's suppose for a moment that you build a filter_iterator<strided_iterator<int*,5>, F>. A strided_iterator is one that visits every Nth element. Unless you know that there are an even multiple of N elements, it needs to store an endpoint because otherwise the iterator might stride right past the last stored location. Thus sizeof(strided_iterator<int*,5>) == 2*sizeof(int*) and its increment operator looks like: if (pos <= end - N) pos += N; else pos = end; Now as you know, a filter_iterator also needs to store an endpoint. So assuming F is empty and EBO is used to eliminate its storage, then sizeof(filter_iterator<strided_iterator<int*,5>, F>) == 2*sizeof(strided_iterator<int*,5>) == 2*2*sizeof(int*) and the filter iterator's increment looks like: do { ++pos2; } while (pos2 != end2 && !f(*pos2)); Now if we expand the incrementation of pos2, it's do { if (pos <= end - N) pos += N; else pos = end; } while (pos2 != end2 && !f(*pos2)); Whereas it should only be: do { if (pos <= end - N) pos += N; else pos = end; break; } while (!f(*pos2)); No approach that only addresses the space wasted across pairs of iterators can help with these issues, so I think a more fundamental look at the problem is called for. And again, I think the problem is very much like what happens with segmented iterators so its worth looking there for inspiration. It may be that, as with segmented iterators, a more general form for the algorithms is called for.
If you have thought up expression templates that cover all of these in a unified fashion with better performance than stacked iterators, I'd be very happy and try to implement that instead.
Again, I'm not ready for answers yet. I'm only just now discovering the questions.
I'm not yet convinced we're solving a real problem here,
I have enough of that stuff in my code that I am convinced we do.
That fact alone does not make it a real problem in my book.
and if it _is_ real, I am not quite convinced the general approach of squeezing out redundant storage would address all of the problem.
It does not, you already mentioned the superfluous end checks. As I said, if you have the whole solution, let me know.
I'm not there yet, but I know enough about the problem now that I wouldn't "buy" a partial solution. -- Dave Abrahams BoostPro Computing http://www.boostpro.com