On 29/08/2010 00:53, Dave Abrahams wrote:
I presume you're familiar with http://lafstern.org/matt/segmented.pdf?
I had heard of segmented iterators but hadn't looked at them in detail, so thanks for the link. I'm not sure I want something that's just two-level
Read the paper. It works for as many levels as you want. The localiterator can itself be segmented.
I see some potential for some interesting discussion here, sorry in advance for hijacking this thread. I think the whole segmented iterators/ranges is a bad idea, as it has quite a few disadvantages and there are alternative, simpler solutions that provide the same benefits. segmented ranges have the following disadvantages: - while segmented iterators are easier to write than flat iterators in certain cases, they're still not as easy to write as they could be - any code or algorithm that deals with regular ranges has to be rewritten, not just in order to gain the performance gains, but to get basic functionality working as well. That's about as bad as it gets from an integration point of view. - only trivial operations that manipulate element per element (for_each, copy, transform, accumulate, erase_if) can really exploit the performance gains of segmented ranges - making non-trivial operations (any operation that needs to manipulate several adjacent elements [1]) deal with segmented ranges can be so complicated that they can realistically only be written with a flattened non-segmented view of the range. [1] examples include character encoding conversion, base64 encoding/decoding, gzip encoding/decoding, etc.
and still based on iterators, though; one case I was thinking of is compressed data that truly has a linear sequence but it might be faster to have the decompressor control the program's control flow rather than having the algorithm do it (of course, that isn't as general as iterators, but works for some algorithms).
Ah, push or pull—the eternal struggle.
Push and pull are the opposite ends of the same thing. When you want to generate data, you'd rather push rather than be pulled, when you want to read it, you'd rather pull than be pushed. So what's the solution? Generators. An by that, I mean a function that writes data to an output iterator. A generator is a pure-push primitive, and is trivial to write efficiently. Trivial operations can be implemented with no cost on top of a generator, by using something such as boost::function_output_iterator. There are generic techniques to build a (normal, non-segmented) range that you can then easily pull from on top of a generator, but those are more complicated and introduce some overhead; overhead you would need anyway to do any non-trivial operation. There are two techniques to achieve this. The first one, the magical silver bullet, involves using a coroutine. Every time you generate a value, yield. This gives control back to the other end, that can do whatever it wants with the element, and when it pulls again it goes back in the generator exactly where it was, and can keep on pushing. The code keeps swapping between a pushing and a pulling context and achieves exactly the push->pull integration we wanted. It however raises a few concerns: context size, allocation and management, and potentially excessive context swapping. The other technique is to generate data step by step, each step having a fixed maximum size. Then you can run a step on a temporary buffer, read off that temporary buffer, then run a new step once everything has been read. An iterator that does this transparently on-top of a generator can be written. Albeit it does introduce some overhead, it could be less than that of the coroutine solution. The biggest concern is that it makes the iterator fat, since the temporary buffer is part of the iterator; and iterators are typically passed by value. The latter approach is the one I used to implement my generic Converter system as part of my Unicode library. A converter is a step-generator, except it takes input to convert as well. I've got an iterator adaptor that can then run a converter as the data is being iterated, and thus lazily perform any arbitrary conversion. Note that the end result, in the case of Unicode, in not very different from the Unicode iterator adaptors hand-written by John Maddock, except the implementation is much simpler, and can be easily used for efficient pushing as well. So, at the end of this long message, what do I propose to solve the original problem? Have your container provide a step-generator. You can then implement your iterator on top of it, or not, your choice (but I'd recommend using it to avoid unnecessary code duplication and maintenance). A type trait could then be introduced to tell whether a sequence provides a generator or not. Boost.Range could make use of that type trait to specialize trivial operations to use the generator instead of the iterators. We end up with: - an easy to write solution - compatibility will all code that takes ranges as input - only the operations that want to can be specialized to exploit the performance gains - no restriction imposed on operations that cannot be implemented in terms of the generator directly