Coroutines and Output Iterators

When one wants to generate a range of data, the usual approach is to write the data to an output iterator passed as an argument. That approach has a few drawbacks, however: a range (often of unknown size) must be provided to write the data into, and the data can only be considered once all elements have been copied to the output.

An alternative approach is to return a smart range which generates its elements as it is iterated; this is roughly what the Boost.Range update in the review queue does. This approach has many advantages, since storage may not be needed and transformations can be combined on top of it, avoiding unnecessary iterations. Writing smart iterators/ranges that generate data, though, is much more difficult than writing generators using the previous solution, especially in recursive scenarios where the iterators need to maintain a stack themselves.

Another solution is coroutines, which allow returning data, then re-entering the same context later and continuing. The results of a coroutine can then be iterated as they come, giving the advantages of smart iterators/ranges while keeping the syntax of output iterators. This comes, however, at the cost of a stack allocation at the beginning (which can be mitigated using pools) and a context switch at each iteration.

What I propose, then, is an output iterator that yields the data within a coroutine context. With that solution, one writes generators as functions writing to output iterators, but those iterators might just as well be writing to containers as actually yielding the results, which means the person designing the generator does not have the overhead of coroutines forced upon him. The syntax would be

    generate(boost::bind(&boost::transform, range, _1, f))

which would return a range similar to that of boost::make_transform_range(range, f) (boost::transform being just std::transform for ranges).

The coroutine GSoC library from 2006 does offer that kind of thing, but not as non-intrusive output iterators. (I don't even get why it needs to call self.__self_exit__() at the end.) I thus propose that a simple output iterator that yields on iteration be made.

By the way, what is the status of that library? I haven't heard of it for quite a while.

On Fri, Oct 10, 2008 at 2:50 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote: ...<snip>...

Hi Mathias,

This is an interesting proposal, but I'm not quite sure I understand it correctly. I'm sure this is bread-and-butter stuff to hardcore computer scientists, but I'm not one! Does this proposal serve essentially the same need as a lazy input iterator, and if so, is it significant that your proposal seems to be driven from the generation side rather than from the consumption side?

Cheers,
- Rob.

On Fri, Oct 10, 2008 at 3:50 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
> Another solution is coroutines, which allow returning data, then re-entering the same context later and continuing. The results of a coroutine can then be iterated as they come, giving the advantages of smart iterators/ranges while keeping the syntax of output iterators. This comes, however, at the cost of a stack allocation at the beginning (which can be mitigated using pools) and a context switch at each iteration.
Stack switching can be amortized by buffering more than one datum per context switch. The consumer can no longer control the exact amount of data produced by the producer, but in many cases that doesn't matter.
> What I propose, then, is an output iterator that yields the data within a coroutine context. With that solution, one writes generators as functions writing to output iterators, but those iterators might just as well be writing to containers as actually yielding the results, which means the person designing the generator does not have the overhead of coroutines forced upon him.
<SNIP> In fact I have been planning for a long time to make the 'self' type a functor. The interface to yield a value would change from the current:

    template<class Self> my_result my_gen(Self& self) {
        for(....) self.yield(val);
        self.exit();
    }

to:

    template<class F> void my_gen(F f) {
        for(...) f(val);
    }

(note the switch from pass by reference to pass by value). Making an output iterator from 'f' is trivial via boost::function_output_iterator.
> The coroutine GSoC library from 2006 does offer that kind of thing, but not as non-intrusive output iterators. (I don't even get why it needs to call self.__self_exit__() at the end.)
For no reason: at the time I thought it would be a good idea to require the generating function to have a result type equal to the coroutine yield type, so that I could check it statically. Unfortunately this means you have to use 'return' instead of 'yield' to yield the last value of the generation, or exit via an exception (generated by self.exit()). It was just stupid and I intend to remove it. Btw, the same thing can be done with the current interface via a trivial wrapper around the generating function.
> By the way, what is the status of that library? I haven't heard of it for quite a while.
Real life (and a job) got in the way. Progress has since been slow.

-- gpd

>> What I propose, then, is an output iterator that yields the data within a coroutine context. With that solution, one writes generators as functions writing to output iterators, but those iterators might just as well be writing to containers as actually yielding the results, which means the person designing the generator does not have the overhead of coroutines forced upon him.
> <SNIP>
> In fact I have been planning for a long time to make the 'self' type a functor. The interface to yield a value would change from the current:
>
>     template<class Self> my_result my_gen(Self& self) {
>         for(....) self.yield(val);
>         self.exit();
>     }
>
> to:
>
>     template<class F> void my_gen(F f) {
>         for(...) f(val);
>     }
>
> (note the switch from pass by reference to pass by value). Making an output iterator from 'f' is trivial via boost::function_output_iterator.
This isn't such a good idea, IMHO. I've been using the GSoC version for quite some time now, and this Self instance is a good place to store/access fiber-specific data (similar to thread-specific storage, but with no thread affinity, i.e. it is available regardless of which thread actually executes the fiber/coroutine). Just my 2€c.

Regards
Hartmut

On Sat, Oct 11, 2008 at 12:35 AM, Hartmut Kaiser <hartmut.kaiser@gmail.com> wrote:
>>> What I propose, then, is an output iterator that yields the data within a coroutine context. With that solution, one writes generators as functions writing to output iterators, but those iterators might just as well be writing to containers as actually yielding the results, which means the person designing the generator does not have the overhead of coroutines forced upon him.
>> <SNIP>
>> In fact I have been planning for a long time to make the 'self' type a functor. The interface to yield a value would change from the current:
>>
>>     template<class Self> my_result my_gen(Self& self) {
>>         for(....) self.yield(val);
>>         self.exit();
>>     }
>>
>> to:
>>
>>     template<class F> void my_gen(F f) {
>>         for(...) f(val);
>>     }
>>
>> (note the switch from pass by reference to pass by value). Making an output iterator from 'f' is trivial via boost::function_output_iterator.
> This isn't such a good idea, IMHO. I've been using the GSoC version for quite some time now, and this Self instance is a good place to store/access fiber-specific data (similar to thread-specific storage, but with no thread affinity, i.e. it is available regardless of which thread actually executes the fiber/coroutine).
Well, treating the self instance as a function object doesn't prevent other, more ad hoc uses. Generic code that doesn't care just treats it as a function object; more specialized, coroutine-aware code can have access to the other functionality. The change I have in mind is basically just renaming self_type::yield to self_type::operator() (and supporting copy semantics).

-- gpd

>>> In fact I have been planning for a long time to make the 'self' type a functor. The interface to yield a value would change from the current:
>>>
>>>     template<class Self> my_result my_gen(Self& self) {
>>>         for(....) self.yield(val);
>>>         self.exit();
>>>     }
>>>
>>> to:
>>>
>>>     template<class F> void my_gen(F f) {
>>>         for(...) f(val);
>>>     }
>>>
>>> (note the switch from pass by reference to pass by value). Making an output iterator from 'f' is trivial via boost::function_output_iterator.
>> This isn't such a good idea, IMHO. I've been using the GSoC version for quite some time now, and this Self instance is a good place to store/access fiber-specific data (similar to thread-specific storage, but with no thread affinity, i.e. it is available regardless of which thread actually executes the fiber/coroutine).
> Well, treating the self instance as a function object doesn't prevent other, more ad hoc uses. Generic code that doesn't care just treats it as a function object; more specialized, coroutine-aware code can have access to the other functionality. The change I have in mind is basically just renaming self_type::yield to self_type::operator() (and supporting copy semantics).
Ohh, ok. That's fine.

Regards
Hartmut
participants (4)
- Giovanni Piero Deretta
- Hartmut Kaiser
- Mathias Gaunard
- Robert Jones