[SoC][Coroutine] Coroutine design

Hello boosters, As some of you know, I've been working on a coroutine library. As some have expressed interest in it, I've decided to participate to the google SoC, as a student, with this project (and boost as a mentor of course :)) I've not yet formally applied because I wanted first to write a detailed description of the project. After four sleepless nights I'm finally (mostly) done and ready to accept critics (let's hope no flames!). You can find the design at http://libstream.sf.net/index_c.html . Under the boost vault, in the concurrency folder there is already my first attempt at this problem (the package is continuation.tar.gz). It could be useful to evaluate the earlier attempt at the solution, that has driven me to write a more consistent design. Be warned that the code in that package does not compile (and is unix only). I will submit the project to Google later this week.. So, anyone interested? -- Giovanni P. Deretta

Giovanni P. Deretta wrote:
Hello boosters,
As some of you know, I've been working on a coroutine library. As some have expressed interest in it, I've decided to participate to the google SoC, as a student, with this project (and boost as a mentor of course :))
I've not yet formally applied because I wanted first to write a detailed description of the project. After four sleepless nights I'm finally (mostly) done and ready to accept critics (let's hope no flames!). You can find the design at http://libstream.sf.net/index_c.html .
Under the boost vault, in the concurrency folder there is already my first attempt at this problem (the package is continuation.tar.gz). It could be useful to evaluate the earlier attempt at the solution, that has driven me to write a more consistent design. Be warned that the code in that package does not compile (and is unix only).
I will submit the project to Google later this week..
So, anyone interested?
It does look interesting, I would encourage you not to wait too long before submitting your application. John.

It looks great. I still don't catch why there must be a "self" as the first parameter in a coroutine definition. Could the "current" function help here? If removed, the "yield" function would be much easier to use. Another question is, does a "coroutine" object act as both a coroutine instance and a coroutine factory? According to the examples it seems that sometimes a coroutine binds data when created, and sometimes a coroutine uses operator () to create a new instance, right? Xi On 5/6/06, Giovanni P. Deretta <gpderetta@gmail.com> wrote:
Hello boosters,
As some of you know, I've been working on a coroutine library. As some have expressed interest in it, I've decided to participate to the google SoC, as a student, with this project (and boost as a mentor of course :))
I've not yet formally applied because I wanted first to write a detailed description of the project. After four sleepless nights I'm finally (mostly) done and ready to accept critics (let's hope no flames!). You can find the design at http://libstream.sf.net/index_c.html .
Under the boost vault, in the concurrency folder there is already my first attempt at this problem (the package is continuation.tar.gz). It could be useful to evaluate the earlier attempt at the solution, that has driven me to write a more consistent design. Be warned that the code in that package does not compile (and is unix only).
I will submit the project to Google later this week..
So, anyone interested? -- Giovanni P. Deretta _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Xi Wang wrote:
It looks great. I still don't catch why there must be a "self" as the first parameter in a coroutine definition. Could the "current" function help here? If removed, the "yield" function would be much easier to use.
The coroutine return type is statically defined. Thus yield (that is both a form of return and call) must know the return type and the argument types. The only way to statically type check these types is to make yield a member of the coroutine or take the coroutine as a parameter. In both cases the coroutine needs a pointer to itself. A current coroutine global pointer would necessarily erase all type informations, and a global yield function that uses this pointer would need to delay all checking until runtime. This would both slow down yielding and prevent the compiler from checking error a compile time. Note that in the "Other Issues" section there is a mention of a free yield function that uses the current_coroutine pointer. But this wouldn't be the prefered interface, if it is implemented at all.
Another question is, does a "coroutine" object act as both a coroutine instance and a coroutine factory? According to the examples it seems that sometimes a coroutine binds data when created, and sometimes a coroutine uses operator () to create a new instance, right?
Probably I should mark more clearly the pseudocode examples where I make liberal use of the coroutine keyword both to create new instances and to mark a coroutine body) and actual C++ code. In C++, a coroutine object *always* represent a single instance. The constructor binds the coroutine with the function object that implement the body. coroutine::operator(...) resumes (or start, if not already started) the specific coroutine instance to which is applied. Any way, I will modify the pseudocode examples to conform to a single interface. Thanks for the feedback. -- Giovanni P. Deretta

On 5/6/06, Giovanni P. Deretta <gpderetta@gmail.com> wrote:
Any way, I will modify the pseudocode examples to conform to a single interface.
Done. I've slightly modified a couple of pseudocode examples to make a more consistent use of the coroutine keyword and visually diferentiated actual C++ code from pseudocode. -- Giovanni P. Deretta

On 5/6/06, Giovanni P. Deretta <gpderetta@gmail.com > wrote:
Xi Wang wrote:
It looks great. I still don't catch why there must be a "self" as the first parameter in a coroutine definition. Could the "current" function help here? If removed, the "yield" function would be much easier to use.
The coroutine return type is statically defined. Thus yield (that is both a form of return and call) must know the return type and the argument types. The only way to statically type check these types is to make yield a member of the coroutine or take the coroutine as a parameter. In both cases the coroutine needs a pointer to itself. A current coroutine global pointer would necessarily erase all type informations, and a global yield function that uses this pointer would need to delay all checking until runtime. This would both slow down yielding and prevent the compiler from checking error a compile time. Note that in the "Other Issues" section there is a mention of a free yield function that uses the current_coroutine pointer. But this wouldn't be the prefered interface, if it is implemented at all.
I see. A free yield may be not so bad, for most operating systems support retrieving current coroutine:-) Besides, what is the return type of "yield"? In the document I see tie(parm_1, parm_2,... parm_n) = self.*yield*(result_1, result_2,... result_n); I guess it should just return an integer or something else, not a real tuple, right? If the point is to check the return type, maybe we can trick the compiler like this: int __yield(T); #define yield(val) if (__yield(val) < 0) return val; If the return value of __yield should not be less than 0, the compiler would check the type of "val" automatically.
Another question is, does a "coroutine" object act as both a coroutine instance and a coroutine factory? According to the examples it seems that sometimes a coroutine binds data when created, and sometimes a coroutine uses operator () to create a new instance, right?
Probably I should mark more clearly the pseudocode examples where I make liberal use of the coroutine keyword both to create new instances and to mark a coroutine body) and actual C++ code.
In C++, a coroutine object *always* represent a single instance. The constructor binds the coroutine with the function object that implement the body. coroutine::operator(...) resumes (or start, if not already started) the specific coroutine instance to which is applied.
What confused me is that coroutine::operator(...) takes parameters. Does this mean the parameters of a coroutine can be bound at either creation time or invocation time or both? It seems that coroutine::operator () mixes the responsibilities of both binding and resuming, which should be separated, in my opinion. Any way, I will modify the pseudocode examples to conform to a single
interface.
Thanks for the feedback.
-- Giovanni P. Deretta _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Xi Wang wrote:
On 5/6/06, Giovanni P. Deretta <gpderetta@gmail.com > wrote:
Xi Wang wrote:
It looks great. I still don't catch why there must be a "self" as the first parameter in a coroutine definition. Could the "current" function help here? If removed, the "yield" function would be much easier to use.
The coroutine return type is statically defined. Thus yield (that is both a form of return and call) must know the return type and the argument types. The only way to statically type check these types is to make yield a member of the coroutine or take the coroutine as a parameter. In both cases the coroutine needs a pointer to itself. A current coroutine global pointer would necessarily erase all type informations, and a global yield function that uses this pointer would need to delay all checking until runtime. This would both slow down yielding and prevent the compiler from checking error a compile time. Note that in the "Other Issues" section there is a mention of a free yield function that uses the current_coroutine pointer. But this wouldn't be the prefered interface, if it is implemented at all.
I see. A free yield may be not so bad, for most operating systems support retrieving current coroutine:-) Besides, what is the return type of "yield"? In the document I see tie(parm_1, parm_2,... parm_n) = self.*yield*(result_1, result_2,... result_n); I guess it should just return an integer or something else, not a real tuple, right?
No, yield returns an actual tuple. The element types of the tuple are the same types of the function arguments. For example: tuple<a_t, b_t, c_t> some_function (corutine<tuple<a_t, b_t, c_t>(d_t, e_t, f_t)>& self, d_t d, e_t e, f_t f) { // 1 a_t a; b_t b; c_t c; while(true) { do_something_with(d, e, f); a = some_value; b = some_other_value; d = yet_some_other_value; tie(d, e, f) = self.yield (a, b, d); // 2 } } int main() { coroutine<corutine<tuple<a_t, b_t, c_t>(d_t, e_t, f_t)> my_coroutine(some_function); a_t a; b_t b; c_t c; d_t d = ...; e_t e = ...; f_t f = ...; tie(a, b, c) = my_coroutine(d, e, f); // 3 do_something with a, b, c, d, e, f; tie(a, b, c) = my_coroutine(d, e, f); // 4 } The first time a coroutine is entered (from 3), execution start at the main entry point (1). Each tuple can have an arbitrary number of arguments and results (using tuples), but their type and number is fixed at compile time. At the yield point control is relinquished to the caller. The argument values of yield will be returned to the caller as if the coroutine function had returned. (Note that yield is special cased for the tuple case and one is not required to write self.yield(make_tuple(...))). Yield is *not* the same thing as return because the current scope is not exited, stack is not unwound and thus scoped objects are not destroyed. This allows us to reenter the scope. Next time the coroutine is invoked (note that for the caller there is *no* syntax distinction from first time invocation and successive ones), control is resumed at the yield point. Yield gives to the coroutine the values passed as parameter by the caller. As the parameter types and numbers are the same of the main entry point (i.e. the function signature), yield must know it (or be checked at run time). I'm a strong believer of static type checking and I think that coroutines should be statically checked. Of course I see the usefulness of postponing checks until runtime. a coroutine<boost::any(boost::any)> would do it, and the library could have special support for it. But the preferred interface would be statically checked.
If the point is to check the return type, maybe we can trick the compiler like this: int __yield(T); #define yield(val) if (__yield(val) < 0) return val; If the return value of __yield should not be less than 0, the compiler would check the type of "val" automatically.
Apart for my general dislike for macros (except when you can't really do without :) ), this wouldn't work because return would exit the scope and cause all locals to be destroyed. It is possible to implement a kind of pseudo-coroutines using a variation of the duff device (see http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html), where the yield is actually substituted by return, but they are severely limited. All your locals become static, thus are thread unsafe; it uses unsafe macros; the syntax is not natural as it requires special macros; and most importantly you can't have multiple instances of the same coroutine.
Another question is, does a "coroutine" object act as both a coroutine instance and a coroutine factory? According to the examples it seems that sometimes a coroutine binds data when created, and sometimes a coroutine uses operator () to create a new instance, right?
Probably I should mark more clearly the pseudocode examples where I make liberal use of the coroutine keyword both to create new instances and to mark a coroutine body) and actual C++ code.
In C++, a coroutine object *always* represent a single instance. The constructor binds the coroutine with the function object that implement the body. coroutine::operator(...) resumes (or start, if not already started) the specific coroutine instance to which is applied.
What confused me is that coroutine::operator(...) takes parameters. Does this mean the parameters of a coroutine can be bound at either creation time or invocation time or both? It seems that coroutine::operator () mixes the responsibilities of both binding and resuming, which should be separated, in my opinion.
coroutine::operator() always invokes a coroutine. Binding is done only at creation time (in the constructor). I thought the article was clear. If you could point out where the confusion is, I will try to make it more clear. Also consider that in pseudocode 'coroutine' is not an object but a keyword that introduces (possibly lambda) a coroutine body. In the pseudocode every 'coroutine' body is a diferent instance. Should I make this more clear?. -- Giovanni P. Deretta

On 5/6/06, Giovanni P. Deretta <gpderetta@gmail.com> wrote:
No, yield returns an actual tuple. The element types of the tuple are the same types of the function arguments. For example:
tuple<a_t, b_t, c_t> some_function (corutine<tuple<a_t, b_t, c_t>(d_t, e_t, f_t)>& self, d_t d, e_t e, f_t f) { // 1 a_t a; b_t b; c_t c; while(true) { do_something_with(d, e, f); a = some_value; b = some_other_value; d = yet_some_other_value; tie(d, e, f) = self.yield (a, b, d); // 2 } }
int main() {
coroutine<corutine<tuple<a_t, b_t, c_t>(d_t, e_t, f_t)> my_coroutine(some_function);
a_t a; b_t b; c_t c; d_t d = ...; e_t e = ...; f_t f = ...;
tie(a, b, c) = my_coroutine(d, e, f); // 3 do_something with a, b, c, d, e, f; tie(a, b, c) = my_coroutine(d, e, f); // 4
}
The first time a coroutine is entered (from 3), execution start at the main entry point (1). Each tuple can have an arbitrary number of arguments and results (using tuples), but their type and number is fixed at compile time. At the yield point control is relinquished to the caller. The argument values of yield will be returned to the caller as if the coroutine function had returned. (Note that yield is special cased for the tuple case and one is not required to write self.yield(make_tuple(...))). Yield is *not* the same thing as return because the current scope is not exited, stack is not unwound and thus scoped objects are not destroyed. This allows us to reenter the scope.
Next time the coroutine is invoked (note that for the caller there is *no* syntax distinction from first time invocation and successive ones), control is resumed at the yield point. Yield gives to the coroutine the values passed as parameter by the caller. As the parameter types and numbers are the same of the main entry point (i.e. the function signature), yield must know it (or be checked at run time).
Got it. Thanks a lot for the detailed explanation :-) So the return value (tuple) of a call to a yield in the coroutine body is the parameters of the subsequent invocation to the coroutine object, right? I was thinking it would return nothing. I'm a strong believer of static type checking and I think that
coroutines should be statically checked. Of course I see the usefulness of postponing checks until runtime. a coroutine<boost::any(boost::any)> would do it, and the library could have special support for it. But the preferred interface would be statically checked.
If the point is to check the return type, maybe we can trick the compiler like this: int __yield(T); #define yield(val) if (__yield(val) < 0) return val; If the return value of __yield should not be less than 0, the compiler would check the type of "val" automatically.
Apart for my general dislike for macros (except when you can't really do without :) ), this wouldn't work because return would exit the scope and cause all locals to be destroyed.
It is possible to implement a kind of pseudo-coroutines using a variation of the duff device (see http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html), where the yield is actually substituted by return, but they are severely limited. All your locals become static, thus are thread unsafe; it uses unsafe macros; the syntax is not natural as it requires special macros; and most importantly you can't have multiple instances of the same coroutine.
I dislike macros, too:-) The "return" I mentioned was just to guide the compiler to check the return type rather than to exit the function (of course it should not). More clearly, suppose real_yield is a global yield function, not type-safe. #define yield(x) \ if (FALSE) return x; \ real_yield(x); This macro would help to check the return type, and "return x" is never called. However, in this case, the return value (tuple) of real_yield could not be retrieved. So just forget it:-)
coroutine::operator() always invokes a coroutine. Binding is done only at creation time (in the constructor). I thought the article was clear. If you could point out where the confusion is, I will try to make it more clear.
Also consider that in pseudocode 'coroutine' is not an object but a keyword that introduces (possibly lambda) a coroutine body. In the pseudocode every 'coroutine' body is a diferent instance. Should I make this more clear?.
I got it. Thanks.

Xi Wang wrote:
[ some explaination here ] Got it. Thanks a lot for the detailed explanation :-) So the return value (tuple) of a call to a yield in the coroutine body is
On 5/6/06, Giovanni P. Deretta <gpderetta@gmail.com> wrote: the parameters of the subsequent invocation to the coroutine object, right?
Exactly.
[...] The "return" I mentioned was just to guide the compiler to check the return type rather than to exit the function (of course it should not). More clearly, suppose real_yield is a global yield function, not type-safe. #define yield(x) \ if (FALSE) return x; \ real_yield(x); This macro would help to check the return type, and "return x" is never called. However, in this case, the return value (tuple) of real_yield could not be retrieved. So just forget it:-)
Also this wouldn't work if yield is invoked from a subroutine called from a coroutine. Still the trick is interesting. I didn't understand it the first time you proposed it. Btw, IF the coroutine body is actually a function object AND there is only one operator() for that object AND yield is only called from operator() (and not from any other subroutine), may be something like this could work (i did not try it): #define yield(x) \ real_yield(&this->operator(), x) (real_yeld is a global template function that will chose its arguments and return type according to the signature of the first function parameter that is passed to it) If operator() is overloaded the compiler will complain that address_of call is ambiguous. If yield is called from a free function it will complain that 'this' cannot be found. If yield is called from a functor called from the coroutine body, things can go very very badly. This macro can break in so many ways that it is not fun... Thanks again for the feedback. -- Giovanni P. Deretta
participants (4)
-
Giovanni P. Deretta
-
Giovanni Piero Deretta
-
John Maddock
-
Xi Wang