Re: [boost] [coroutine][review] Very Late Review

18 Sep 2012

      Am 17.09.2012 18:24, schrieb Giovanni Piero Deretta:
...
I like the idea that has been proposed during the review that the result of
a coroutine should be retrivable at any time via a get method.I dislike
the proposed optional<T> result for operator(). If the previous delayed
result functionality is implemented, operator() should just return *this,
so that the subsequent get() call can be chained or the result tested for
non empty. This change has the potential of deeply affecting the interface,
so I would really like to see it subject to a mini review before acceptance.
I would not add a coroutine<>::get() method because a coroutine is a 
'special function' (in the sense of multiple entry/leave). AFAIK 
boost::function also does not have a get() method.
...
The coroutine object has both the concept of a not-a-coroutine and of a
completed coroutine. This is confusing, and should be conflated to 'empty'
state, analogous to an empty std::function. In addition to simplifying the
documentation and the interface, it should also slightly improve the
implementation. Operator bool should return !empty as per std::function.
OK
...
It has been suggested that coroutine::self be removed an have a thread
specific object. I strongly disagree with this option. First of all it will
make it too easy to break type safety:
typedef this_coroutine<int(int)> this_coro;
typedef this_coroutine<void()> that_coro;
coroutine<int(int)>([]{
        that_coro::yield(); // oops compiles fine, UB at runtime.
});
that_coro::yield() could assert if the internal 'control block' 
(coroutine_base) is not
that of the current coroutine - of course at runtime.
...
In practice, one need to pass the signature to any code that wants to yield
anyway, so might as well pass the self object around, at least one can rely
on template type deduction.
some of the community members are not happy with tracking the self 
object trough the code
...
Also, one does not really want arbitrary code to yield out of the coroutine
as this can be dangerous (think of code between yield statements as a sort
of critical section) as invariants can be broken or mutices held.
you can not prevent user doing such stuff, IMHO
...
Instead I suggest doing away with a separate self type and instead making
the first parameter to the coroutine functor a full fledged coroutine, with
simmetric signature:
coroutine<int(void)> my_coroutine([](coroutine<void(int)> caller) {
               caller(10);
});
int i = my_coroutine();
I'm sure that users asking for a typedef in coroutine<int(void)> for the 
first parameter of the coroutine function
typedef coroutine< int(void) > coro_t;
typedef coro_t::caller_t  caller_t;

coro_t my_coroutine( []( caller_t caller) {
     caller( 10);
});

int i = my_coroutine();

in the case of the reviewed library the coro_t::caller_t from the 
example above is equivalent to
coro_t::self_t provided by the lib. The coro_t::self_t type is not a 
instance of the original coroutine, it is a new instance
of coroutine_self<> (not derived from a coroutine's base classes).
If I would rename coroutine_self<>::yield() to 
coroutine_self<>::operator() I get pretty much the same as you described 
in your example. Of course self_t is not a coroutine but it provides 
'coroutine' - semantics
(== context jumps).
...
I do not like the separation between coroutines and generators. Instead the
generator class should be dropped.
I disagree because a coroutine<> is a 'specialized' routine and a 
generator<> provides a sequence of elements. the requirement of 
coroutine<> is that its coroutine-fn must have return value (a last 
return statement in its body) and
a generator<> requires that the generator-fn returns always void (return 
values passed via self_t::yield()).
...
The reason that I had generator as a
separate class in the original coroutine SoC is because I wanted to give it
iterator semantics (this also required internally using a reference counted
pointer to make the iterator copiable, which I disliked).
a requirement of iterators is that its source/container is still valid 
during its live-time.
why not require the same for generators? the iterators of generator<> 
take only a reference to the associated generator object.
...
Instead I propose
to make every coroutine<T()> fulfill the the input range concept and every
coroutine<void(T)> fulfill the output range concept (this is a
functionality not covered by generators). In practice it means that for
these classes of coroutines, begin and end are defined returning
appropriate iterator (i.e. some very thin layers around the coroutine
itself). As a range is required to survive its iterator anyway, the
iterators can simply use plain pointers to the coroutine.
The last suggestion works very well if coupled with the suggestion of
making 'self' a full coroutine:
std::vector<int> a = { ... };
std::vector<int> b = { ... };
std::cout << "The result of 'merge(a,b) is:\n";
for(auto x : callcc([&](coroutine<void(int)> c) {  std::merge(a.begin(),
a.end(), b.begin(), b.end(), c.begin()); })
{
         std::cout << x << "\n";
}
I thought modelling generalized coroutines<T(T)> as ranges, but I have yet
to find a good model.
I've read the docu of boost.range - the requirements of a single pass 
range is:
- For any Range |a|, |[boost::begin(a),boost::end(a))| is a valid range, 
that is, |boost::end(a)| is reachable from |boost::begin(a)| in a finite 
number of increments.
    -> might be violated by an infinite loop inside the coroutine-fn

- . A Range provides iterators for accessing a half-open range 
|[first,one_past_last)| of elements and provides information about the 
number of elements in the Range.
    -> it is for a coroutine or a generator impossible to determine the 
number of elements

is it correct to apply the range concept to coroutine/generator?
...
The nest of base
classes required to handle all possible coroutine signatures is quite
overwhelming though and I think it could be significantly reduced. I was
able to trace through the execution paths I was interested into, but it
required quite a bit of back and forward between implementation files.
I used static polymorphism and curiously recurring template pattern in 
order to reduce the partial template specialisations (regarding to 
return type and parameter types of the coroutine-fn).
...
The coroutine object itself holds a reference counted pointer to the
implementation. As the coroutine is movable but not copyable, I do not
think that reference counting is needed.
The implementation uses a intrusive pointer - it has the footprint 
(memory) of a raw pointer and the functions
used to increment/decrement the ref-counter are inlined - you pay only 
of incrementing/decrementing an integer.
The reference count is required because I creating coroutine_self<> 
which gets the ref-counted ptr.
...
The coroutine itself should
consists of two raw pointers: a pointer to the target context and a pointer
to the result slot. You need this pointer anyway, because, if I understand
correctly, you are planning to allow retriving the result slot at any time
after the operator() and yield call.
no, I don't think not.
...
The first branch can be removed by moving the complexity to the
constructor: a redy-to-start context should be indistinguishable from a
suspended context.
I was thinking about how to remove the branch in the resume() member 
function - I'll try to implement your suggestion.
...
The branch on is_complete can be removed by making sure that a resume on
on-complete also pass the pointer to the return slot via the native_resume
result.
OK
comments?

regards,
Oliver

Re: [boost] [coroutine][review] Very Late Review

Oliver Kowalke