Cooperative Multi-Tasking

I am curious whether anyone here has experience with any cross-platform cooperative multi-tasking libraries? I am familiar with the Boost.Coroutine library, but it only provides the "primitives" or "threads". What I am looking for is something much higher level that hides all of the coroutine details behind an asynchronous function call / future API, something along the lines of ASIO for coroutines. I am guessing that Boost.Asio would benefit from such a library.

It seems obvious to me now that coroutines / cooperative multi-tasking is the superior approach to solving problems with a large number of actors and "locks", where heavyweight preemptive scheduling would bog down the system.

Considering that cooperative multitasking is a very old concept, are there any reasons why there are no obvious libraries / scheduling systems in place? Does everyone "roll their own", or do they avoid it for some other reason?

Dan

Hi, ----- Original Message ----- From: "Daniel Larimer" <dlarimer@gmail.com> To: <boost@lists.boost.org> Sent: Wednesday, March 03, 2010 9:37 PM Subject: [boost] Cooperative Multi-Tasking
I am curious whether anyone here has experience with any cross-platform cooperative multi-tasking libraries? I am familiar with the Boost.Coroutine library, but it only provides the "primitives" or "threads". What I am looking for is something much higher level that hides all of the coroutine details behind an asynchronous function call / future API. Something along the lines of ASIO for coroutines.
I am guessing that Boost.Asio would benefit from such a library. It seems obvious to me now that coroutines / cooperative multi-tasking is the superior approach to solving problems with a large number of actors and "locks" where heavyweight preemptive scheduling would bog down the system.
Considering that cooperative multitasking is a very old concept, are there any reasons why there are no obvious libraries / scheduling systems in place? Does everyone "roll their own" or do they avoid it for some other reason?
Oliver has a scheduler, Boost.Fiber, in the review queue. I'm sure it will be a pleasure for you to look at the LibrariesUnderConstruction page https://svn.boost.org/trac/boost/wiki/LibrariesUnderConstruction HTH, Vicente

On Mar 3, 2010, at 4:13 PM, vicente.botet wrote:
Hi, ----- Original Message ----- From: "Daniel Larimer" <dlarimer@gmail.com> To: <boost@lists.boost.org> Sent: Wednesday, March 03, 2010 9:37 PM Subject: [boost] Cooperative Multi-Tasking
I am curious whether anyone here has experience with any cross-platform cooperative multi-tasking libraries? I am familiar with the Boost.Coroutine library, but it only provides the "primitives" or "threads". What I am looking for is something much higher level that hides all of the coroutine details behind an asynchronous function call / future API. Something along the lines of ASIO for coroutines.
I am guessing that Boost.Asio would benefit from such a library. It seems obvious to me now that coroutines / cooperative multi-tasking is the superior approach to solving problems with a large number of actors and "locks" where heavyweight preemptive scheduling would bog down the system.
Considering that cooperative multitasking is a very old concept, are there any reasons why there are no obvious libraries / scheduling systems in place? Does everyone "roll their own" or do they avoid it for some other reason?
Oliver has a scheduler, Boost.Fiber, in the review queue.
I'm sure it will be a pleasure for you to look at the LibrariesUnderConstruction page https://svn.boost.org/trac/boost/wiki/LibrariesUnderConstruction
Is there any place where the documentation for all of the projects in boost/vault is pre-built? Having to set up and build a new library before even reviewing the documentation probably hinders adoption, and without a publicly browsable set of documentation the library is "un-googleable".
HTH, Vicente

Daniel Larimer schrieb:
Is there any place where the documentation for all of the projects in boost/vault is pre-built? Having to set up and build a new library before even reviewing the documentation probably hinders adoption, and without a publicly browsable set of documentation the library is "un-googleable".
Extract the lib and look into the directory boost.fiber-0.3.10/libs/fiber/doc/html/.

Daniel Larimer wrote:
I am guessing that Boost.Asio would benefit from such a library. It seems obvious to me now that coroutines / cooperative multi-tasking is the superior approach to solving problems with a large number of actors and "locks" where heavyweight preemptive scheduling would bog down the system.
A significant advantage of not using coroutines or fibers with Asio is that you use as little memory per task as you need, while using a coroutine requires having a stack and other context data.

Mathias Gaunard wrote:
A significant advantage of not using coroutines or fibers with Asio is that you use as little memory per task as you need, while using a coroutine requires having a stack and other context data.
Unless, of course, you use stackless coroutines; e.g., see the Asio documentation, Examples, HTTP Server 4. Cheers, Rutger

On Thu, Mar 4, 2010 at 9:20 AM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
Daniel Larimer wrote:
I am guessing that Boost.Asio would benefit from such a library. It seems obvious to me now that coroutines / cooperative multi-tasking is the superior approach to solving problems with a large number of actors and "locks" where heavyweight preemptive scheduling would bog down the system.
A significant advantage of not using coroutines or fibers with Asio is that you use as little memory per task as you need, while using a coroutine requires having a stack and other context data.
GCC supports (experimentally) split stacks (http://gcc.gnu.org/wiki/SplitStacks) to let a thread dynamically grow and shrink its stack size (in principle with frame granularity). BTW, that project was implemented to support goroutines in Go. -- gpd

On Mar 4, 2010, at 5:50 AM, Giovanni Piero Deretta wrote:
On Thu, Mar 4, 2010 at 9:20 AM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
Daniel Larimer wrote:
I am guessing that Boost.Asio would benefit from such a library. It seems obvious to me now that coroutines / cooperative multi-tasking is the superior approach to solving problems with a large number of actors and "locks" where heavyweight preemptive scheduling would bog down the system.
A significant advantage of not using coroutines or fibers with Asio is that you use as little memory per task as you need, while using a coroutine requires having a stack and other context data.
GCC supports (experimentally) split stacks (http://gcc.gnu.org/wiki/SplitStacks ) to let a thread dynamically grow and shrink its stack size (in principle with frame granularity).
BTW, that project has been implemented to support goroutines in go.
My ultimate goal would be to achieve some of what Go can do within C++. Unfortunately, Go-like compile times will be impossible.

Does anyone here know if Boost.Coroutine has variable (even compile-time variable?) size stacks, or what the default stack size of Boost.Coroutine is? I would likely want to play with it. I remember reading someplace that on Windows you had no choice.

It seems like it should be possible to implement a "smart stack allocation" scheme where new/delete are overloaded for some template type like

    sstack<my_data> v1;
    ...

In theory you could get most of the speed advantages of stack-based allocation and yet offload "big objects" to the "heap" without having to worry about expensive malloc/free calls, except when the sstack<> detects that it needs to expand. Overhead would be a thread/coroutine-specific data pointer to your dynamic stack, a conditional check on each sstack allocation, and any optimizations that g++ would lose. There would also be an 8-byte "pointer" overhead, which would cause functions to look like this:

    int fun() {
        struct _local { int a, b, c; ... };
        sstack<_local> local;
        local.a = ...
    }

I am not sure how to estimate how much overhead something like this has compared to the benefits of using less RAM. I have often felt that the ability to define an "auto release heap" would handle many situations where new/delete is used for objects that have stack-based lifetimes.

Daniel Larimer wrote:
It seems like it should be possible to implement a "smart stack allocation" scheme where new/delete are overloaded for some template type like
sstack<my_data> v1; ... In theory you could get most of the speed advantages of stack-based allocation and yet offload "big objects" to "heap" without having to worry about expensive malloc/free calls except when the sstack<> detects that it needs to expand.
You are looking for Thorsten's auto_buffer? http://www.cs.aau.dk/~nesotto/boost/auto_buffer.zip Cheers, Rutger

On Thu, Mar 4, 2010 at 12:56 PM, Daniel Larimer <dlarimer@gmail.com> wrote:
On Mar 4, 2010, at 5:50 AM, Giovanni Piero Deretta wrote:
On Thu, Mar 4, 2010 at 9:20 AM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
Daniel Larimer wrote:
I am guessing that Boost.Asio would benefit from such a library. It seems obvious to me now that coroutines / cooperative multi-tasking is the superior approach to solving problems with a large number of actors and "locks" where heavyweight preemptive scheduling would bog down the system.
A significant advantage of not using coroutines or fibers with Asio is that you use as little memory per task as you need, while using a coroutine requires having a stack and other context data.
GCC supports (experimentally) split stacks (http://gcc.gnu.org/wiki/SplitStacks ) to let a thread dynamically grow and shrink its stack size (in principle with frame granularity).
BTW, that project has been implemented to support goroutines in go.
My ultimate goal would be to achieve some of what Go can do within C++. Unfortunately, Go-like compile times will be impossible.
Does anyone here know if Boost.Coroutine has variable (even compile time variable?) size stacks or what the default stack size of boost coroutine is? I would likely want to play with it. I remember reading someplace that on Windows you had no choice.
You can select the coroutine stack size at runtime. The default is ridiculously low (1 page, IIRC). On Windows you can definitely choose the fiber stack size. What you cannot do is use your own stack allocator. -- gpd

Daniel Larimer wrote:
Does anyone here know if Boost.Coroutine has variable (even compile time variable?) size stacks or what the default stack size of boost coroutine is? I would likely want to play with it. I remember reading someplace that on Windows you had no choice.
IIRC latest coroutine does its own context switching in assembly and only depends on the OS for memory allocation.

On Mar 4, 2010, at 9:34 AM, Mathias Gaunard wrote:
IIRC latest coroutine does its own context switching in assembly and only depends on the OS for memory allocation.
swapcontext.cpp does not appear to compile on Mac OS X which is a shame, does anyone know how to port it to Mac OS X? Dan

On Mar 3, 2010, at 3:37 PM, Daniel Larimer wrote:
I am curious whether anyone here has experience with any cross-platform cooperative multi-tasking libraries? I am familiar with the Boost.Coroutine library, but it only provides the "primitives" or "threads". What I am looking for is something much higher level that hides all of the coroutine details behind an asynchronous function call / future API. Something along the lines of ASIO for coroutines.
I am guessing that Boost.Asio would benefit from such a library. It seems obvious to me now that coroutines / cooperative multi-tasking is the superior approach to solving problems with a large number of actors and "locks" where heavyweight preemptive scheduling would bog down the system.
Considering that cooperative multitasking is a very old concept, are there any reasons why there are no obvious libraries / scheduling systems in place? Does everyone "roll their own" or do they avoid it for some other reason?
Dan
My own experience (quite old now) with cooperative multitasking comes from Windows and Macintosh GUI programming back around 1990, before Windows and Macintosh had the modern concepts of process or thread. Your app consisted of a loop that fetched the next message/event and then acted on it. In that model, there was a single thread running all the applications on the system. It switched between applications by returning from the "get next message" system call in the message loop of whatever application it wanted to run next.

Is that what you mean by "cooperative multitasking" here? If so, I would agree that it has some utility. However, it singularly fails to utilize multiple cores/processors, which is an important thing in today's world of static CPU speed and rapidly increasing core density. It also suffers from increased reliability problems, because it is harder in that model to isolate tasks from faults in other tasks.

On Mar 4, 2010, at 9:38 AM, Ian Emmons wrote:
On Mar 3, 2010, at 3:37 PM, Daniel Larimer wrote:
I am curious whether anyone here has experience with any cross-platform cooperative multi-tasking libraries? I am familiar with the Boost.Coroutine library, but it only provides the "primitives" or "threads". What I am looking for is something much higher level that hides all of the coroutine details behind an asynchronous function call / future API. Something along the lines of ASIO for coroutines.
I am guessing that Boost.Asio would benefit from such a library. It seems obvious to me now that coroutines / cooperative multi-tasking is the superior approach to solving problems with a large number of actors and "locks" where heavyweight preemptive scheduling would bog down the system.
Considering that cooperative multitasking is a very old concept, are there any reasons why there are no obvious libraries / scheduling systems in place? Does everyone "roll their own" or do they avoid it for some other reason?
Dan
My own experience (quite old now) with cooperative multitasking comes from Windows and Macintosh GUI programming back around 1990, before Windows and Macintosh had the modern concepts of process or thread. Your app consisted of a loop that fetched the next message/event and then acted on it. In that model, there was a single thread running all the applications on the system. It switched between applications by returning from the "get next message" system call in the message loop of whatever application it wanted to run next.
Is that what you mean by "cooperative multitasking" here? If so, I would agree that it has some utility. However, it singularly fails to utilize multiple cores/processors, which is an important thing in today's world of static CPU speed and rapidly increasing core density. It also suffers from increased reliability problems, because it is harder in that model to isolate tasks from faults in other tasks.
Back when I was in middle school I was programming on Mac OS 7, 8 and 9, and I remember the constant reboots from infinite loops; my high school computer science class on Win 95/98 suffered the same fate!

What I am trying to achieve is one real system thread per core and then distributing tasks (boost::function) among the cores. Each core would then use a cooperative system for handling asynchronous calls among actors. All of the ASIO stuff would happen in yet another real thread which would run the io_service.

Boost.Asio implements a concept of a strand for tasks that must happen in a particular order. I was going to add to that the concept of a 'coop' (perhaps a better name???) for tasks which don't require order, but do require that they are not actually run by two preemptive threads at the same time. This would allow you to write actors that do not need to worry about locking.

The problem with implementing actors with Asio is that the default "future" that "blocks" or waits ties up an entire OS thread when it is really unnecessary. So I am going to implement the ability to bind futures to functions that are passed to the scheduler, so that the method can automatically be invoked as soon as all of the parameters are ready. The plan is to only allocate a new coroutine whenever all existing coroutines are blocked, and then to keep recycling the coroutines by passing them new functions to run.

As it stands right now I have a working system that is almost entirely lock-free (yet thread-safe!) except when the task queue is empty and I need to wait for the next timer/event, in which case I really want the process to sleep! There is still a lot of work to be done though.

So in light of the above, I was hoping to get some insights into possible pitfalls and lessons learned. Talking with my employer, I am gaining support for eventually open sourcing a lot of this! As far as I have been able to google, I have not found any system that does what I describe above.

Daniel Larimer schrieb:
Back when I was in middle school I was programming on Mac OS 7,8 and 9 and remember the constant reboots from infinite loops and my high school computer science class on Win 95/98 suffered the same fate!
What I am trying to achieve is one real system thread per core and then distributing tasks (boost::function) among the cores. Each core would then use a cooperative system for handling asynchronous calls among actors. All of the ASIO stuff would happen in yet another real thread which would run the io_service.
Boost.Asio implements a concept of a strand for tasks that must happen in a particular order. I was going to add to that the concept of a 'coop' (perhaps a better name???) for tasks which don't require order, but do require that they are not actually run by two preemptive threads at the same time. This would allow you to write actors that do not need to worry about locking.
boost.task combines fibers/coroutines (using boost.fiber) with system threads. It prevents a task from blocking a system thread. It also allows pinning a system thread to a core/CPU.

On Mar 4, 2010, at 2:16 PM, Oliver Kowalke wrote:
Daniel Larimer schrieb:
Back when I was in middle school I was programming on Mac OS 7,8 and 9 and remember the constant reboots from infinite loops and my high school computer science class on Win 95/98 suffered the same fate! What I am trying to achieve is one real system thread per core and then distributing tasks (boost::function) among the cores. Each core would then use a cooperative system for handling asynchronous calls among actors. All of the ASIO stuff would happen in yet another real thread which would run the io_service. Boost.Asio implements a concept of a strand for tasks that must happen in a particular order. I was going to add to that the concept of a 'coop' (perhaps a better name???) for tasks which don't require order, but do require that they are not actually run by two preemptive threads at the same time. This would allow you to write actors that do not need to worry about locking.
boost.task combines fibers/coroutines (using boost.fiber) with system threads. It prevents a task from blocking a system thread. It also allows pinning a system thread to a core/CPU.
Looking over the documentation, it does seem to do a lot of the same things. I noticed that Mac OS X support has not been tested; perhaps I can do that. How does fiber compare to Boost.Coroutine in terms of context-switching performance? It seems clear that fiber operates at a much higher level, and with boost::task most of what I have described has been done, in theory. Is there some central repository of "reviews", or must I scan the forum logs? Dan

Daniel Larimer schrieb:
Looking over the documentation, it does seem to do a lot of the same things. I noticed that Mac OS X support has not been tested; perhaps I can do that. How does fiber compare to Boost.Coroutine in terms of context-switching performance? It seems clear that fiber operates at a much higher level, and with boost::task most of what I have described has been done, in theory. Is there some central repository of "reviews", or must I scan the forum logs?
Unfortunately I have no access to Mac OS, so it wouldn't currently compile on Mac OS. I didn't test the performance because I'm replacing the current implementation of the context switch (ucontext_t on UNIX, the Win32 Fiber API on Windows) with assembler. Once this is finished it should be much faster, because it preserves only the required registers and does not save the signal mask (which involves a system call). This way I can provide an implementation for Mac OS too. I have a git repo which contains boost-1.42 with additional libs: boost.atomic, boost.move, boost.fiber and boost.task. It is available from http://gitorious.org/boost-dev/boost-sandbox. Oliver

Hi, ----- Original Message ----- From: "Daniel Larimer" <dlarimer@gmail.com> To: <boost@lists.boost.org> Sent: Thursday, March 04, 2010 8:43 PM Subject: Re: [boost] Cooperative Multi-Tasking
On Mar 4, 2010, at 2:16 PM, Oliver Kowalke wrote:
Daniel Larimer schrieb:
Back when I was in middle school I was programming on Mac OS 7,8 and 9 and remember the constant reboots from infinite loops and my high school computer science class on Win 95/98 suffered the same fate! What I am trying to achieve is one real system thread per core and then distributing tasks (boost::function) among the cores. Each core would then use a cooperative system for handling asynchronous calls among actors. All of the ASIO stuff would happen in yet another real thread which would run the io_service. Boost.Asio implements a concept of a strand for tasks that must happen in a particular order. I was going to add to that the concept of a 'coop' (perhaps a better name???) for tasks which don't require order, but do require that they are not actually run by two preemptive threads at the same time. This would allow you to write actors that do not need to worry about locking.
boost.task combines fibers/coroutines (using boost.fiber) with system threads. It prevents a task from blocking a system thread. It also allows pinning a system thread to a core/CPU.
Looking over the documentation it does seem to do a lot of the same things. I noticed that Mac OS X support has not been tested, perhaps I can do that.
How does fiber compare to Boost.Coroutine in terms of context-switching performance? It seems clear that fiber operates at a much higher level, and with boost::task most of what I have described has been done, in theory.
Is there some central repository of "reviews" or must I scan the forum logs?
There is no such central repository of reviews. You can use http://old.nabble.com/Boost---Dev-f14201.html to search for task, fiber, or threadpool. HTH, Vicente

"Daniel Larimer" <dlarimer@gmail.com> wrote in message news:A46E676B-59BA-4117-879E-7FE391255B21@gmail.com... [...]
Talking with my employer, I am gaining support for eventually open sourcing a lot of this! As far as I have been able to google, I have not found any system that does what I describe above.
I take it that you are not familiar with Cilk++: http://software.intel.com/en-us/articles/intel-cilk or QuickThreads: http://www.quickthreadprogramming.com ?

Chris M. Thomasson wrote:
I take it that you are not familiar with Cilk++:
http://software.intel.com/en-us/articles/intel-cilk
or QuickThreads:
Upon looking superficially at the material provided by your two links, I can't help thinking that "Cilk++" plays in a totally different league than "QuickThreads". Whereas "Cilk++" manages to give the impression of a "mature" product (due to the availability of some well-readable "marketing material"), "QuickThreads" even fails to tell me whether there are plans to support "Linux" in the foreseeable future.

Well, I know that I just looked superficially, but since you posted the links, can you provide some more background information, especially about "QuickThreads"? ("Cilk++" manages to speak for itself, so more background information is probably not needed here.) Regards, Thomas

On Mar 7, 2010, at 2:47 AM, Chris M. Thomasson wrote:
"Daniel Larimer" <dlarimer@gmail.com> wrote in message news:A46E676B-59BA-4117-879E-7FE391255B21@gmail.com... [...]
Talking with my employer, I am gaining support for eventually open sourcing a lot of this! As far as I have been able to google, I have not found any system that does what I describe above.
I take it that you are not familiar with Cilk++:
http://software.intel.com/en-us/articles/intel-cilk
or QuickThreads:
Nope. And I searched just about everything I could think of to describe cooperative multi-tasking and user threads. Found lots of stuff, but not those. The Intel library looks interesting in that it is the most cross-platform, whereas quickthreadprogramming.com mentions Windows 7 but says nothing about linux/g++.

As nice as those libraries are, they still fail to achieve my desired syntax:

    struct SomeServantClass {
        int width();
        int height();
        int calculate_area(int w, int h);
    };

    IDL_STUB( SomeServantClass, IDL_BASE,
        METHOD(width),
        METHOD(height),
        METHOD(calculate_area) )

    actor<SomeServantClass> a;
    future<int> w = a.width();
    future<int> h = a.height();
    future<int> area = a.calculate_area( w, h );
    future<int> area34 = a.calculate_area( 3, 4 );

Where the tasks are all "automatic" and under the hood. SomeServantClass could be ANY existing class without modification. Also, area34 would be realized before area, because w and h must be evaluated before the first calculate_area can be scheduled, whereas the second can be scheduled immediately instead of waiting on other futures.

I suppose I could build my scheduler / active objects system upon one of those libraries. Boost.task looks the most promising from what I have seen. I guess for what I am doing, these higher-level task abstractions attempt to do "too much" and I would essentially be using them as an implementation of coroutines. I don't want my users to be thinking about threads, locks, wait conditions, yielding, scheduling, etc. I want them to operate on a much higher level.
participants (9)
- Chris M. Thomasson
- Daniel Larimer
- Giovanni Piero Deretta
- Ian Emmons
- Mathias Gaunard
- Oliver Kowalke
- Rutger ter Borg
- Thomas Klimpel
- vicente.botet