[threadpool] new version - fork/join (recursive/fibers)

Hello, v13: http://www.boostpro.com/vault/index.php?action=downloadfile&filename=boost-threadpool.v13.tar.gz&directory=Concurrent%20Programming& After yesterday's discussion I've modified the code. The pool can now be configured (second template argument of tp::pool) to use either recursive invocation of tasks or fibers in order to enable fork/join semantics. I've found that recursive invocation is more than twice as fast as using fibers (see the examples recursive_fibonacci_recursive_task.cpp and recursive_fibonacci_with_fibers.cpp). The code using fibers sometimes raises an assertion in ~mutex() (used in the work-stealing queue). For this reason, and following the discussions, I think I'll remove the support for fibers from threadpool. regards, Oliver
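For readers following along, here is a minimal self-contained sketch of what "recursive invocation of tasks" means - this is not the library's code, just an illustration of the idea: when a parent joins a sub-task that has not run yet, the worker pops it from its local queue and executes it inline instead of blocking.

#include <cstdio>
#include <deque>
#include <boost/bind.hpp>
#include <boost/function.hpp>
#include <boost/ref.hpp>

static std::deque< boost::function< void() > > local_queue; // one worker's local queue

static void run_one_inline()
{
    boost::function< void() > f = local_queue.back(); // newest sub-task first (LIFO)
    local_queue.pop_back();
    f();
}

static void fib_into( long & result, int n )
{
    if ( n < 2 ) { result = n; return; }
    long a = 0, b = 0;
    local_queue.push_back( boost::bind( & fib_into, boost::ref( a ), n - 1 ) ); // fork
    local_queue.push_back( boost::bind( & fib_into, boost::ref( b ), n - 2 ) ); // fork
    run_one_inline(); // the "get()" of a pending child: execute it recursively, don't block
    run_one_inline();
    result = a + b;   // join
}

int main()
{
    long r = 0;
    fib_into( r, 30 );
    std::printf( "fib(30) = %ld\n", r );
    return 0;
}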

----- Original Message ----- From: <k-oli@gmx.de> To: <boost@lists.boost.org> Sent: Tuesday, November 04, 2008 7:58 PM Subject: [boost] [threadpool] new version - fork/join (recursive/fibers)
Hello,
after the discussion yesterday I've modified the code. The pool can now be configured (second template argument of tp::pool) to use either recursive invocation of tasks or fibers in order to enable fork/join semantics.
Yep. This is good news.
I've found that recursive invocation is more than twice as fast as using fibers (see the examples recursive_fibonacci_recursive_task.cpp and recursive_fibonacci_with_fibers.cpp).
This is not surprising.
The code using fibers sometimes raises an assertion in ~mutex() (used in work-stealing queue). For this reason and following the discussions I think I'll remove the support for fibers from threadpool.
Why not. I was not really attached to this solution. Thanks, Vicente

On Tuesday, 4 November 2008 20:06:11, vicente.botet wrote:
The code using fibers sometimes raises an assertion in ~mutex() (used in work-stealing queue). For this reason and following the discussions I think I'll remove the support for fibers from threadpool.
Why not. I was not really attached to this solution.
Using fibers in threadpool v12 did not provide any benefit related to invoking tasks recursively (it is slower because of its system calls). I've enabled fiber migration to other worker threads (stealing) in v13, but I get runtime errors like those discussed on the mailing list (for instance problems with locked mutexes in fibers). Maybe I also have some implementation bugs - the code using fibers is not stable. I'm open for discussion. regards, Oliver

On Tue, Nov 4, 2008 at 9:35 PM, Alexander Terekhov <terekhov@web.de> wrote:
k-oli@gmx.de wrote:
[... fibers ...]
I'm open for discussion.
First define 'fibers' in common sense language. This is a request for discussion to all. :-)
Here goes a shot at a definition: A fiber is a cooperatively scheduled [purely/mostly userspace] thread of execution. AKA (symmetric) coroutines, green threads, co-threads, user contexts, etc. -- gpd

Giovanni Piero Deretta wrote:
On Tue, Nov 4, 2008 at 9:35 PM, Alexander Terekhov <terekhov@web.de> wrote:
k-oli@gmx.de wrote:
[... fibers ...]
I'm open for discussion.
First define 'fibers' in common sense language. This is a request for discussion to all. :-)
Here goes a shot at a definition: A fiber is a cooperatively scheduled [purely/mostly userspace] thread of execution. AKA (symmetric) coroutines, green threads, co-threads, user contexts, etc.
"A fiber is a unit of execution that must be manually scheduled by the application [do you really want and/or capable of that?]. Fibers run in the context of the threads that schedule them [correct]. Each thread can schedule multiple fibers [correct]. In general, fibers do not provide advantages over a well-designed multithreaded application [that's right]. However, using fibers can make it easier to port applications that were designed to schedule their own threads [i.e. archaic academic coroutines apps]." Welcome back to the past. http://msdn.microsoft.com/en-us/library/ms682661(VS.85).aspx :-) regards, alexander.

----- Original Message ----- From: "Alexander Terekhov" <terekhov@web.de> To: <boost@lists.boost.org> Sent: Tuesday, November 04, 2008 11:12 PM Subject: Re: [boost] [threadpool] new version - fork/join (recursive/fibers)
Giovanni Piero Deretta wrote:
Here goes a shot at a definition: A fiber is a cooperatively scheduled [purely/mostly userspace] thread of execution. AKA (symmetric) coroutines, green threads, co-threads, user contexts, etc.
"A fiber is a unit of execution that must be manually scheduled by the application [do you really want and/or capable of that?]. Fibers run in the context of the threads that schedule them [correct]. Each thread can schedule multiple fibers [correct]. In general, fibers do not provide advantages over a well-designed multithreaded application [that's right]. However, using fibers can make it easier to port applications that were designed to schedule their own threads [i.e. archaic academic coroutines apps]." Welcome back to the past. :-)
What do you have against "archaic academic coroutine apps"? Is it the archaic, the academic, or the coroutines? ;-) Vicente
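As an aside, the "manual scheduling" described in the MSDN text quoted above looks like this with the Win32 fiber API (a minimal sketch, error handling omitted; the Win32 calls are real, only the worker routine is invented):

#include <windows.h>
#include <cstdio>

static LPVOID main_fiber = 0;

static VOID CALLBACK worker_fiber( LPVOID /*param*/ )
{
    std::printf( "running in the worker fiber\n" );
    SwitchToFiber( main_fiber ); // nothing preempts a fiber: it has to yield explicitly
    // falling off the end of a fiber routine would end the thread, so we never return
}

int main()
{
    main_fiber = ConvertThreadToFiber( 0 );       // the calling thread becomes a fiber
    LPVOID f = CreateFiber( 0, worker_fiber, 0 ); // 0 = default stack size
    SwitchToFiber( f );                           // manual scheduling: run it now
    std::printf( "back in the main fiber\n" );
    DeleteFiber( f );
    return 0;
}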

On Tue, Nov 4, 2008 at 11:12 PM, Alexander Terekhov <terekhov@web.de> wrote:
Giovanni Piero Deretta wrote:
On Tue, Nov 4, 2008 at 9:35 PM, Alexander Terekhov <terekhov@web.de> wrote:
k-oli@gmx.de wrote:
[... fibers ...]
I'm open for discussion.
First define 'fibers' in common sense language. This is a request for discussion to all. :-)
Here goes a shot at a definition: A fiber is a cooperatively scheduled [purely/mostly userspace] thread of execution. AKA (symmetric) coroutines, green threads, co-threads, user contexts, etc.
"A fiber is a unit of execution that must be manually scheduled by the application [do you really want and/or capable of that?].
asio io_service does most of the work.
Fibers run in the context of the threads that schedule them [correct]. Each thread can schedule multiple fibers [correct]. In general, fibers do not provide advantages over a well-designed multithreaded application [that's right].
False. The reality is that fibers (and any kind of user-level threads) scale much better than threads. Hundreds of thousands of fiber-based contexts are not unheard of in networked applications (think Erlang). Using threads for that is overkill. And you still have the advantage of cooperative scheduling (i.e. no mutual exclusion). In fact cooperative multitasking is often a better solution than threads for many problems.
However, using fibers can make it easier to port applications that were designed to schedule their own threads [i.e. archaic academic coroutines apps]." Welcome back to the past.
Actually most programming languages of the last 20 years (i.e. younger than C++) have fibers, or rather coroutines, in some form or another (heck, even C# has generators). Have you ever used a generator in Python? That's a limited form of coroutines. -- gpd

hello,
Using fibers in the threadpool in order to enable fork/join comes with some difficulties. My idea is to use recursive sub-task invocation as the default and fibers only in some special cases:
1.) The parent-task locks a mutex and the sub-task also has to lock the same mutex -> use a recursive_mutex.
2.) The sub-task waits for a signal from the parent-task (condition variable); here recursive invocation of the sub-task does not work. In this case I would suggest that the sub-task be executed in a fiber.
task< R >::get() - recursive invocation of the sub-task
task< R >::get( new_context() ) - executes the sub-task in a new fiber
What do you think about this? regards, Oliver
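To make case 2.) concrete, here is a small self-contained illustration using plain Boost.Thread (not the pool API): the sub-task waits on a condition variable that only the parent can signal, so running the sub-task inline inside get() would mean the parent never reaches notify_one(). Running such a sub-task in its own context - a fiber in the proposal, a plain thread in this stand-alone sketch - avoids the deadlock.

#include <boost/thread.hpp>
#include <boost/thread/condition_variable.hpp>

static boost::mutex              mtx;
static boost::condition_variable cond;
static bool                      signalled = false;

static void sub_task()
{
    boost::unique_lock< boost::mutex > lk( mtx );
    while ( ! signalled )
        cond.wait( lk );               // waits for a signal from the parent
}

static void parent_task()
{
    // If get() invoked sub_task() recursively right here, the parent would
    // never reach notify_one() below and the wait could never be satisfied.
    // Running the sub-task in its own context (a fiber in the proposal, a
    // plain thread in this sketch) lets the parent continue and signal it.
    boost::thread child( & sub_task );

    {
        boost::lock_guard< boost::mutex > lk( mtx );
        signalled = true;
    }
    cond.notify_one();
    child.join();
}

int main()
{
    parent_task();
    return 0;
}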

----- Original Message ----- From: "Oliver Kowalke" <k-oli@gmx.de> To: <boost@lists.boost.org> Sent: Wednesday, November 05, 2008 7:58 AM Subject: [boost] [threadpool] mixture of recursive invokation and fibers?
hello,
Using fibers in the threadpool in order to enable fork/join comes with some difficulties.
My idea is to use recursive sub-task invocation as the default and fibers only in some special cases.
1.) parent-task locks a mutex and the sub-task also has to lock the same mutex -> using recursive_mutex
2.) sub-task waits for a signal from the parent-task (condition variable); here the recursive invocation of the sub-task does not work. In this case I would suggest that the sub-task be executed in a fiber.
task< R >::get() - recursive invocation of the sub-task
task< R >::get( new_context() ) - executes the sub-task in a new fiber
What do you think about this?
Hi Oliver, I would prefer to go in the direction of a single thread_pool which can be configured by the user. Which default configuration to use still needs to be determined. The user can enable/disable fork/join using something like
{
    enable_fork_join fj_enabler;
    // here get() uses recursive sub-task invocation
    {
        disable_fork_join fj_disabler;
        // here get() blocks
    }
}
The user can enable/disable fibers using something like
{
    enable_fork_join fj_enabler;
    // here get() uses recursive sub-task invocation
    {
        enable_fibers fiber_enabler;
        // here get() switches fiber context
    }
    // here get() uses recursive sub-task invocation
}
Maybe this approach has some pitfalls I have not explored yet. What do you think? Vicente

Hi Oliver,
I would prefer to go in the direction of a single thread_pool which can be configured by the user. Which default configuration to use still needs to be determined.
The user can enable/disable fork/join using something like { enable_fork_join fj_enabler; /* here get() uses recursive sub-task invocation */ { disable_fork_join fj_disabler; /* here get() blocks */ } }
The user can enable/disable fibers using something like { enable_fork_join fj_enabler; /* here get() uses recursive sub-task invocation */ { enable_fibers fiber_enabler; /* here get() switches fiber context */ } /* here get() uses recursive sub-task invocation */ }
Maybe this approach has some pitfalls I have not explored yet. What do you think?
Vicente
You suggest providing a mechanism similar to boost::this_thread::disable_interruption and boost::this_thread::restore_interruption. I don't think we should do this, because some applications need more than one threadpool (pipelined architectures, for instance), so you would be forced to pass the pool instance to enable_fork_join etc. regards, Oliver
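For reference, the existing Boost.Thread mechanism Oliver compares this to works like the following sketch (the boost::this_thread names are the real API; the worker function is made up):

#include <boost/thread.hpp>

static void worker()
{
    // Interruption is enabled by default inside a boost::thread.
    boost::this_thread::disable_interruption di;            // scoped: interruption off
    // ... interruption points are ignored here ...
    {
        boost::this_thread::restore_interruption ri( di );  // temporarily re-enabled
        boost::this_thread::interruption_point();
    }
    // disabled again until di goes out of scope
}

int main()
{
    boost::thread t( & worker );
    t.join();
    return 0;
}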

----- Original Message ----- From: "Oliver Kowalke" <k-oli@gmx.de> To: <boost@lists.boost.org> Sent: Wednesday, November 05, 2008 8:55 AM Subject: Re: [boost] [threadpool] mixture of recursive invokation and fibers?
Hi Oliver,
I would prefer to go in the direction of a single thread_pool which can be configured by the user. Which default configuration to use still needs to be determined.
The user can enable/disable fork/join using something like { enable_fork_join fj_enabler; /* here get() uses recursive sub-task invocation */ { disable_fork_join fj_disabler; /* here get() blocks */ } }
The user can enable/disable fibers using something like { enable_fork_join fj_enabler; /* here get() uses recursive sub-task invocation */ { enable_fibers fiber_enabler; /* here get() switches fiber context */ } /* here get() uses recursive sub-task invocation */ }
Maybe this approach has some pitfalls I have not explored yet. What do you think?
Vicente
You suggest providing a mechanism similar to boost::this_thread::disable_interruption and boost::this_thread::restore_interruption. I don't think we should do this, because some applications need more than one threadpool (pipelined architectures, for instance), so you would be forced to pass the pool instance to enable_fork_join etc. _____________________ Yes, I'm suggesting that. Could you tell me more about when we need more than one pool (pipelined architectures, for instance)? Even if this is the case, it does not forbid having enablers/disablers taking the pool as a parameter. If in the end we think it would be interesting to have a default thread pool, we can have a namespace the_thread_pool which defines enablers/disablers with respect to this default thread pool. Vicente

Could you tell me more about when we need more than one pool (pipelined architectures, for instance)?
Pipelined architectures could be used in network applications/services (for instance web services). The idea is to parallelize the receiving, parsing, and processing of network messages in different stages. Each stage could have its own threadpool (the stages might also share a single pool). The pattern comes from the pipeline architecture of modern RISC processors. A better description can be found here (chapter 6): http://www.cs.uwaterloo.ca/~brecht/papers/getpaper.php?file=eurosys-2007.pdf
Even if this is the case, it does not forbid having enablers/disablers taking the pool as a parameter.
yes
If in the end we think it would be interesting to have a default thread pool, we can have a namespace the_thread_pool which defines enablers/disablers with respect to this default thread pool.
maybe we can hijack this_thread - a static threadpool with some default configuration (channel type etc.)? regards, Oliver
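A rough, self-contained sketch of the two-stage arrangement described above, using one boost::asio::io_service per stage as a stand-in for the per-stage pools (this is not the threadpool library's API; the stage functions and thread counts are made up for illustration). Because each stage owns its threads, a burst of slow reads in stage A cannot starve stage B of workers:

#include <boost/asio.hpp>
#include <boost/thread.hpp>
#include <boost/bind.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>
#include <string>
#include <iostream>

static boost::asio::io_service stage_a; // receiving/parsing
static boost::asio::io_service stage_b; // processing

static void process( std::string msg )      // stage B work
{
    std::cout << "processed: " << msg << std::endl;
}

static void handle_read( std::string msg )  // stage A work
{
    // ... potentially slow read/parse ...
    stage_b.post( boost::bind( & process, msg ) ); // hand off to stage B's pool
}

static void run_service( boost::asio::io_service * ios ) { ios->run(); }

int main()
{
    boost::asio::io_service::work keep_a( stage_a );
    boost::asio::io_service::work keep_b( stage_b );

    boost::thread_group pool_a, pool_b;      // separate worker threads per stage
    for ( int i = 0; i < 4; ++i )
        pool_a.create_thread( boost::bind( & run_service, & stage_a ) );
    for ( int i = 0; i < 2; ++i )
        pool_b.create_thread( boost::bind( & run_service, & stage_b ) );

    stage_a.post( boost::bind( & handle_read, std::string( "request-1" ) ) );

    // Even if all of pool_a's threads were busy reading large messages,
    // pool_b's threads would remain free to execute stage B tasks.

    boost::this_thread::sleep( boost::posix_time::milliseconds( 100 ) );
    stage_a.stop();
    stage_b.stop();
    pool_a.join_all();
    pool_b.join_all();
    return 0;
}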

----- Original Message ----- From: "Oliver Kowalke" <k-oli@gmx.de> To: <boost@lists.boost.org> Sent: Wednesday, November 05, 2008 9:30 AM Subject: Re: [boost] [threadpool] mixture of recursive invokation and fibers?
Could you tell me more about when we need more than one pool (pipelined architectures, for instance)?
Pipelined architectures could be used in network applications/services (for instance web services). The idea is to parallelize the receiving, parsing, and processing of network messages in different stages. Each stage could have its own threadpool (the stages might also share a single pool). The pattern comes from the pipeline architecture of modern RISC processors. A better description can be found here (chapter 6): http://www.cs.uwaterloo.ca/~brecht/papers/getpaper.php?file=eurosys-2007.pdf
What is the advantage of having one pool for each stage compared to one pool for all the stages?
Even if this is the case, it does not forbid having enablers/disablers taking the pool as a parameter.
yes
If in the end we think it would be interesting to have a default thread pool, we can have a namespace the_thread_pool which defines enablers/disablers with respect to this default thread pool.
maybe we can hijack this_thread - a static threadpool with some default configuration (channel type etc.)?
Great, Vicente

Could you tell me more about when we need more than one pool (pipelined architectures, for instance)?
Pipelined architectures could be used in network applications/services (for instance web services). The idea is to parallelize the receiving, parsing, and processing of network messages in different stages. Each stage could have its own threadpool (the stages might also share a single pool). The pattern comes from the pipeline architecture of modern RISC processors. A better description can be found here (chapter 6): http://www.cs.uwaterloo.ca/~brecht/papers/getpaper.php?file=eurosys-2007.pdf
What is the advantage of having one pool for each stage compared to one pool for all the stages?
Imagine we have stage A, which reads messages from the network, and stage B, which processes the messages read by A. If both use the same threadpool and multiple clients send large messages (so reading takes significant time), stage A could allocate all threads in the pool so that no work-item of stage B can be executed. If both stages use different pools, the processing of work-items is independent: B can execute its tasks even if stage A is under high load. regards, Oliver

----- Original Message ----- From: "Oliver Kowalke" <k-oli@gmx.de> To: <boost@lists.boost.org> Sent: Wednesday, November 05, 2008 10:04 AM Subject: Re: [boost] [threadpool] mixture of recursive invokation and fibers?
What is the advantage of having one pool for each stage compared to one pool for all the stages?
Imagine we have stage A, which reads messages from the network, and stage B, which processes the messages read by A. If both use the same threadpool and multiple clients send large messages (so reading takes significant time), stage A could allocate all threads in the pool so that no work-item of stage B can be executed. If both stages use different pools, the processing of work-items is independent: B can execute its tasks even if stage A is under high load.
Why would A allocate all the threads in the pool? If stage A handles a message and then requests B to do something, the task of B will be enqueued before other new messages coming from the network, so B will be interleaved with A. What am I missing? Best, Vicente

Imagine we have stage A, which reads messages from the network, and stage B, which processes the messages read by A. If both use the same threadpool and multiple clients send large messages (so reading takes significant time), stage A could allocate all threads in the pool so that no work-item of stage B can be executed. If both stages use different pools, the processing of work-items is independent: B can execute its tasks even if stage A is under high load.
Why would A allocate all the threads in the pool? If stage A handles a message and then requests B to do something, the task of B will be enqueued before other new messages coming from the network, so B will be interleaved with A. What am I missing?
data-flow: network -> stage A -> stage B -> network
If 10,000 clients are connected to the service and 1% of the clients send a large message, then stage A is triggered to read/process 100 requests at the same time - possibly all threads of the pool are used by stage A, and tasks of stage B are queued in the pool until a worker thread is finished with the items of A. regards, Oliver

----- Original Message ----- From: "Oliver Kowalke" <k-oli@gmx.de> To: <boost@lists.boost.org> Sent: Wednesday, November 05, 2008 10:25 AM Subject: Re: [boost] [threadpool] mixture of recursive invokation and fibers?
Imagine we have stage A, which reads messages from the network, and stage B, which processes the messages read by A. If both use the same threadpool and multiple clients send large messages (so reading takes significant time), stage A could allocate all threads in the pool so that no work-item of stage B can be executed. If both stages use different pools, the processing of work-items is independent: B can execute its tasks even if stage A is under high load.
Why would A allocate all the threads in the pool? If stage A handles a message and then requests B to do something, the task of B will be enqueued before other new messages coming from the network, so B will be interleaved with A. What am I missing?
data-flow: network -> stage A -> stage B -> network If 10,000 clients are connected to the service and 1% of the clients send a large message, then stage A is triggered to read/process 100 requests at the same time - possibly all threads of the pool are used by stage A, and tasks of stage B are queued in the pool until a worker thread is finished with the items of A. ___________ Maybe. I would like to see some benchmarks. Vicente

on Wed Nov 05 2008, "vicente.botet" <vicente.botet-AT-wanadoo.fr> wrote:
enqueued before other new messages coming from the network, so B will be interleaved with A. What am I missing?
data-flow: network -> stage A -> stage B -> network
If 10,000 clients are connected to the service and 1% of the clients send a large message, then stage A is triggered to read/process 100 requests at the same time - possibly all threads of the pool are used by stage A, and tasks of stage B are queued in the pool until a worker thread is finished with the items of A.
___________
Maybe. I would like to see some benchmarks.
Vicente
Hi Vicente, It's very difficult to follow a thread when it isn't clear which text you wrote and which text you are quoting. Please see the section titled "Use a Readable Quotation Style" at http://www.boost.org/community/policy.html#effective Thanks, -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Pipelining Pattern: http://cse.stanford.edu/class/sophomore-college/projects-00/risc/pipelining/ regards, Oliver

Hi Oliver,
I would prefer to go in the direction of a single thread_pool which can be configured by the user. Which default configuration to use still needs to be determined.
The user can enable/disable fork/join using something like { enable_fork_join fj_enabler; /* here get() uses recursive sub-task invocation */ { disable_fork_join fj_disabler; /* here get() blocks */ } }
The user can enable/disable fibers using something like { enable_fork_join fj_enabler; /* here get() uses recursive sub-task invocation */ { enable_fibers fiber_enabler; /* here get() switches fiber context */ } /* here get() uses recursive sub-task invocation */ }
Should invoking this functionality influence the whole threadpool or only the current worker thread? Oliver

----- Original Message ----- From: "Oliver Kowalke" <k-oli@gmx.de> To: <boost@lists.boost.org> Sent: Wednesday, November 05, 2008 12:28 PM Subject: Re: [boost] [threadpool] mixture of recursive invokation and fibers?
Hi Oliver,
I would prefer to go in the direction of a single thread_pool which can be configured by the user. Which default configuration to use still needs to be determined.
The user can enable/disable fork/join using something like { enable_fork_join fj_enabler; /* here get() uses recursive sub-task invocation */ { disable_fork_join fj_disabler; /* here get() blocks */ } }
The user can enable/disable fibers using something like { enable_fork_join fj_enabler; /* here get() uses recursive sub-task invocation */ { enable_fibers fiber_enabler; /* here get() switches fiber context */ } /* here get() uses recursive sub-task invocation */ }
Should invoking this functionality influence the whole threadpool or only the current worker thread?
_________________
IMO, we need to explore both possibilities and see which one is better. We can use namespaces for the default threadpool: the_thread_pool for the whole-threadpool configuration and this_working_thread to configure the behaviour of the current working thread. For a specific pool we can have the same, but adding the threadpool as a parameter.
In my opinion the scoped constructions do not work well to configure the whole threadpool, as we now have multiple threads that can configure the same object. I don't know if both can be configured at the same time. At least we have two variables to configure:
task submission: where a task will be enqueued (pool queue or worker-thread queue)
blocking behaviour: what the scheduler does when we block, e.g. on get() (block, schedule recursively, use fibers, ...)
Concerning the working thread, in addition we could consider whether other working threads' schedulers can steal tasks from this working thread's queue.
I'm not sure we need all these variations, but it would be better to separate the concerns and see how this works. We need to see how all the variables interact and which ones are compatible and/or desirable. For example, if a task is submitted to the internal queue and the scheduler blocks on blocking functions, that task will never run.
Your first thread_pool uses the pool queue, has no internal queue and doesn't steal from other working threads, and of course doesn't allow others to steal because there is nothing to steal. Of course it blocks on blocking functions.
FJTask and TBB allow submitting either to the pool queue or to the internal queue (parent-task concept), schedule recursively when blocking, and allow other working threads to steal tasks from the internal queue. Best, Vicente
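To make the configuration axes above easier to discuss, here is a purely hypothetical sketch - none of these names exist in the library; they only write down the variables Vicente lists (submission target, blocking behaviour, stealing):

// Hypothetical sketch only - these names are not part of the library.
struct pool_config
{
    enum task_submission        // where a submitted task is enqueued
    {
        pool_queue,
        worker_thread_queue
    };

    enum blocking_behaviour     // what the scheduler does when get() would block
    {
        block,
        schedule_recursively,
        use_fibers
    };

    task_submission    submission;
    blocking_behaviour on_blocking;
    bool               allow_stealing; // may other workers steal from this worker's queue?
};

// E.g. the FJTask/TBB-like combination discussed above would roughly be
//   { pool_config::worker_thread_queue, pool_config::schedule_recursively, true }
// and the first thread_pool described above would roughly be
//   { pool_config::pool_queue, pool_config::block, false }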

On Wednesday, 5 November 2008 16:11:38, vicente.botet wrote:
In my opinion the scoped constructions do not work well to configure the whole threadpool, as we now have multiple threads that can configure the same object.
I noticed that in one implementation.
I don't know if both can be configured at the same time.
At least we have two variables to configure: task submission: where a task will be enqueued (pool queue or worker-thread queue); blocking behaviour: what the scheduler does when we block, e.g. on get() (block, schedule recursively, use fibers, ...)
Concerning the working thread, in addition we could consider whether other working threads' schedulers can steal tasks from this working thread's queue.
I'm not sure we need all these variations, but it would be better to separate the concerns and see how this works. We need to see how all the variables interact and which ones are compatible and/or desirable. For example, if a task is submitted to the internal queue and the scheduler blocks on blocking functions, that task will never run.
Your first thread_pool uses the pool queue, has no internal queue and doesn't steal from other working threads, and of course doesn't allow others to steal because there is nothing to steal. Of course it blocks on blocking functions.
FJTask and TBB allow submitting either to the pool queue or to the internal queue (parent-task concept), schedule recursively when blocking, and allow other working threads to steal tasks from the internal queue.
My opinion:
1) work-stealing will be used when possible (cannot be disabled)
2) fork/join: recursively executing arbitrary tasks from the pool is wrong - only task-trees should be executed recursively
3) the problem of a sub-task waiting for a notification (condition variable) from its parent-task can be solved with fibers
Item 1) is implemented; for items 2) and 3) I have to think about the interface and implementation. regards, Oliver

Hi Vicente, please ignore my last post. I've uploaded a new version, v14. First, work-stealing will happen when possible - it cannot be disabled. I believe we don't need functions to enable/disable fork/join, because fork/join is implicit if the code calls task< R >::get() in a worker thread (if blocking is desired I can provide task< R >::get( block() )). The sub-tasks are stored in the local worker queue (work-stealing queue). The worker queue works like a stack when the worker thread owning the local queue puts and takes sub-tasks from it (the stack represents the task-tree). That's why we don't need special task types like TBB does. The only open item is how to solve signalling between the parent-task (sending the signal) and the sub-task (waiting for the signal) using a condition variable. I hope fibers can help. regards, Oliver
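A hedged usage sketch of the v14 behaviour described above: task< R >::get() and the optional block() argument are taken from Oliver's post, while the pool type and the submit()/boost::bind plumbing are assumptions made only for illustration.

// Assumed names: tp::pool, pool.submit() - not confirmed by the post.
int fib( tp::pool & pool, int n )
{
    if ( n < 2 ) return n;

    tp::task< int > left  = pool.submit( boost::bind( & fib, boost::ref( pool ), n - 1 ) );
    tp::task< int > right = pool.submit( boost::bind( & fib, boost::ref( pool ), n - 2 ) );

    // Called from a worker thread, get() does not have to block: the pending
    // sub-task sits on top of that worker's local (stack-like) queue and is
    // executed recursively - fork/join without any special task type.
    return left.get() + right.get();

    // left.get( block() ) would request plain blocking behaviour instead.
}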
participants (6)
-
Alexander Terekhov
-
David Abrahams
-
Giovanni Piero Deretta
-
k-oli@gmx.de
-
Oliver Kowalke
-
vicente.botet