[thread-pool/futures] Explicit yielding

There has been some discussions about exposing a wait-callback in futures, a hook which is executed whenever someone calls wait. One of the motivations has been to efficiently re-use threads within a thread pool when a task is waiting on a future. The idea is to execute another task in the future::wait() call to avoid thread context switching and to need less worker threads. I've presented some arguments why this might cause unexpected problems. Here is a very crude draft of an alternative thread-pool interface which would allow thread re-use but in an explicit way. Let launch_in_pool come in two flavours, one without surprising thread re-use and one with explicit support for it. I hope the interfaces are self-explanatory. template<class R> future<R> launch_in_pool(function<R()> task); template<class R> future<R> launch_in_pool(function<R(thread_pool& pool)> task); /// The thread_pool class is only to be used by code executed by worker threads class thread_pool { template<class R, class F> future<R> submit_child_task(const F& child_task); enum yield_type {YIELD_TO_CHILD_TASK, YIELD_TO_RELATED_TASKS, YIELD_TO_ANY_TASK}; /// Waits until all child tasks are ready void yielding_wait(yield_type yt); }; Thoughts? Johan -- View this message in context: http://www.nabble.com/-thread-pool-futures--Explicit-yielding-tp17606225p176... Sent from the Boost - Dev mailing list archive at Nabble.com.

Johan Torp:
... The idea is to execute another task in the future::wait() call to avoid thread context switching and to need less worker threads. I've presented some arguments why this might cause unexpected problems.
It doesn't cause unexpected deadlocks if the thread is reused to execute the specific task that is supposed to produce the result of this particular future. Limiting the reuse to pool threads eliminates most other (theoretical IMO) problems.

Peter Dimov-5 wrote:
Johan Torp:
... The idea is to execute another task in the future::wait() call to avoid thread context switching and to need less worker threads. I've presented some arguments why this might cause unexpected problems.
It doesn't cause unexpected deadlocks if the thread is reused to execute the specific task that is supposed to produce the result of this particular future.
If I understand you correctly, it can deadlock. It can violate lock acquisition orders which may lead to dead locks. Assume we have two mutexes, M1 and M2 which is always supposed to acquired in order M1, M2. Thread A, task 1, pool task: submit task 2 acquire M2 wait on task 2 // Surprising that this call would acquire locks Thread A, task 2, executed from within wait on task 2 statement: acquire M1 // Acquired in wrong order Thread X: // Acquires mutexes in correct order acquire M1 acquire M2 I agree this might not be a very common case and that the added complexity might not be worth it. Holding a lock while waiting on a future seems stupid, but I believe having a lock acquisition order to avoid deadlocks is rather common. Johan -- View this message in context: http://www.nabble.com/-thread-pool-futures--Explicit-yielding-tp17606225p176... Sent from the Boost - Dev mailing list archive at Nabble.com.

Johan Torp:
If I understand you correctly, it can deadlock.
It can deadlock, but it cannot deadlock unexpectedly. That is, it will not turn code that didn't deadlock into code that deadlocks. Your example:
Thread A, task 1, pool task: submit task 2 acquire M2 wait on task 2 // Surprising that this call would acquire locks
Thread A, task 2, executed from within wait on task 2 statement: acquire M1 // Acquired in wrong order
Thread X: // Acquires mutexes in correct order acquire M1 acquire M2
already deadlocks without reuse.

Peter Dimov-5 wrote:
It can deadlock, but it cannot deadlock unexpectedly. That is, it will not turn code that didn't deadlock into code that deadlocks. Your example: -snip- already deadlocks without reuse.
You're right, I didn't see that. In a previous post you said "Limiting the reuse to pool threads eliminates most other (theoretical IMO) problems.". What theoretical problems has been identified with thread re-use? Does existing threadpools (such as java.util.concurrency.ExecutorService) offer thread re-use and if so is it considered an success? Johan -- View this message in context: http://www.nabble.com/-thread-pool-futures--Explicit-yielding-tp17606225p176... Sent from the Boost - Dev mailing list archive at Nabble.com.

Johan Torp:
In a previous post you said "Limiting the reuse to pool threads eliminates most other (theoretical IMO) problems.". What theoretical problems has been identified with thread re-use?
"State" leaking from one thread to another via thread-local variables, thread priority, or thread stack size.
Does existing threadpools (such as java.util.concurrency.ExecutorService) offer thread re-use and if so is it considered an success?
I don't see this in the documentation of ThreadPoolExecutor, so I'd say that it doesn't reuse threads in this way. It only reuses the current thread in 'submit' if the task would be otherwise rejected and the rejection policy is set to CallerRunsPolicy.

Peter Dimov-5 wrote:
In a previous post you said "Limiting the reuse to pool threads eliminates most other (theoretical IMO) problems.". What theoretical problems has been identified with thread re-use?
"State" leaking from one thread to another via thread-local variables, thread priority, or thread stack size.
Given that thread's stack size can be increased for worker threads, these problems doesn't seem to justify the complexity of the explicit yielding I proposed. It has another benefit though, you can implement "thread reuse" without exposing the future::set_wait_callback function. Johan -- View this message in context: http://www.nabble.com/-thread-pool-futures--Explicit-yielding-tp17606225p177... Sent from the Boost - Dev mailing list archive at Nabble.com.

Johan Torp:
Does existing threadpools (such as java.util.concurrency.ExecutorService) offer thread re-use and if so is it considered an success?
Doug Lea says that ThreadPoolExecutor doesn't steal threads, but the upcoming ForkJoin framework does. ForkJoin is described in http://gee.cs.oswego.edu/dl/papers/fj.pdf

For what would be 'task-stealing' be usefull? Wouldn't be one task-queue on which worker threads are waiting be sufficient? Polling of one thread in task-queues of other threads seams to be expensive. Oliver
Johan Torp:
Does existing threadpools (such as java.util.concurrency.ExecutorService) offer thread re-use and if so is it considered an success?
Doug Lea says that ThreadPoolExecutor doesn't steal threads, but the upcoming ForkJoin framework does. ForkJoin is described in
http://gee.cs.oswego.edu/dl/papers/fj.pdf
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Hello, ----- Original Message ----- From: "Kowalke Oliver (QD IT PA AS)" <Oliver.Kowalke@qimonda.com> To: <boost@lists.boost.org> Sent: Monday, June 09, 2008 1:53 PM Subject: Re: [boost] [thread-pool/futures] Explicit yielding
For what would be 'task-stealing' be usefull? Wouldn't be one task-queue on which worker threads are waiting be sufficient?
It may be sufficient, but it can be ineficient.
Polling of one thread in task-queues of other threads seams to be expensive.
You are right, the main raison this is expensive is that we need to synchronize the access to this shared resource. But this is already the case when we have only one task-queue. I think that the quiz is that we need to manage the root-task(without parent) and the sub-task(with a parent task) differently. root-task must be handle in fifo order, but sub-tsaks are needed to complete other tasks, so they need to be handled in lifo order. What happens if we have only one queue? We need to synchronize the access to the single queue for every task, and as you say this is expensive. So we need to avoid to synchronize with other threads as much as possible. What happens if a task needs to wait on futures provided by other subtasks? We can make the worker thread wait directly on this future blocking one thread, or we can try to execute other tasks. As very often a task is waiting for futures provided by task launched by the task itself, it will be better that these subtasks run on the same worker thread (this avoid thread switching). In order to do that we need a stack of subtasks. But even in this case the worker thread would have is stack of task empty, and here he can just take a task from the queue of other threads
Johan Torp:
Does existing threadpools (such as java.util.concurrency.ExecutorService) offer thread re-use and if so is it considered an success?
Doug Lea says that ThreadPoolExecutor doesn't steal threads, but the upcoming ForkJoin framework does. ForkJoin is described in
http://gee.cs.oswego.edu/dl/papers/fj.pdf
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Please, forget my last message. It was sent unintentionally ... _____________________ Vicente Juan Botet Escriba ----- Original Message ----- From: "vicente.botet" <vicente.botet@wanadoo.fr> To: <boost@lists.boost.org> Sent: Monday, June 09, 2008 6:49 PM Subject: Re: [boost] [thread-pool/futures] Explicit yielding
Hello,
----- Original Message ----- From: "Kowalke Oliver (QD IT PA AS)" <Oliver.Kowalke@qimonda.com> To: <boost@lists.boost.org> Sent: Monday, June 09, 2008 1:53 PM Subject: Re: [boost] [thread-pool/futures] Explicit yielding
For what would be 'task-stealing' be usefull? Wouldn't be one task-queue on which worker threads are waiting be sufficient?
It may be sufficient, but it can be ineficient.
Polling of one thread in task-queues of other threads seams to be expensive.
You are right, the main raison this is expensive is that we need to synchronize the access to this shared resource. But this is already the case when we have only one task-queue.
I think that the quiz is that we need to manage the root-task(without parent) and the sub-task(with a parent task) differently. root-task must be handle in fifo order, but sub-tsaks are needed to complete other tasks, so they need to be handled in lifo order.
What happens if we have only one queue? We need to synchronize the access to the single queue for every task, and as you say this is expensive. So we need to avoid to synchronize with other threads as much as possible.
What happens if a task needs to wait on futures provided by other subtasks? We can make the worker thread wait directly on this future blocking one thread, or we can try to execute other tasks.
As very often a task is waiting for futures provided by task launched by the task itself, it will be better that these subtasks run on the same worker thread (this avoid thread switching). In order to do that we need a stack of subtasks. But even in this case the worker thread would have is stack of task empty, and here he can just take a task from the queue of other threads
Johan Torp:
Does existing threadpools (such as java.util.concurrency.ExecutorService) offer thread re-use and if so is it considered an success?
Doug Lea says that ThreadPoolExecutor doesn't steal threads, but the upcoming ForkJoin framework does. ForkJoin is described in
http://gee.cs.oswego.edu/dl/papers/fj.pdf
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Hello, ----- Original Message ----- From: "Kowalke Oliver (QD IT PA AS)" <Oliver.Kowalke@qimonda.com> To: <boost@lists.boost.org> Sent: Monday, June 09, 2008 1:53 PM Subject: Re: [boost] [thread-pool/futures] Explicit yielding
For what would be 'task-stealing' be usefull? Wouldn't be one task-queue on which worker threads are waiting be sufficient?
It may be sufficient, but it can be ineficient.
Polling of one thread in task-queues of other threads seams to be expensive.
You are right, the main raison this is expensive is that we need to synchronize the access to this shared resource. But this is already the case when we have only one task-queue. I think that the quiz is that we need to manage the root-task (without parent) and the sub-task (with a parent task) differently. root-task must be handle in fifo order, but sub-tasks are needed to complete other tasks, so they need to be handled in lifo order. So a worker thread will take a root-task at a time in fifo order. If the task needs to wait for the result of another task, the worker thread will handle the tasks in the task-stack in lifo order, checking if the future is ready. When there are no more tasks in the task-stack and the future is not ready, the worker thread can either wait explicitly on the future or try to execute other tasks. We can either take a new root-task or a sub-task of another worker thread. But what we can do when there are no new root-tasks? we can take a sub-task of the other worker threads. Even if this is expensive this avoid to let a working thread in a waiting state. Only when there are no more task to do the working thread will wait on the future. In addition some optimizations can be applied becuase the owner of the task-stack use it on lifo order but the others use it on fifo order (so it needs to be a dequeue). I hope this will help you to understand for what would 'task-stealing' be usefull. Anyway a reading of the FJTask article http://gee.cs.oswego.edu/dl/papers/fj.pdf, or the tasks in Threading Building Blocks (TBB) http://www.threadingbuildingblocks.org/documentation.php will be worthwhile. Best _____________________ Vicente Juan Botet Escriba

Kowalke Oliver (QD IT PA AS) wrote:
For what would be 'task-stealing' be usefull? Wouldn't be one task-queue on which worker threads are waiting be sufficient? Polling of one thread in task-queues of other threads seams to be expensive. Oliver
Tasks can internally re-submit child tasks and then wait on the child task. This is probably a quite common usecase, see the paper. Also, somebody else might start working on your child task. Instead of waiting for them to complete you could start working on other worker threads' problems or top-level tasks. This can cause some problems though, such as unexpected deadlocks. Johan -- View this message in context: http://www.nabble.com/-thread-pool-futures--Explicit-yielding-tp17606225p177... Sent from the Boost - Dev mailing list archive at Nabble.com.

Tasks can internally re-submit child tasks and then wait on the child task. This is probably a quite common usecase, see the paper.
Also, somebody else might start working on your child task. Instead of waiting for them to complete you could start working on other worker threads' problems or top-level tasks. This can cause some problems though, such as unexpected deadlocks.
I did a look in the papers for for/join and work stealing algorithms. Is it correct that each fork of a subtask would create a new thread (which get joined later)? If so then this pattern introduces the overhead of thread creation and destruction again. Some platforms the number of threads pro process are limited (255 on Linux). I'm wondering why fork/join is so hip?! Oliver

Kowalke Oliver (QD IT PA AS) wrote:
Tasks can internally re-submit child tasks and then wait on the child task. This is probably a quite common usecase, see the paper.
Also, somebody else might start working on your child task. Instead of waiting for them to complete you could start working on other worker threads' problems or top-level tasks. This can cause some problems though, such as unexpected deadlocks.
I did a look in the papers for for/join and work stealing algorithms. Is it correct that each fork of a subtask would create a new thread (which get joined later)?
No, I don't think that would be the typical implementation. Rather you would have a thread pool of worker threads, somewhat correlated to the number of cores on the machines. The idea is that if one of the worker threads blocks on a task, it could do other work in it's wait() call. This is to prevent thread startup and thread context switching. If the language had support for coroutines/yielding or fibers there might be better ways of acheiving efficienct work <-> cores mapping. Johan -- View this message in context: http://www.nabble.com/-thread-pool-futures--Explicit-yielding-tp17606225p178... Sent from the Boost - Dev mailing list archive at Nabble.com.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Friday 13 June 2008 05:03 am, Kowalke Oliver (QD IT PA AS) wrote:
thread creation and destruction again. Some platforms the number of threads pro process are limited (255 on Linux).
Linux isn't limited to 255 threads per process. However, on a 32 bit machine you will run out of memory address space somewhere on the order of 255 threads if you use the default thread stack size. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFIUsWb5vihyNWuA4URAovAAKCagHp/7rLmeFH9acx7PEGtNKIDCwCgp/yI a9smtng77NU3U9B80dQQJ3M= =eYNA -----END PGP SIGNATURE-----

Frank Mori Hess wrote:
Linux isn't limited to 255 threads per process. However, on a 32 bit machine you will run out of memory address space somewhere on the order of 255 threads if you use the default thread stack size.
This reminds me that we don't have any way to influence the stack size for new threads in any of the thread libraries or proposals. Phil.
participants (6)
-
Frank Mori Hess
-
Johan Torp
-
Kowalke Oliver (QD IT PA AS)
-
Peter Dimov
-
Phil Endecott
-
vicente.botet