[threadpool] relation with TR2 proposal

Hello,
I learnt yesterday that there is a TR2 proposal for a thread_pool, sponsored by our Boost colleague Anthony Williams:
http://www2.open-std.org/JTC1/sc22/wg21/docs/papers/2007/n2276.html
What is the relationship between this proposal and the threadpool library that is being discussed these days here on the Boost list?
Thank you,
Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

On Friday, 12 September 2008 10:52:50, joaquin@tid.es wrote:
Hello,
I learnt yesterday that there is a TR2 proposal for a thread_pool, sponsored by our Boost colleague Anthony Williams:
http://www2.open-std.org/JTC1/sc22/wg21/docs/papers/2007/n2276.html
What is the relationship between this proposal and the threadpool library that is being discussed these days here on the Boost list?
Thank you,
Joaquín M López Muñoz Telefónica, Investigación y Desarrollo
The threadpool lib in the vault has no relation to the TR2 proposal. It seems that Anthony's lib is simple - no support for task interruption, lazy task evaluation or task chaining, channel and queue types, etc. Oliver

k-oli@gmx.de wrote:
On Friday, 12 September 2008 10:52:50, joaquin@tid.es wrote:
Hello,
I learnt yesterday that there is a TR2 proposal for a thread_pool, sponsored by our Boost colleague Anthony Williams:
http://www2.open-std.org/JTC1/sc22/wg21/docs/papers/2007/n2276.html
What is the relationship between this proposal and the threadpool library that is being discussed these days here on the Boost list?
Thank you,
Joaquín M López Muñoz Telefónica, Investigación y Desarrollo
The threadpool lib in the vault has no relation to the TR2 proposal. It seems that Anthony's lib is simple - no support for task interruption, lazy task evaluation or task chaining, channel and queue types, etc.
It'd be an odd thing if we had a TR2 threadpool lib and a different lib in Boost (assuming both proposals are successful). Shouldn't there be some syncing between the respective authors? Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

On Friday, 12 September 2008 13:24:14, joaquin@tid.es wrote:
k-oli@gmx.de wrote:
On Friday, 12 September 2008 10:52:50, joaquin@tid.es wrote:
Hello,
I learnt yesterday that there is a TR2 proposal for a thread_pool, sponsored by our Boost colleague Anthony Williams:
http://www2.open-std.org/JTC1/sc22/wg21/docs/papers/2007/n2276.html
What is the relationship between this proposal and the threadpool library that is being discussed these days here on the Boost list?
Thank you,
Joaquín M López Muñoz Telefónica, Investigación y Desarrollo
The threadpool lib in the vault has no relation to the TR2 proposal. It seems that Anthony's lib is simple - no support for task interruption, lazy task evaluation or task chaining, channel and queue types, etc.
It'd be an odd thing if we had a TR2 threadpool lib and a different lib in Boost (assuming both proposals are successful). Shouldn't there be some syncing between the respective authors?
What about the future libs (see review schedule)? Anthony's proposal in TR2 and Braddock's implementation are also not in sync?! Oliver

k-oli@gmx.de writes:
On Friday, 12 September 2008 13:24:14, joaquin@tid.es wrote:
It'd be an odd thing if we had a TR2 threadpool lib and a different lib in Boost (assuming both proposals are successful). Shouldn't there be some syncing between the respective authors?
What about the future libs (see review schedule)? Anthony's proposal in TR2 and Braddock's implementation are also not in sync?!
That's true. My futures implementation is up for review alongside Braddock's. Incidentally, the C++ committee voted futures into C++0x last week. The proposals that got voted in are: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2671.html http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2709.html Anthony -- Anthony Williams | Just Software Solutions Ltd Custom Software Development | http://www.justsoftwaresolutions.co.uk Registered in England, Company Number 5478976. Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
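For reference, the voted-in design centres on the promise/future pair (plus packaged_task and an asynchronous launch facility). A minimal sketch using the std:: spellings that eventually shipped; the papers above describe the same concepts, in places under slightly different names:

#include <future>
#include <iostream>
#include <thread>

int main()
{
    std::promise<int> p;                     // producer side of the channel
    std::future<int> f = p.get_future();     // consumer side

    std::thread producer([&p] { p.set_value(42); });
    std::cout << f.get() << '\n';            // blocks until the value is set
    producer.join();
}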

----- Original Message ----- From: <joaquin@tid.es> To: <boost@lists.boost.org> Sent: Friday, September 12, 2008 1:24 PM Subject: Re: [boost] [threadpool] relation with TR2 proposal
k-oli@gmx.de wrote:
On Friday, 12 September 2008 10:52:50, joaquin@tid.es wrote:
Hello,
I learnt yesterday that there is a TR2 proposal for a thread_pool, sponsored by our Boost colleague Anthony Williams:
http://www2.open-std.org/JTC1/sc22/wg21/docs/papers/2007/n2276.html
What is the relationship between this proposal and the threadpool library that is being discussed these days here on the Boost list?
Thank you,
Joaquín M López Muñoz Telefónica, Investigación y Desarrollo
The threadpool lib in the vault has no relation to the TR2 proposal. It seems that Anthony's lib is simple - no support for task interruption, lazy task evaluation or task chaining, channel and queue types, etc.
It'd be an odd thing if we had a TR2 threadpool lib and a different lib in Boost (assuming both proposals are successful). Shouldn't there be some syncing between the respective authors?
Hi, BTW, Anthony, how is packaged_task related to the class thread_pool in n2276? I don't see any connection in the interface. Shouldn't submit_task return a packaged_task<T>?

template<typename F>
packaged_task<typename result_of<F()>::type> submit_task(F const& f);

Shouldn't that also be the case for the functions launch_in_pool and launch_in_thread? Oliver, what about adding launch_in_pool to the threadpool library? Oliver, what will be the impact on your library if it uses Anthony's futures library instead? What would be missing in Anthony's futures library? Vicente

On Saturday, 13 September 2008 10:29:25, vicente.botet wrote:
Oliver, what about adding launch_in_pool to the threadpool library?
I don't see a reason to introduce launch_in_pool!
Oliver, what will be the impact on your library if it uses Anthony's futures library instead? What would be missing in Anthony's futures library?
Chaining tasks and lazy evaluation of tasks would not be possible. As Anthony noted in one of his previous posts, Braddock's future lib has more features. Maybe a merge of both future libs will become part of Boost. Oliver

----- Original Message ----- From: <k-oli@gmx.de> To: <boost@lists.boost.org> Sent: Saturday, September 13, 2008 8:56 PM Subject: Re: [boost] [threadpool] relation with TR2 proposal
On Saturday, 13 September 2008 10:29:25, vicente.botet wrote:
Oliver, what about adding launch_in_pool to the threadpool library?
I don't see a reason to introduce launch_in_pool!
Well, most applications will work very well with only one pool for all the tasks, so the library could provide a private one. Do you think that applications will be more efficient with several pools? Do you have a use case? It would be interesting to compare both approaches.
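For illustration, here is a rough sketch of the "one private pool owned by the library" idea: a free launch_in_pool function backed by a single process-wide pool. All names (tiny_pool, global_pool, launch_in_pool) are hypothetical and this is not the vault library's or n2276's interface; it wraps submitted work in a packaged_task, which Anthony explains further below.

#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class tiny_pool {
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
public:
    explicit tiny_pool(unsigned n = std::thread::hardware_concurrency()) {
        for (unsigned i = 0; i < (n ? n : 2u); ++i)
            workers_.emplace_back([this] {
                for (;;) {
                    std::function<void()> task;
                    {
                        std::unique_lock<std::mutex> lk(m_);
                        cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
                        if (done_ && tasks_.empty()) return;   // drained, shut down
                        task = std::move(tasks_.front());
                        tasks_.pop();
                    }
                    task();
                }
            });
    }
    ~tiny_pool() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_all();
        for (std::thread& w : workers_) w.join();
    }
    void post(std::function<void()> f) {
        { std::lock_guard<std::mutex> lk(m_); tasks_.push(std::move(f)); }
        cv_.notify_one();
    }
};

tiny_pool& global_pool() {
    static tiny_pool pool;   // the single, library-provided pool
    return pool;
}

template <typename F>
auto launch_in_pool(F f) -> std::future<decltype(f())> {
    typedef decltype(f()) result_type;
    auto task = std::make_shared<std::packaged_task<result_type()>>(std::move(f));
    std::future<result_type> res = task->get_future();   // only the future escapes
    global_pool().post([task] { (*task)(); });
    return res;
}

// usage: std::future<int> f = launch_in_pool([] { return 6 * 7; });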
Oliver, what will be the impact on your library if it uses Anthony's futures library instead? What would be missing in Anthony's futures library?
Chaining tasks and lazy evaluation of tasks would not be possible. As Anthony noted in one of his previous posts, Braddock's future lib has more features. Maybe a merge of both future libs will become part of Boost.
I remember, the problem was with the set callback function. Vicente

"vicente.botet" <vicente.botet@wanadoo.fr> writes:
BTW, Anthony, how is packaged_task related to the class thread_pool in n2276? I don't see any connection in the interface.
packaged_task is intended to be a building block for a simple thread pool. As such it will not be exposed to the user: just the future will be visible. See the intro to http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2709.html
Shouldn't submit_task return a packaged_task<T>?

template<typename F>
packaged_task<typename result_of<F()>::type> submit_task(F const& f);
No. You don't want the task handle, but the future.
Shouldn't that also be the case for the functions launch_in_pool and launch_in_thread?
Same here. Anthony -- Anthony Williams | Just Software Solutions Ltd Custom Software Development | http://www.justsoftwaresolutions.co.uk Registered in England, Company Number 5478976. Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
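A small illustration of the building-block role described above, using the later std:: spellings as stand-ins: the packaged_task stays on the pool side and only the future escapes to the caller. The bare std::thread here merely stands in for "some worker inside the pool".

#include <future>
#include <iostream>
#include <thread>

int main()
{
    std::packaged_task<int()> task([] { return 6 * 7; });  // task plus result channel
    std::future<int> result = task.get_future();           // all the user would ever see

    std::thread worker(std::move(task));                   // the pool runs the task...
    std::cout << result.get() << '\n';                     // ...the caller only touches the future
    worker.join();
}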

JOAQUIN M. LOPEZ MUÑOZ wrote:
http://www2.open-std.org/JTC1/sc22/wg21/docs/papers/2007/n2276.html
What is the relationship between this proposal and the threadpool library that is being discussed these days here on the Boost list?
My _guess_ is that the C++ standard committee is targeting a thread pool which can help to ease extraction of parallel performance, not a fully configurable templated thread pool with lots of neat features. I believe the standard efforts are inspired by what java (see java.util.concurrency and the part called the fork-join framework) and .NET standard libraries (TPL, task parallel library) and intel thread building blocks (a C++ library) provide. They all have thread pools which internally employ work-stealing-based scheduling and try to keep a number of worker threads equivalent to the number of cores. Above the thread pool, there are constructs such as parallel_for. Having such high-level constructs is going to be very important in extracting parallel performance of thread based languages.

Read section 4.6 of my master's thesis to get a basic understanding of thread-level scheduling, it can be found at: www.johantorp.com And see Intel TBB's excellent tutorial, which explains their approach in considerable detail: http://www.threadingbuildingblocks.org/documentation.php

My five cents is that it would be great to separate the concerns of a thread pool and scheduling. But the real value for most applications would be to have parallel constructs (such as parallel_for) built on a single thread pool with smart work-stealing scheduling and a number of worker threads hinted to be bound to cores by processor affinity. To my knowledge, out of the .NET, java and TBB, only TBB exposes its thread pool. Hence, a simple interface such as launch_in_pool with a decent implementation might provide a lot more value than a fully configurable thread pool library.

In practice, just providing the interface to launch_in_pool has proven difficult as it returns a future value. The problem is nailing down a future interface which is both expressive and can be implemented in a lightweight manner.

Best Regards, Johan Torp www.johantorp.com
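To make the launch_in_pool/parallel_for idea concrete, here is a small divide-and-conquer parallel_for sketch. std::async is used as a crude stand-in for a pool-backed launch_in_pool (a real pool would bound the number of threads and could steal work); the grain size and all names are arbitrary choices for the example.

#include <cstddef>
#include <future>
#include <vector>

template <typename Body>
void parallel_for(std::size_t first, std::size_t last, Body body)
{
    const std::size_t grain = 1024;                  // arbitrary cut-off for going sequential
    if (last - first <= grain) {
        for (std::size_t i = first; i != last; ++i) body(i);
        return;
    }
    std::size_t mid = first + (last - first) / 2;
    // launch one half "in the pool", recurse on the other half on this thread
    std::future<void> left = std::async(std::launch::async,
        [=] { parallel_for(first, mid, body); });
    parallel_for(mid, last, body);
    left.get();                                      // join and propagate exceptions
}

// usage: square every element of a vector
// std::vector<double> v(1u << 20, 2.0);
// parallel_for(0, v.size(), [&](std::size_t i) { v[i] *= v[i]; });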

----- Original Message ----- From: "Johan Torp" <johan.torp@gmail.com> To: <boost@lists.boost.org> Sent: Sunday, September 14, 2008 12:41 PM Subject: Re: [boost] [threadpool] relation with TR2 proposal
JOAQUIN M. LOPEZ MUÑOZ wrote:
http://www2.open-std.org/JTC1/sc22/wg21/docs/papers/2007/n2276.html
What is the relationship between this proposal and the threadpool library that is being discussed these days here on the Boost list?
My _guess_ is that the C++ standard committee is targeting a thread pool which can help to ease extraction of parallel performance, not a fully configurable templated thread pool with lots of neat features.
It's my _guess_ also. I think that the Boost threadpool library must integrate child-tasks and task stealing between worker threads.
I believe the standard efforts are inspired by what java (see java.util.concurrency and the part called the fork-join framework) and .NET standard libraries (TPL, task parallel library) and intel thread building blocks (a C++ library) provide.
Are you talking about the C++ standard efforts? The n2276 proposal is a simple thread pool without the possibility to steal tasks between worker threads. Is there other work in progress?

Some references on the fork/join framework for the reader:
article "A Java Fork/Join Framework" http://gee.cs.oswego.edu/dl/papers/fj.pdf
java doc http://g.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/FJTask.html
java source http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/FJTaskR...

BTW, is someone already working on an FJTask adaptation to C++, or something like that?
They all have thread pools which internally employ work-stealing-based scheduling and try to keep a number of worker threads equivalent to the number of cores. Above the thread pool, there are constructs such as parallel_for. Having such high-level constructs is going to be very important in extracting parallel performance of thread based languages.
Read section 4.6 of my master's thesis to get a basic understanding of thread-level scheduling, it can be found at: www.johantorp.com
I'll take a look. Could you summarize the conclusion of your thesis?
And see Intel TBB's excellent tutorial, which explains their approach in considerable detail: http://www.threadingbuildingblocks.org/documentation.php
My five cents is that it would be great to separate the concerns of a thread pool and scheduling. But the real value for most applications would be to have parallel constructs (such as parallel_for) built on a single thread pool with smart work-stealing scheduling and a number of worker threads hinted to be bound to cores by processor affinity. To my knowledge, out of the .NET, java and TBB, only TBB exposes its thread pool. Hence, a simple interface such as launch_in_pool with a decent implementation might provide a lot more value than a fully configurable thread pool library.
Doesn't FJTask expose the worker thread in the FJTaskRunner.java class? What do you mean by a decent implementation?
In practice, just providing the interface to launch_in_pool has proven difficult as it returns a future value. The problem is nailing down a future interface which is both expressive and can be implemented in a lightweight manner.
Could you elaborate more on which difficulties? What is missing for you in the current futures proposals? Regards, Vicente

On Tuesday, 16 September 2008 07:20:33, vicente.botet wrote:
It's my _guess_ also. I think that the Boost threadpool library must integrate child-tasks and task stealing between worker threads.
I'm currently working on a work-stealing strategy.
Some references on fork-join framework for the reader: article "A Java Fork/Join Framework" http://gee.cs.oswego.edu/dl/papers/fj.pdf
Fork/Join uses work-stealing - you need a kind of synchronization between all worker threads to finish one task (joining subtasks). I think this is not in the scope of threadpool. regards, Oliver

viboes wrote:
My _guess_ is that the C++ standard committee is targeting a thread pool which can help to ease extraction of parallel performance, not a fully configurable templated thread pool with lots of neat features.
It's my _guess_ also. I think that the Boost threadpool library must integrate child-tasks and task stealing between worker threads.
I haven't thought about it much but I think you might be able to separate scheduling from a thread pool library. You could even split it into three pieces: a generic passive thread pool, scheduling algorithms and a launch_in_pool free function which employs a single static thread pool and some scheduling behind the scenes. viboes wrote:
I believe the standard efforts are inspired by what java (see java.util.concurrency and the part called the fork-join framework) and .NET standard libraries (TPL, task parallel library) and intel thread building blocks (a C++ library) provide.
Are you talking about the C++ standard efforts? The n2276 proposal is a simple thread pool without the possibility to steal tasks between worker threads. Is there other work in progress?
Actually, I thought that the idea behind N2276 was to leave a lot of space in the definition of the launch_in_pool so that library implementors could have sophisticated work-stealing behind the scenes. viboes wrote:
BTW, is someone already working on the FJTask adaptation to C++, or something like that?
I believe Intel TBB has come the longest way in extracting task level parallelism via thread pools. I suspect a C++ solution will differ quite a lot from a java implementation since the languages are so different. viboes wrote:
Read section 4.6 of my master's thesis to get a basic understanding of thread-level scheduling, it can be found at: www.johantorp.com I'll take a look. Could you summarize the conclusion of your thesis?
My thesis tries to summarize the parallel shift as a whole; thread-level scheduling is just discussed on a page or two. Here is the title and abstract:

Part 1 - A bird's eye view of desktop parallelism, Part 2 - Zooming in on C++0x's memory model

The first part is an overview of the paradigmatic shift to parallelism that is currently taking place. It explains why processors need to become parallel, how they might function and which types of parallelism there are. Given that information, it explains why threads and locks are not a suitable programming model, how threading is being improved and used to extract parallel performance, and what problems await new parallel programming models and how they might work. The final chapter surveys the landscape of existing parallel software and hardware projects and relates them to the overview. The overview is intended for desktop and embedded programmers and architects.

The second part explains how to use C++'s upcoming memory model and atomic API. It also relates the memory model to classical definitions of distributed computing in an attempt to bridge the gap in terminology between the research literature and C++. An implementation of hazard pointers and a lock-free stack and queue are given as example C++0x code. This part is aimed at expert C++ developers and the research community.

PDF available at: www.johantorp.com

viboes wrote:
To my knowledge, out of the .NET, java and TBB, only TBB exposes its thread pool.
Doesn't FJTask expose the worker thread in the FJTaskRunner.java class?
I meant that you couldn't explicitly control how the thread pool behaved (scheduling strategies and parameters). But it might very well be wrong; I haven't had a deep look at Java's and .NET's solutions. viboes wrote:
What do you mean by a decent implementation?
One which can extract most task level parallel performance (given the task decomposition the user has provided) on many different processor architectures and OSes. viboes wrote:
In practice, just providing the interface to launch_in_pool has proven difficult as it returns a future value. The problem is nailing down a future interface which is both expressive and can be implemented in a lightweight manner.
Could you elaborate more on which difficulties? What is missing for you in the current futures proposals?
There are at least two things which still need to be solved:
1. How to wait for multiple futures
2. How to employ work-stealing when one thread waits on a future

An expressive solution is to allow some callback hooks (for future::wait and promise::set) but that is quite hackish. IMO you should not be able to inject arbitrary code which is run in promise::set via a future object that runs on a completely different thread.

I'm really not satisfied with the wait_for_any and wait_for_all proposal or the operators proposal either. Windows' traditional WaitForMultipleObjects and the POSIX select statement are IMHO very flawed. You have to collect a lot of handles from all over your program and then wait for them in one place. This really inverts the architecture of your program. I'd like a solution which is expressive enough to "lift" an arbitrary function to futures. I.e.: R foo(int a, int b) should be easily rewritten as future<R> foo(future<int> a, future<int> b). You could say that I want futures to be as composable as the retry and orElse constructs of transactional memory (see my thesis if you are not familiar with transactional memory). You might also want to support waiting on a dynamic set of futures and resuming as soon as one of them becomes ready.

There is no agreement on what use cases futures should or have to support. The problem is that the waiting and combining use cases really affect the interface, and if we do not support them, there will be interoperability problems between different libraries (such as poet, thread pools and asio) which might need to have their own specialized and non-compatible future objects. To allow extracting maximum task-level parallelism, there is also a need for futures to be really lightweight.

For more information, see: http://www.nabble.com/-future--Early-draft-of-wait-for-multiple-futures-inte... http://www.nabble.com/Updated-version-of-futures-library-to17555389.html#a17... Best Regards, Johan Torp www.johantorp.com
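As an illustration of the "lifting" Johan asks for, the naive way to do it with plain std:: futures as stand-ins looks like this (the names and the use of std::async are illustrative, not a proposal). Each lifted call occupies a thread that blocks in .get(), which is exactly the heaviness a composable, lightweight future interface is meant to avoid.

#include <future>
#include <utility>

int foo(int a, int b) { return a + b; }          // the ordinary function

// the lifted version: consumes futures, produces a future
std::future<int> lifted_foo(std::future<int> a, std::future<int> b)
{
    return std::async(std::launch::async,
        // take ownership of the argument futures and wait for them inside
        [a = std::move(a), b = std::move(b)]() mutable {
            return foo(a.get(), b.get());
        });
}

// usage:
// std::future<int> a = std::async(std::launch::async, [] { return 1; });
// std::future<int> b = std::async(std::launch::async, [] { return 2; });
// std::future<int> r = lifted_foo(std::move(a), std::move(b));
// int value = r.get();   // 3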

On Tuesday, 16 September 2008 11:49:49, Johan Torp wrote:
There are at least two things which still need to be solved: 1. How to wait for multiple futures
Braddock's future lib provides support for logical ops on futures:

future< int > f1 = ...
future< string > f2 = ...
future< void > f3 = op( f1) && op( f2);
f3.wait(); // waits for f1 and f2
2. How to employ work-stealing when one thread waits on a future
Do you mean a fork/join algorithm instead of work-stealing? regards, Oliver

k-oli wrote:
On Tuesday, 16 September 2008 11:49:49, Johan Torp wrote:
There are at least two things which still need to be solved: 1. How to wait for multiple futures
Braddock's future lib provides support for logical ops on futures:
future< int > f1 = ...
future< string > f2 = ...
future< void > f3 = op( f1) && op( f2);
f3.wait(); // waits for f1 and f2
Yes, but it is implemented in terms of a public callback. IMHO, this is _really_ dangerous. k-oli wrote:
2. How to employ work-stealing when one thread waits on a future Do you mean a fork/join algorithm instead of work-stealing?
Yes, you're right - it does not have anything to do with work-stealing per se. I mean that you want to intercept when a worker thread should have been blocked and process other tasks in the wait call:

void some_user_level_task()
{
    ...
    some_future.wait(); // Do not really wait, execute other tasks using the same call stack
    ...
}

Johan

----- Original Message ----- From: "Johan Torp" <johan.torp@gmail.com> To: <boost@lists.boost.org> Sent: Tuesday, September 16, 2008 12:29 PM Subject: Re: [boost] [threadpool] relation with TR2 proposal
k-oli wrote:
2. How to employ work-stealing when one thread waits on a future Do you mean a fork/join algorithm instead of work-stealing?
Yes, you're right - it does not have anything to do with work-stealing per se. I mean that you want to intercept when a worker thread should have been blocked and process other tasks in the wait call:
void some_user_level_task()
{
    ...
    some_future.wait(); // Do not really wait, execute other tasks using the same call stack
    ...
}
Hi Johan, Should this behaviour be extended to other synchronization functions like mutex lock, condition wait, ... ? For this to work all these primitives must be wrapped, replacing the blocking primitives by the corresponding try_ primitives. In addition the threadpool should provide a one step scheduling on the *current* worker thread. What do you think? Vicente

viboes wrote:
I mean that you want to intercept when a worker thread should have been blocked and process other tasks in the wait call:
void some_user_level_task()
{
    ...
    some_future.wait(); // Do not really wait, execute other tasks using the same call stack
    ...
}
Should this behaviour be extended to other synchronization functions like mutex lock, condition wait, ... ?
It is a very interesting idea, here are my thoughts; First of all, I think even doing work in future::wait is a little too automagic, and I think we should investigate if we can't make it explicit somehow.

You probably do not want to do work while waiting for mutexes. They are only supposed to help synchronize and order operations, waiting is just a side effect of that.

The idea of doing work while waiting on condition variables seems like a natural extension of doing the same for futures since futures also have the semantics of waiting for an event. Also, futures will most probably be built on top of condition variables.

Perhaps some kind of policy with a default waiting behaviour:

condition normal_condition_variable;
condition<WorkWhileWaiting> cv;
future<int> normal_future;
future<int, WorkWhileWaiting> f;

viboes wrote:
In addition the threadpool should provide a one step scheduling on the *current* worker thread.
There are certain subtle dangers with executing thread pool tasks belonging to other threads, but they might be acceptable if this behaviour is explicit (perhaps even if they aren't). IIRC, I discussed the matter with Peter Dimov here in boost.dev but can't seem to find the thread. Johan

----- Original Message ----- From: "Johan Torp" <johan.torp@gmail.com> To: <boost@lists.boost.org> Sent: Thursday, September 18, 2008 10:03 PM Subject: Re: [boost] [threadpool] relation with TR2 proposal
viboes wrote:
I mean that you want to intercept when a worker thread should have been blocked and process other tasks in the wait call:
void some_user_level_task()
{
    ...
    some_future.wait(); // Do not really wait, execute other tasks using the same call stack
    ...
}
Should this behaviour be extended to other synchronization functions like mutex lock, condition wait, ... ?
It is a very interesting idea, here are my thoughts;
First of all, I think even doing work in future::wait is a little too automagic, and I think we should investigate if we can't make it explicit somehow.
You probably do not want to do work while waiting for mutexes. They are only supposed to help synchronize and order operations, waiting is just a side effect of that.
If the worker thread does a mutex lock, the thread will block, reducing the parallelism. I need to think more about this issue.
The idea of doing work while waiting on condition variables seems like a natural extension of doing the same for futures since futures also have the semantics of waiting for an event. Also, futures will most probably be built on top of condition variables.
What about yield and sleep?
Perhaps some kind of policy with a default waiting behaviour:
condition normal_condition_variable;
condition<WorkWhileWaiting> cv;
future<int> normal_future;
future<int, WorkWhileWaiting> f;
I'd prefer something like this:

namespace this_task {
    void yield();
    void sleep(system_time const& abs_time);
    template<typename TimeDuration>
    inline void sleep(TimeDuration const& rel_time) {
        this_task::sleep(get_system_time() + rel_time);
    }
    template<typename T>
    shared_future_wrapper<T> wrap(shared_future<T>& f);
    condition_variable_wrapper wrap(boost::condition_variable& cond);
}

The user could then write:

void some_user_level_task() {
    // ... sleep for a while while executing other tasks
    this_task::sleep(t);
    // use any of the blocking future functions while executing other tasks
    this_task::wrap(some_future).wait();
    // or
    this_task::shared_future_wrapper<T> w_some_future(some_future);
    w_some_future.wait();
    // yielding
    while (cnd) {
        // ...
        this_task::yield();
    }
}
viboes wrote:
In addition the threadpool should provide a one step scheduling on the *current* worker thread.
There are certain subtle dangers with executing thread pool tasks belonging to other threads, but they might be acceptable if this behaviour is explicit (perhaps even if they aren't). IIRC, I discussed the matter with Peter Dimov here in boost.dev but can't seem to find the thread.
Which dangers do you have in mind? Vicente

Johan Torp <johan.torp@gmail.com> writes:
viboes wrote:
My _guess_ is that the C++ standard committee is targeting a thread pool which can help to ease extraction of parallel performance, not a fully configurable templated thread pool with lots of neat features.
It's my _guess_ also. I think that the Boost threadpool library must integrate child-tasks and task stealing between worker threads.
I haven't thought about it much but I think you might be able to separate scheduling from a thread pool library. You could even split it into three pieces: a generic passive thread pool, scheduling algorithms and a launch_in_pool free function which employs a single static thread pool and some scheduling behind the scenes.
That sounds interesting.
viboes wrote:
I believe the standard efforts are inspired by what java (see java.util.concurrency and the part called the fork-join framework) and .NET standard libraries (TPL, task parallel library) and intel thread building blocks (a C++ library) provide.
Are you talking about the C++ standard efforts? The n2276 proposal is a simple thread pool without the possibility to steal tasks between worker threads. Is there other work in progress?
Actually, I thought that the idea behind N2276 was to leave a lot of space in the definition of the launch_in_pool so that library implementors could have sophisticated work-stealing behind the scenes.
Yes. The intention was that the free function launch_in_pool would use an implementation-provided global thread pool that would be as smart as the library implementor could manage. e.g. I have a working prototype for Windows that initially runs one pool thread per CPU. If a pool thread blocks on a future for a pool task it suspends the current task and runs a new task from the pool.
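A rough, single-threaded illustration of that behaviour, with hypothetical names and no claim to match Anthony's prototype: instead of blocking inside future::wait(), the worker keeps pulling other pending tasks and runs them on the current stack until the awaited future becomes ready.

#include <chrono>
#include <deque>
#include <functional>
#include <future>
#include <iostream>

typedef std::deque<std::function<void()>> task_queue;

template <typename T>
T wait_by_running_other_tasks(std::future<T>& f, task_queue& pending)
{
    while (f.wait_for(std::chrono::seconds(0)) != std::future_status::ready) {
        if (pending.empty()) { f.wait(); break; }        // nothing else to do: really block
        std::function<void()> task = std::move(pending.front());
        pending.pop_front();
        task();                                          // run another task on this stack
    }
    return f.get();
}

int main()
{
    task_queue pending;
    std::promise<int> p;
    std::future<int> f = p.get_future();
    pending.push_back([] { std::cout << "some unrelated pool task\n"; });
    pending.push_back([&p] { p.set_value(42); });        // this one happens to fulfil the promise
    std::cout << wait_by_running_other_tasks(f, pending) << '\n';
}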
viboes wrote:
BTW, is someone already working on the FJTask adaptation to C++, or something like that?
I believe Intel TBB has come the longest way in extracting task level parallelism via thread pools. I suspect a C++ solution will differ quite a lot from a java implementation since the languages are so different.
My thread pool prototype is very similar in behaviour.
viboes wrote:
In practice, just providing the interface to launch_in_pool has proven difficult as it returns a future value. The problem is nailing down a future interface which is both expressive and can be implemented in a lightweight manner.
Could you elaborate more on which difficulties? What is missing for you in the current futures proposals?
There are at least two things which still need to be solved: 1. How to wait for multiple futures
My futures prototype at <http://www.justsoftwaresolutions.co.uk/threading/updated-implementation-of-c++-futures-3.html> handles that.
2. How to employ work-stealing when one thread waits on a future
An expressive solution is to allow some callback hooks (for future::wait and promise::set) but that is quite hackish. IMO you should not be able to inject arbitrary code which is run in promise::set via a future object that runs on a completely different thread.
Yes. This needs to be internal to the implementation, which requires the future and thread pool to cooperate.
I'm really not satisfied with the wait_for_any and wait_for_all proposal or the operators proposal either. Windows' traditional WaitForMultipleObjects and the POSIX select statement are IMHO very flawed. You have to collect a lot of handles from all over your program and then wait for them in one place. This really inverts the architecture of your program. I'd like a solution which is expressive enough to "lift" an arbitrary function to futures. I.e.:
R foo(int a, int b) should be easily rewritten as future<R> foo(future<int> a, future<int> b)
You could say that I want futures to be as composable as the retry and orElse constructs of transactional memory (see my thesis if you are not familiar with transactional memory). You might also want to support waiting on a dynamic set of futures and resuming as soon as one of them becomes ready.
My futures prototype has wait_for_any/wait_for_all that work on iterator ranges (e.g. vector of shared_future) Anthony -- Anthony Williams | Just Software Solutions Ltd Custom Software Development | http://www.justsoftwaresolutions.co.uk Registered in England, Company Number 5478976. Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
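For readers who have not seen the prototype, the shape of a range-based wait_for_any is roughly the following. This naive polling version only illustrates the interface; it is not Anthony's implementation, and its busy-waiting is related to the efficiency concern raised in the reply below.

#include <chrono>
#include <future>
#include <thread>

template <typename Iterator>
Iterator naive_wait_for_any(Iterator first, Iterator last)
{
    for (;;) {
        for (Iterator it = first; it != last; ++it)
            if (it->wait_for(std::chrono::milliseconds(0)) == std::future_status::ready)
                return it;                     // iterator to the first ready future
        std::this_thread::yield();             // avoid spinning flat out
    }
}

// usage sketch:
// std::vector<std::shared_future<int>> futures = ...;
// std::vector<std::shared_future<int>>::iterator ready =
//     naive_wait_for_any(futures.begin(), futures.end());
// int value = ready->get();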

Anthony Williams-4 wrote:
In practice, just providing the interface to launch_in_pool has proven difficult as it returns a future value. The problem is nailing down a future interface which is both expressive and can be implemented in a lightweight manner.
Could you elaborate more on which difficulties? What is missing for you in the current futures proposals?
There are at least two things which still need to be solved: 1. How to wait for multiple futures
My futures prototype at <http://www.justsoftwaresolutions.co.uk/threading/updated-implementation-of-c++-futures-3.html> handles that.
I'm not very satisfied with that interface, but I may very well be missing something. For instance, can you implement additions of futures on top of your interface? future<int> operator+(future<int>, future<int>). Rather than having to gather up all futures from all over a program and having to wait for them in one single place, I think it would provide a lot of value if you could compose futures to an arbitrary depth. That is, I do not want a construct that is similar to POSIX select or Windows' WaitForMultipleObjects as they really mess with program structure. Also, your dynamic wait_for_any implementation requires O(N^2) operations in cases where you want to wait until one value is ready, do something, then resume waiting until the next one is ready, and so on. Anthony Williams-4 wrote:
2. How to employ work-stealing when one thread waits on a future
An expressive solution is to allow some callback hooks (for future::wait and promise::set) but that is quite hackish. IMO you should not be able to inject arbitrary code which is run in promise::set via a future object that runs on a completely different thread.
Yes. This needs to be internal to the implementation, which requires the future and thread pool to cooperate.
It doesn't necessarily need to be internal. It could be that code aimed to be executed in a thread pool should explicitly wrap its waiting:

// Will employ work stealing if this_thread is a worker thread
void wait_or_work_steal(future<...>);

I'm far from convinced that an automagical solution where futures cooperate with a single-instance thread pool is the best. That means a lot of coupling and I'm not sure it will provide much value. I think a future value is such an important construct in its own right that thread pool design decisions should not be allowed to affect it very much. Thread pools need a really lightweight task abstraction to allow extraction of finer-grained task-level parallelism, i.e. futures should try to be lightweight (if std::unique_future really should be the return value from a thread pool). Other than this, I'm not sure how much a thread pool should be allowed to affect the future design and implementation. IMHO, an imagined thread pool implementation should be one of many "test users" of a future implementation to see if the API and implementation are satisfactory.

Johan

Anthony Williams wrote:
Yes. The intention was that the free function launch_in_pool would use an implementation-provided global thread pool that would be as smart as the library implementor could manage.
e.g. I have a working prototype for Windows that initially runs one pool thread per CPU. If a pool thread blocks on a future for a pool task it suspends the current task and runs a new task from the pool.
How do you do that? By re-executing the blocking task or using fibers? Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

joaquin@tid.es writes:
Anthony Williams wrote:
Yes. The intention was that the free function launch_in_pool would use an implementation-provided global thread pool that would be as smart as the library implementor could manage.
e.g. I have a working prototype for Windows that initially runs one pool thread per CPU. If a pool thread blocks on a future for a pool task it suspends the current task and runs a new task from the pool.
How do you do that? By re-executing the blocking task or using fibers?
I did just execute the new task on the same stack as the existing task, but was mindful of the stack overflow problem that Johan mentioned earlier. I've now changed my prototype to use fibers to switch stacks. It's not perfect, as the nested task still has the same thread ID as another currently-executing (but suspended) task. I'm looking into ways to handle this, but it would really mean tying *everything* to the thread pool implementation. Like I said, it's a prototype. Anthony -- Anthony Williams | Just Software Solutions Ltd Custom Software Development | http://www.justsoftwaresolutions.co.uk Registered in England, Company Number 5478976. Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
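For the curious, the Windows fiber mechanism Anthony refers to boils down to a handful of calls. The following minimal, Windows-only sketch shows a nested task running on its own stack and handing control back; it only demonstrates the mechanism and is not Anthony's prototype.

#include <windows.h>
#include <cstdio>

static LPVOID g_main_fiber = 0;

VOID CALLBACK nested_task_fiber(LPVOID /*param*/)
{
    // Runs on its own stack, so the suspended task's stack is left untouched.
    std::printf("running a nested pool task on a separate fiber\n");
    SwitchToFiber(g_main_fiber);   // hand control back instead of returning
}

int main()
{
    g_main_fiber = ConvertThreadToFiber(0);                 // make this thread fiber-aware
    LPVOID nested = CreateFiber(0, nested_task_fiber, 0);
    // Imagine we are inside future::wait() here: instead of blocking,
    // switch to a fresh stack and run another pool task.
    SwitchToFiber(nested);                                  // returns when the fiber switches back
    DeleteFiber(nested);
    std::printf("original task resumes with its stack intact\n");
    ConvertFiberToThread();
    return 0;
}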

On Monday, 29 September 2008 08:52:52, Anthony Williams wrote:
joaquin@tid.es writes:
Anthony Williams wrote:
Yes. The intention was that the free function launch_in_pool would use an implementation-provided global thread pool that would be as smart as the library implementor could manage.
e.g. I have a working prototype for Windows that initially runs one pool thread per CPU. If a pool thread blocks on a future for a pool task it suspends the current task and runs a new task from the pool.
How do you do that? By re-executing the blocking task or using fibers?
I did just execute the new task on the same stack as the existing task, but was mindful of the stack overflow problem that Johan mentioned earlier. I've now changed my prototype to use fibers to switch stacks. It's not perfect, as the nested task still has the same thread ID as another currently-executing (but suspended) task. I'm looking into ways to handle this, but it would really mean tying *everything* to the thread pool implementation.
Like I said, it's a prototype.
Anthony
AFAIK the coroutine library in the vault also does stack switching. Are fibers/stack switching supported on other platforms (HP-UX, Mac OS, etc.)? regards, Oliver

k-oli@gmx.de writes:
On Monday, 29 September 2008 08:52:52, Anthony Williams wrote:
joaquin@tid.es writes:
Anthony Williams wrote:
Yes. The intention was that the free function launch_in_pool would use an implementation-provided global thread pool that would be as smart as the library implementor could manage.
e.g. I have a working prototype for Windows that initially runs one pool thread per CPU. If a pool thread blocks on a future for a pool task it suspends the current task and runs a new task from the pool.
How do you do that? By re-executing the blocking task or using fibers?
I did just execute the new task on the same stack as the existing task, but was mindful of the stack overflow problem that Johan mentioned earlier. I've now changed my prototype to use fibers to switch stacks. It's not perfect, as the nested task still has the same thread ID as another currently-executing (but suspended) task. I'm looking into ways to handle this, but it would really mean tying *everything* to the thread pool implementation.
Like I said, it's a prototype.
Anthony
AFAIK the coroutine library in the vault also does stack switching. Are fibers/stack switching supported on other platforms (HP-UX, Mac OS, etc.)?
I haven't looked into it yet, so I don't know. Anthony -- Anthony Williams | Just Software Solutions Ltd Custom Software Development | http://www.justsoftwaresolutions.co.uk Registered in England, Company Number 5478976. Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL

Hi all, Sorry for the delayed response to this thread. Johan Torp <johan.torp@gmail.com> writes:
JOAQUIN M. LOPEZ MUÑOZ wrote:
http://www2.open-std.org/JTC1/sc22/wg21/docs/papers/2007/n2276.html
What is the relationship between this proposal and the threadpool library that is being discussed these days here on the Boost list?
My _guess_ is that the C++ standard committee is targeting a thread pool which can help to ease extraction of parallel performance, not a fully configurable templated thread pool with lots of neat features.
The committee has decided not to discuss thread pools until TR2, at which point all current implementation experience will be taken into account.
My five cents is that it would be great to separate the concerns of a thread pool and scheduling. But the real value for most applications would be to have parallel constructs (such as parallel_for) built on a single thread pool with smart work-stealing scheduling and a number of worker threads hinted to be bound to cores by processor affinity. To my knowledge, out of the .NET, java and TBB, only TBB exposes its thread pool. Hence, a simple interface such as launch_in_pool with a decent implementation might provide a lot more value than a fully configurable thread pool library.
That was my thought. Anthony -- Anthony Williams | Just Software Solutions Ltd Custom Software Development | http://www.justsoftwaresolutions.co.uk Registered in England, Company Number 5478976. Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
participants (6):
- Anthony Williams
- joaquin@tid.es
- Johan Torp
- k-oli@gmx.de
- Vicente Botet
- vicente.botet