We need a coherent higher level parallelization story for C++ (was [thread] Is there a non-blocking future-destructor?)

newer
Beginner Request for Contributing...

older
[WG21 mailing] N4453 Resumable...

Hartmut Kaiser

12 Oct 2015 12 Oct '15

12:52 p.m.

Sorry for cross-posting.

...

Vicente J. Botet Escriba wrote:

I have a branch (https://github.com/boostorg/thread/tree/feature/non_blocking_futures) that don't blocks on any future, but this will break async.

FWIW, the design decision to let those (and only those) futures block on destruction which are returned from async was one of the really bad decisions made for C++11, however that's just my opinion (others agree, but yet others disagree). This flaw essentially caused us to spend many committee hours on discussions how to mitigate the situation, with proposed solutions ranging from 'leave it be as is' to as extreme as 'deprecate it now'. There seems to be a growing consensus however to develop other constructs which eventually could replace async, such as executors, etc. I'm a proponent of following that path. More generally, I strongly believe that we have to come up with a _coherent story_ for all kinds of (higher-level) parallelism in C++: task-based parallelism, asynchronous, continuation style programming, fork-join parallelism of heterogeneous tasks, simple fork-join (iteration based) parallelism, etc. All of this has to go hand in hand with a (orthogonal) vectorization story. We also need to think about integrating GPUs. The goal has to be to make parallelism in C++ independent of any external solutions such as OpenMP, OpenACC, etc. Many of the building blocks for this are already being discussed: executors and executor_traits for customizing the 'how and where' of task execution, executor parameters for grain-size control, and execution policies as the higher level facility allowing to tie everything together (see for instance HPX [7]). We (as a committee) already allow for passing execution policies as the first argument to parallel algorithms (Parallelism TS, N4505 [2]). We already have some facilities for task-based, asynchronous, and continuation style parallelism in place (Concurrency TS, N4501 [3]). We have a paper proposing task_blocks for fork-join parallelism of heterogeneous tasks (N4411 [4]). We have two competing (but converging) efforts for defining an executor concept (N4406 [5], PR0008R0 [6]). We have a (strong) proposal for simplifying continuation based programming (await, PR0057R0 [8]). We have attempts on integrating executors with existing facilities (Boost, HPX). We miss work on parallel ranges. We miss work on integrating data structures with data placement policies in conjunction with executors. We miss work on extending all of this to many-core, distributed, and high-performance computing. Etc. We (as a community) really need a higher level, over-arching approach which ties all of the above together! My plan is to work on a corresponding concept paper by the time of the committee meeting end of February 2016. Our group has outlined our current understanding and a possible approach to this here [1]. I'd like for this to be understood as a seed for a wider discussion. Needless to say, I'd very much like to collaborate on this with anybody interested in joining the effort. Regards Hartmut --------------- http://boost-spirit.com http://stellar.cct.lsu.edu [1] http://stellar.cct.lsu.edu/pubs/executors_espm2_2015.pdf [2] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4505.pdf [3] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4501.html [4] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4411.pdf [5] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4406.pdf [6] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0008r0.pdf [7] http://stellar-group.github.io/hpx/docs/html/hpx/manual/parallel.html [8] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0057r0.pdf

Show replies by date

Vicente J. Botet Escriba

13 Oct 13 Oct

2:02 a.m.

New subject: We need a coherent higher level parallelization story for C++ (was [thread] Is there a non-blocking future-destructor?)

Le 12/10/15 14:52, Hartmut Kaiser a écrit :

...

Sorry for cross-posting.

...
Vicente J. Botet Escriba wrote:

I have a branch (https://github.com/boostorg/thread/tree/feature/non_blocking_futures) that don't blocks on any future, but this will break async. FWIW, the design decision to let those (and only those) futures block on destruction which are returned from async was one of the really bad decisions made for C++11, however that's just my opinion (others agree, but yet others disagree). This flaw essentially caused us to spend many committee hours on discussions how to mitigate the situation, with proposed solutions ranging from 'leave it be as is' to as extreme as 'deprecate it now'. There seems to be a growing consensus however to develop other constructs which eventually could replace async, such as executors, etc. I'm a proponent of following that path. Hi Hartmut,

I can agree with you at the C++ standard level, but I have a library to maintain, and I have introduced a very BIG BUG making all the futures block on destruction (develop branch). Now, there is a major need to make the futures returned by then() to be non-blocking. The branch non_blocking_futures has the opposite. However, there are people that would like that the boost::async follows the standard. So, I don't know what to do, fix future::then() and break async(), or deliver the fiw only when async will conform to the standard; knowing that this behavior might be deprecated in 2017. I need the feedback from the Boost community respect to this point.

...

More generally, I strongly believe that we have to come up with a _coherent story_ for all kinds of (higher-level) parallelism in C++: task-based parallelism, asynchronous, continuation style programming, fork-join parallelism of heterogeneous tasks, simple fork-join (iteration based) parallelism, etc. All of this has to go hand in hand with a (orthogonal) vectorization story. We also need to think about integrating GPUs.

The goal has to be to make parallelism in C++ independent of any external solutions such as OpenMP, OpenACC, etc.

Many of the building blocks for this are already being discussed: executors and executor_traits for customizing the 'how and where' of task execution, executor parameters for grain-size control, and execution policies as the higher level facility allowing to tie everything together (see for instance HPX [7]).

We (as a committee) already allow for passing execution policies as the first argument to parallel algorithms (Parallelism TS, N4505 [2]). We already have some facilities for task-based, asynchronous, and continuation style parallelism in place (Concurrency TS, N4501 [3]).

We have a paper proposing task_blocks for fork-join parallelism of heterogeneous tasks (N4411 [4]). We have two competing (but converging) efforts for defining an executor concept (N4406 [5], PR0008R0 [6]).

I'm aware of most of them if not all. I've been implementing/introducing some of these facilities in Boost.Thread recently. I have not implemented the parallel algorithms yet, as this will require time I don't have now. I'm aware of these proposals, and as others I'm experimenting as we don't know yet what would be the final interface.

...

We have a (strong) proposal for simplifying continuation based programming (await, PR0057R0 [8]).

We have attempts on integrating executors with existing facilities (Boost, HPX). A pointer? We miss work on parallel ranges. We miss work on integrating data structures with data placement policies in conjunction with executors. We miss work on extending all of this to many-core, distributed, and high-performance computing.

Etc.

We (as a community) really need a higher level, over-arching approach which ties all of the above together! My plan is to work on a corresponding concept paper by the time of the committee meeting end of February 2016. I plan to update the N4414 Executors and Schedulers Revision 5 proposal as I believe that Executors must be copyable and lightweight. The new proposal P0008R0 comes back to executors that are not copyable and so left the responsibility of the the lifetime to the user (I'm experimenting it in branch make_executors_copyable) Our group has outlined our current understanding and a possible approach to this here [1]. I'd like for this to be understood as a seed for a wider discussion. Needless to say, I'd very much like to collaborate on this with anybody interested in joining the effort. I'll read it and come back to you.

Best, Vicente

...

Regards Hartmut --------------- http://boost-spirit.com http://stellar.cct.lsu.edu

[1] http://stellar.cct.lsu.edu/pubs/executors_espm2_2015.pdf [2] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4505.pdf [3] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4501.html [4] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4411.pdf [5] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4406.pdf [6] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0008r0.pdf [7] http://stellar-group.github.io/hpx/docs/html/hpx/manual/parallel.html [8] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0057r0.pdf

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Mikael Olenfalk

8:39 a.m.

New subject: We need a coherent higher level parallelization story for C++ (was [thread] Is there a non-blocking future-destructor?)

On Tue, Oct 13, 2015 at 4:02 AM, Vicente J. Botet Escriba < vicente.botet@wanadoo.fr> wrote:

...

Le 12/10/15 14:52, Hartmut Kaiser a écrit :

...
Sorry for cross-posting.

Vicente J. Botet Escriba wrote:

...
I have a branch (https://github.com/boostorg/thread/tree/feature/non_blocking_futures) that don't blocks on any future, but this will break async.

FWIW, the design decision to let those (and only those) futures block on destruction which are returned from async was one of the really bad decisions made for C++11, however that's just my opinion (others agree, but yet others disagree).

For what it is worth, I cannot figure out how to use future<> when the destructor blocks as it "breaks" most of my usecases. Then again just giving future a detach() member would solve it for me at least. We (as a community) really need a higher level, over-arching approach which

...

...
ties all of the above together! My plan is to work on a corresponding concept paper by the time of the committee meeting end of February 2016.

I plan to update the N4414 Executors and Schedulers Revision 5 proposal as I believe that Executors must be copyable and lightweight. The new proposal P0008R0 comes back to executors that are not copyable and so left the responsibility of the the lifetime to the user (I'm experimenting it in branch make_executors_copyable)

(disclaimer: I haven't had time to play with the branch just yet) I agree that making lifetime the problem of the user is not particularly nice. I'd prefer to split the executors interface where only the part which contains submit() is copyable and let the rest be non-copyable, we have successfully used a similar design internally (it makes the interface which is sent around and copied the smallest possible interface). See below for an incomplete and badly named example. /M // this code will not compile but hopefully you will be able to discern what it is supposed to do: class executor { // non-copyable public: typedef boost::work work; executor(executor const&) = delete; executor& operator=(executor const&) = delete; executor(); virtual ~executor() {}; virtual void close() = 0; virtual bool closed() = 0; // can probably be non-virtual executor_sink create_sink(); // for lack of better name virtual bool try_executing_one() = 0; template <typename Pred> bool reschedule_until(Pred const& pred); // not in the current interface virtual void loop() = 0; private: shared_ptr<something> _something; // given to executor_sink }; class executor_sink { // copyable public: typedef boost::work work; executor(shared_ptr<executor::something>); // public but only usable by executor because something is private executor_sink(executor_sink&) = default; executor_sink& operator=(const executor_sink&) = default; bool closed() const { if (auto shared_something = _something.lock()) return shared_something->closed(); return true; } bool submit(work&& w) { if (auto shared_something = _something.lock()) { shared_something->submit(w); return true; } return false; } private: weak_ptr<executor::something> _something; };

Vicente J. Botet Escriba

5:48 p.m.

New subject: We need a coherent higher level parallelization story for C++ (was [thread] Is there a non-blocking future-destructor?)

Le 13/10/15 10:39, Mikael Olenfalk a écrit :

...

On Tue, Oct 13, 2015 at 4:02 AM, Vicente J. Botet Escriba < vicente.botet@wanadoo.fr> wrote:

...
Le 12/10/15 14:52, Hartmut Kaiser a écrit :

...
Sorry for cross-posting.

Vicente J. Botet Escriba wrote:

...
I have a branch (https://github.com/boostorg/thread/tree/feature/non_blocking_futures) that don't blocks on any future, but this will break async.

FWIW, the design decision to let those (and only those) futures block on destruction which are returned from async was one of the really bad decisions made for C++11, however that's just my opinion (others agree, but yet others disagree).

For what it is worth, I cannot figure out how to use future<> when the destructor blocks as it "breaks" most of my usecases. Then again just giving future a detach() member would solve it for me at least.

We (as a community) really need a higher level, over-arching approach which

...
...
ties all of the above together! My plan is to work on a corresponding concept paper by the time of the committee meeting end of February 2016.

I plan to update the N4414 Executors and Schedulers Revision 5 proposal as I believe that Executors must be copyable and lightweight. The new proposal P0008R0 comes back to executors that are not copyable and so left the responsibility of the the lifetime to the user (I'm experimenting it in branch make_executors_copyable)

(disclaimer: I haven't had time to play with the branch just yet)

I agree that making lifetime the problem of the user is not particularly nice. I'd prefer to split the executors interface where only the part which contains submit() is copyable and let the rest be non-copyable, we have successfully used a similar design internally (it makes the interface which is sent around and copied the smallest possible interface).

What you propose is something similar to the split in [p0113r0] where there is an execution_context and executor_type. However using shared_ptr as copyable ensures the lifetime issue, but I don't see the advantage in the split then. There is a problem with the shared_ptr approach that my current implementation in make_executors_copyable shares. The destructor of the shared state can be called in a thread that is part of the threads of the executor. That mean that the destructor must check if the thread to join is this thread and then not call the join. In [p0113r0], the executor_context must outlive the executor_type copies that can be just references to the executor_context. E.g class priority_scheduler : public execution_context { public: class executor_type { public: executor_type(priority_scheduler& ctx, int pri) noexcept : context_(ctx), priority_(pri) { } // ... private: priority_scheduler& context_; int priority_; }; executor_type get_executor(int pri = 0) noexcept { return executor_type(*this, pri); } // ... }; I don't see the need for the split in [p0113r0], as passing the executors by reference is equivalent. So, do we want a design that force the user to ensure that the executor (execution_context) outlive the executor sinks (executor_type? Or, just a copyable executor? Best, Vicente [p0113r0] Executors and Asynchronous Operations, Revision 2 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0113r0.html

Mikael Olenfalk

6:34 p.m.

New subject: We need a coherent higher level parallelization story for C++ (was [thread] Is there a non-blocking future-destructor?)

On Tue, Oct 13, 2015 at 7:48 PM, Vicente J. Botet Escriba < vicente.botet@wanadoo.fr> wrote:

...

Le 13/10/15 10:39, Mikael Olenfalk a écrit :

...
I agree that making lifetime the problem of the user is not particularly nice. I'd prefer to split the executors interface where only the part which contains submit() is copyable and let the rest be non-copyable, we have successfully used a similar design internally (it makes the interface which is sent around and copied the smallest possible interface).

What you propose is something similar to the split in [p0113r0] where

there is an execution_context and executor_type.

I will take a closer look at [p0113r0] later tonight (what's up with the curious reference number, is there a system?).

...

However using shared_ptr as copyable ensures the lifetime issue, but I don't see the advantage in the split then.There is a problem with the shared_ptr approach that my current implementation in make_executors_copyable shares. The destructor of the shared state can be called in a thread that is part of the threads of the executor. That mean that the destructor must check if the thread to join is this thread and then not call the join.

I only use the shared_ptr internally in order to detect when the underlying executor is gone. In our code base we only use it to ensure that nobody posts to an executor after it has been destroyed (during shutdown). The split is "necessary" to hide the weak_ptr (because it is ugly) and in order to ensure that nobody accidentally uses a raw reference (the submit() function is gone from the executor). I hadn't even thought of the problem where the shared-state is destroyed in the wrong thread but you are obviously correct. Is it possible to come up with a design which does not have this problem?

...

In [p0113r0], the executor_context must outlive the executor_type copies that can be just references to the executor_context.

Personally I don't like such designs (where the user is required to ensure that something passed around by reference or raw pointer is kept alive between several threads and tasks and whatnots).

...

I don't see the need for the split in [p0113r0], as passing the executors by reference is equivalent.

I don't see it either (not yet at least - I will take a closer look).

...

So, do we want a design that force the user to ensure that the executor (execution_context) outlive the executor sinks (executor_type?

Please no.

...

Or, just a copyable executor?

How does that work when the actual underlying thingie (e.g. boost::asio::io_service) is non-copyable? Kind regards, Mikael

Vicente J. Botet Escriba

14 Oct 14 Oct

6:03 a.m.

New subject: We need a coherent higher level parallelization story for C++ (was [thread] Is there a non-blocking future-destructor?)

Le 13/10/15 20:34, Mikael Olenfalk a écrit :

...

On Tue, Oct 13, 2015 at 7:48 PM, Vicente J. Botet Escriba < vicente.botet@wanadoo.fr> wrote:

...
Le 13/10/15 10:39, Mikael Olenfalk a écrit :

...
However using shared_ptr as copyable ensures the lifetime issue, but I don't see the advantage in the split then.There is a problem with the shared_ptr approach that my current implementation in make_executors_copyable shares. The destructor of the shared state can be called in a thread that is part of the threads of the executor. That mean that the destructor must check if the thread to join is this thread and then not call the join.

I only use the shared_ptr internally in order to detect when the underlying executor is gone. In our code base we only use it to ensure that nobody posts to an executor after it has been destroyed (during shutdown). The split is "necessary" to hide the weak_ptr (because it is ugly) and in order to ensure that nobody accidentally uses a raw reference (the submit() function is gone from the executor). Oh, I missed the sing uses weak_ptr. This justify the split. I'll experiment on the make_executor_copyable_branch.

I hadn't even thought of the problem where the shared-state is destroyed in the wrong thread but you are obviously correct. Is it possible to come up with a design which does not have this problem? No that I know.

...
So, do we want a design that force the user to ensure that the executor (execution_context) outlive the executor sinks (executor_type?

Please no.

...
Or, just a copyable executor?

How does that work when the actual underlying thingie (e.g. boost::asio::io_service) is non-copyable?

The boost::asio::io_service could be stored on the shared state that is not copyable nor movable. Best, Vicente

Oswin Krause

9:22 a.m.

New subject: We need a coherent higher level parallelization story for C++ (was [thread] Is there a non-blocking future-destructor?)

Hi, I am not a threading expert, so please be patient when I got everything wrong. I tried to read up on all the proposals mentioned in this thread. What is troubling me, especially with executioners, is that the proposals are not very detailed about their guaranties. Assume the creation of an async work package in an executioner as outlined in p0058R0 std::future<T> fut =std::async(ex, function,arguments); fut.wait();//?? what p0058R0 is not saying anything about is what is happening when the current thread creating fut is a working thread of the executioner ex, for example when ex is a threadpool. In the worst implementation, fut.wait() will keep the working thread waiting and thus ex has one working thread less until fut becomes ready. This behaviour could cause a deadlock, as all working threads might be waiting for a future to become ready thus exhausting the computing capabilities of ex. The desired behaviour would be that ex is rescheduling a new work-package for the current thread until fut is ready (for example the work package of fut in case it is not scheduled on another thread yet). This has to be guarantied. similarly, I miss possibilities to give the scheduler hints on what should be ready in which order, especially in hierarchical models (the current work package depends on all packages it spawns but it might have to wait for a specific subset of them before it can actually compute something), but also in graphs (e.g. computing block A_ij of some matrix requires the results of Blocks A_i-1j and A_ij-1). the default implementation can ignore hints, but i think that advanced executioners will use this information, especially when a small number of worker threads needs to compute a large amount of work packages with complex dependencies. one way to give this information might simply be fut.wait();//i am waiting for this work package so this is a dependency but maybe a simple extension of the future interface to mark critical dependencies might make this even more powerful: fut.mark_as_critical();//if fut is scheduled in an executioner, inform ex to reschedule to the front of the work package list and an extension of boost wait_for_all (no variadic templates for simplicity) could look like template<typename F1,typename F2,typename F3> void wait_for_all(F1& f1,F2& f2,F3& f3) { //inform that these futures are critical for this work package to continue f1.mark_as_critical(); f2.mark_as_critical(); f3.mark_as_critical(); //now just wait (and get new work packages by the executioner of this thread) f1.wait(); f2.wait(); f3.wait(); } As I said, I might have been wrong all along, thanks for reading to here anyways :) On 2015-10-14 08:03, Vicente J. Botet Escriba wrote:

...

Le 13/10/15 20:34, Mikael Olenfalk a écrit :

...
On Tue, Oct 13, 2015 at 7:48 PM, Vicente J. Botet Escriba < vicente.botet@wanadoo.fr> wrote:

...
Le 13/10/15 10:39, Mikael Olenfalk a écrit :

...
However using shared_ptr as copyable ensures the lifetime issue, but I don't see the advantage in the split then.There is a problem with the shared_ptr approach that my current implementation in make_executors_copyable shares. The destructor of the shared state can be called in a thread that is part of the threads of the executor. That mean that the destructor must check if the thread to join is this thread and then not call the join.

I only use the shared_ptr internally in order to detect when the underlying executor is gone. In our code base we only use it to ensure that nobody posts to an executor after it has been destroyed (during shutdown). The split is "necessary" to hide the weak_ptr (because it is ugly) and in order to ensure that nobody accidentally uses a raw reference (the submit() function is gone from the executor). Oh, I missed the sing uses weak_ptr. This justify the split. I'll experiment on the make_executor_copyable_branch.

I hadn't even thought of the problem where the shared-state is destroyed in the wrong thread but you are obviously correct. Is it possible to come up with a design which does not have this problem? No that I know.

...
So, do we want a design that force the user to ensure that the executor (execution_context) outlive the executor sinks (executor_type?

Please no.

...
Or, just a copyable executor?

How does that work when the actual underlying thingie (e.g. boost::asio::io_service) is non-copyable?

The boost::asio::io_service could be stored on the shared state that is not copyable nor movable.

Best, Vicente

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Hartmut Kaiser

12:16 a.m.

New subject: We need a coherent higher level parallelization story for C++ (was [thread] Is there a non-blocking future-destructor?)

...

...
I agree that making lifetime the problem of the user is not particularly nice. I'd prefer to split the executors interface where only the part which contains submit() is copyable and let the rest be non-copyable, we have successfully used a similar design internally (it makes the interface which is sent around and copied the smallest possible interface).

What you propose is something similar to the split in [p0113r0] where there is an execution_context and executor_type. However using shared_ptr as copyable ensures the lifetime issue, but I don't see the advantage in the split then. There is a problem with the shared_ptr approach that my current implementation in make_executors_copyable shares. The destructor of the shared state can be called in a thread that is part of the threads of the executor. That mean that the destructor must check if the thread to join is this thread and then not call the join.

In [p0113r0], the executor_context must outlive the executor_type copies that can be just references to the executor_context.

E.g

class priority_scheduler : public execution_context { public: class executor_type { public: executor_type(priority_scheduler& ctx, int pri) noexcept : context_(ctx), priority_(pri) { }

// ...

private: priority_scheduler& context_; int priority_; };

executor_type get_executor(int pri = 0) noexcept { return executor_type(*this, pri); }

// ... };

I don't think that an execution_context should represent a scheduling policy. This does not make sense conceptually (see below).

...

I don't see the need for the split in [p0113r0], as passing the executors by reference is equivalent.

There is a clear conceptual difference between the execution_context and executor concepts (see Parallelism TS and N4406). The execution_context is a type which sole purpose is to expresses thread-safety guarantees of the scheduled task, i.e. seq: the scheduled tasks cannot be run concurrently with anything else par: the scheduled tasks may run concurrently with any other task of the same batch (see parallel algorithms) HPX extensions: seq(task): the scheduled tasks can run concurrently only with tasks not from the same batch, tasks from the same batch have to be serialized par(task): the scheduled tasks can run concurrently with any other task, even those not from the same batch At the same time, executors encapsulate the 'how and when' of task execution, i.e. various scheduling policies and requirements. BTW, this distinction allows for the integration with yet another concept, which we call execution_parameters. Those encapsulate for instance grain size control (e.g. how many tasks should run on the same thread of execution?) and the control over the amount of resources the executor may use (e.g. how many cores should those tasks run on?). All in all in HPX we allow for vector<int> v = { ... }; parallel::for_each( par.on(my_executor).with(static_chunk_size), begin(v), end(v), [](auto v) {...}); Thus, letting most APIs (such as parallel algorithms, define_task_block, etc.) take an execution_policy instead of just an executor is a Good Thing(tm). For other, mostly lower level APIs - like future::then, async, dataflow, etc. - passing just the executor instance is sufficient as the API implies running the task asynchronously anyways. Regards Hartmut --------------- http://boost-spirit.com http://stellar.cct.lsu.edu

...

So, do we want a design that force the user to ensure that the executor (execution_context) outlive the executor sinks (executor_type? Or, just a copyable executor?

Best, Vicente [p0113r0] Executors and Asynchronous Operations, Revision 2 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0113r0.html

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Vicente J. Botet Escriba

5:56 a.m.

New subject: We need a coherent higher level parallelization story for C++ (was [thread] Is there a non-blocking future-destructor?)

Le 13/10/15 04:02, Vicente J. Botet Escriba a écrit :

...

Le 12/10/15 14:52, Hartmut Kaiser a écrit :

...
Sorry for cross-posting.

Our group has outlined our current understanding and a possible approach to this here [1]. I'd like for this to be understood as a seed for a wider discussion. Needless to say, I'd very much like to collaborate on this with anybody interested in joining the effort. I'll read it and come back to you. Hi,

In general I like the global direction of p0058r0, but have some concerns respect to the form. This will be longer than I expected. I like: * the fact that we are conceptualizing the interface and allowing to customize it * the additional bulk and synchronous interface execute. * the way async_execute/when_all/when_any can deduce the returned future once we have an executor parameter (even if I would preferred to associate them to an execution policy - see below). * the chaining when_all_execute_and_select * the future cast that can be used when the single thing important is if the task has completed. * rebind, this type trait should be make generic, and customizable by the user rebind<par, MyExecutor> rebind<future<int>, string> * the fact that then() continuations consume value types and not on futures (this is in line to my proposed next() function). have you added some kind of recover continuations (like my recover or catch_error)? * the on(ex).with(p) chaining style (I have used it in order to schedule timing operations in Boost.Thread) submit(sch.on(ex).at(tp), f); * we can retrieve the wrapped type, I like less: * the fact that executors are aware of futures, I like the split of responsibilities, executors have void() work to schedule, and we have a free function like async/spaw/submit that returns a future. The question is which future should return async(ex, ...). p0058r0 propose async_execute that deduces the future from the executor. I believe that in in the same way the execution policy embodies a set of the rules about where, when and how to run a submitted function object, it should have associated also how the asynchronous result is reported, that is, which specific future must be used as result of async; when_all,when_any. IMHO, futures depend on executors, not the opposite. I don't know how to implement an executor that can return my special future. However I know how to implement a future that can store a specific Executor. * I don't know if then_execute result should depend on the future associated to the executor (or execution policy) as I expect the same kind of future as a result. * the name value_type to retrieve the wrapped type. I' don't know if value_type is the most appropriated when we have future<T&>. ValueType as defined in the Range proposal removes references and cv qualifications. * The name future_traits and the fact that future_traits takes a specific class as template parameter and not a type constructor (a high-order meta-function that transforms types on types). IMO what we are mapping is not std::future<T>, but std::future or std::future<_>. Given a type future<int>, it is useful to have its type_constructor. E.g. type_constructor<future<T>> is future<_> type_constructor<apply<TC, T>> is TC value_type<apply<TC, T>> is T As a type constructor future<_> apply<future<_>, string> is the same as future<string>. While apply<future<_>, string> and rebind<future<int>, string> seems similar, apply can be used with any high order meta-function as e.g. apply<lift<future>, string>. I use rebind when you have an instance of a class, as future<int>, optional<int>, and apply is used when you have a type constructor as future, optional. rebind can be defined in function of apply and type_constructor. rebind<X,T> = apply<type_constructor<X>,T> E.g. if we had a future that takes two parameters T and E (as expected does), the type constructor (respect to T) would be future<_, E>. * wondering if the same applies to execution policies. Could we consider that a execution policy wraps an executor? * I need to think about the separation of the execution_policy and the executor. Is the executor copyable? I see that executor policy provides a function to get a reference, but has the executor a reference to its policy? What are the lifetimes of both? executor_type& executor(); * functions having almost the same prototype but behaving quite differently. I see that you propose par(task) policy and that a function can return a future or not depending on the policy (par(task)). I like to use different names when the functions must be used following a different protocol. Do you have an example of an algorithm that is common independently of whether the policy is par or par(task)? * The cumbersome generic interface I believe in general that we need two different interfaces, the user interface and the customization interface. The customization interface is often less friendly than the user interface. The executor_traits interface is for me one way to customize an interface. Other alternatives are also possible (see below) At the user level, the following example Iterator for_each_n(random_access_iterator_tag, ExecutionPolicy&& policy, InputIterator first, Size n, Function f) { using executor_type = typename decay_t<ExecutionPolicy>:::executor_type; executor_traits<executor_type>::execute(policy.executor(), [=](auto idx) { f(first[idx]); }, n ); } seems more cumbersome than something more direct like Iterator for_each_n(random_access_iterator_tag, ExecutorPolicy&& policy, InputIterator first, Size n, Function f) { execute(policy.executor(), [=](auto idx) { f(first[idx]); }, n); } The interface for the user could future_result_type_t<Executor, F> execute(Executor&, F&&, Args...); future_result_type_t<Executor, F> async_execute(Executor&, F&&, Args...); future_result_type_t<Executor, F> execute_n(Executor&, size_t, F&&, Args...); future_result_type_t<Executor, F> async_execute_n(Executor&, size_t, F&&, Args...); Note that the interface allows to pass some information to the task to execute. Note that the bulk versions have a different name as these functions do something different. How to combine the index with the Args can be discussed, but I believe that passing the Index as first parameter of the continuation is a good compromise. However the executor customized interface doesn't needs the Args parameters, as user functions would pack F and Args to make a void(void)/void(size_t) schedulable work. Another cumbersome example using executor_type = typename decay_t<ExecutionPolicy>:::executor_type; return executor_traits<executor_type>::make_future_ready(policy.executor()); or return future_traits<future<int>>::make_ready(); Compare this with a more user friendly return make_future_ready(policy.executor()); or return make<future>(); which of course should be equivalent to the previous code fragment. I'm working on a on-going factories proposal that would allow make<future>(); BTW, the following function is missing from executor_traits as well as make_exceptional_future. static future<void> make_ready_future(executor_type& ex); * What do you think of using overload and flat type traits in order to customize the user interface instead of executor_traits as suggested by Eric? E.g. I would expect that rebind, value_type to be generic and placed at the std level. Other traits are more specific like executor_type, execution_category, ... * Inspired from Boost.Hana and Haskell I have been customizing some type classes following the following pattern pattern. It is quite close to the trait approach, however, I use an additional level of indirection via a tag type trait that allow to dispatch to a common model instead of defining the trait directly. executor_traits<T> = executors::type_class::instance<executors::type_class::tag<T>> By default executors::type_class::tag<T> is the same as type<T> instead of the type T itself. This in needed to ensure that the associated tag is copyable and is very cheep to copy. The main difference with Boost.Hana is that here the tag depends on the type class and in Hana it is a global tag associated to a type (data_type). I use a namespace for each concept/type class that needs to be customized. E.g for the executor concept we could have namespace executors { struct type_class { template <class Tag> struct instance; template <class T> struct tag { using type = type<T>; }; }; } ... We could have a default definition for executors::type_class::instance<Tag> if we don't need explicit mapping. However, as Hana, I use to define what Hana and Haskel calls Minimum Complete Definition (mcd) (related to lowering in the proposal). In this case, a mcd is based on the definition of ex.async_execute(), so we could have namespace executors { struct async_execute_mcd {...}; struct type_class { template <class Tag> struct instance : async_execute_mcd {}; } } Having a common schema to define the traits allows to define other common traits as concept_instance_t<executors::type_class, Ex> models<Ex, executors::type_class> I use to place the operations associated with a type class in the same namespace. This is not a requirement, but helps to avoid name collision. namespace executors { template <class Ex, class F, class Instance=concept_instance_t<executors::traits, Ex>> auto execute(Ex& ex, F&& f) -> decltype(Instance::execute(ex, forward<F>(f))) { return Instance::execute(ex, forward<F>(f)); } ... Whether execute would merits to go one level up and move to the parent namespace is subject to discussion, as would be having an alias for executor_instance<T> = concept_instance_t<executors::type_class,T> Best, Vicente

Vicente J. Botet Escriba

16 Oct 16 Oct

12:02 a.m.

New subject: We need a coherent higher level parallelization story for C++ (was [thread] Is there a non-blocking future-destructor?)

I forget to cross-post. Adding c++std-parallel Le 14/10/15 07:56, Vicente J. Botet Escriba a écrit :

...

Le 13/10/15 04:02, Vicente J. Botet Escriba a écrit :

...
Le 12/10/15 14:52, Hartmut Kaiser a écrit :

...
Sorry for cross-posting.

Our group has outlined our current understanding and a possible approach to this here [1]. I'd like for this to be understood as a seed for a wider discussion. Needless to say, I'd very much like to collaborate on this with anybody interested in joining the effort. I'll read it and come back to you. Hi,

In general I like the global direction of p0058r0, but have some concerns respect to the form. This will be longer than I expected.

I like:

* the fact that we are conceptualizing the interface and allowing to customize it

* the additional bulk and synchronous interface execute.

* the way async_execute/when_all/when_any can deduce the returned future once we have an executor parameter (even if I would preferred to associate them to an execution policy - see below).

* the chaining when_all_execute_and_select

* the future cast that can be used when the single thing important is if the task has completed.

* rebind, this type trait should be make generic, and customizable by the user rebind<par, MyExecutor> rebind<future<int>, string>

* the fact that then() continuations consume value types and not on futures (this is in line to my proposed next() function). have you added some kind of recover continuations (like my recover or catch_error)?

* the on(ex).with(p) chaining style (I have used it in order to schedule timing operations in Boost.Thread)

submit(sch.on(ex).at(tp), f);

* we can retrieve the wrapped type,

I like less:

* the fact that executors are aware of futures, I like the split of responsibilities, executors have void() work to schedule, and we have a free function like async/spaw/submit that returns a future. The question is which future should return async(ex, ...). p0058r0 propose async_execute that deduces the future from the executor. I believe that in in the same way the execution policy embodies a set of the rules about where, when and how to run a submitted function object, it should have associated also how the asynchronous result is reported, that is, which specific future must be used as result of async; when_all,when_any.

IMHO, futures depend on executors, not the opposite. I don't know how to implement an executor that can return my special future. However I know how to implement a future that can store a specific Executor.

* I don't know if then_execute result should depend on the future associated to the executor (or execution policy) as I expect the same kind of future as a result.

* the name value_type to retrieve the wrapped type. I' don't know if value_type is the most appropriated when we have future<T&>. ValueType as defined in the Range proposal removes references and cv qualifications.

* The name future_traits and the fact that future_traits takes a specific class as template parameter and not a type constructor (a high-order meta-function that transforms types on types). IMO what we are mapping is not std::future<T>, but std::future or std::future<_>. Given a type future<int>, it is useful to have its type_constructor. E.g. type_constructor<future<T>> is future<_>

type_constructor<apply<TC, T>> is TC value_type<apply<TC, T>> is T

As a type constructor future<_> apply<future<_>, string> is the same as future<string>. While apply<future<_>, string> and rebind<future<int>, string> seems similar, apply can be used with any high order meta-function as e.g. apply<lift<future>, string>.

I use rebind when you have an instance of a class, as future<int>, optional<int>, and apply is used when you have a type constructor as future, optional.

rebind can be defined in function of apply and type_constructor.

rebind<X,T> = apply<type_constructor<X>,T>

E.g. if we had a future that takes two parameters T and E (as expected does), the type constructor (respect to T) would be future<_, E>.

* wondering if the same applies to execution policies. Could we consider that a execution policy wraps an executor?

* I need to think about the separation of the execution_policy and the executor. Is the executor copyable? I see that executor policy provides a function to get a reference, but has the executor a reference to its policy? What are the lifetimes of both? executor_type& executor();

* functions having almost the same prototype but behaving quite differently. I see that you propose par(task) policy and that a function can return a future or not depending on the policy (par(task)). I like to use different names when the functions must be used following a different protocol. Do you have an example of an algorithm that is common independently of whether the policy is par or par(task)?

* The cumbersome generic interface I believe in general that we need two different interfaces, the user interface and the customization interface. The customization interface is often less friendly than the user interface. The executor_traits interface is for me one way to customize an interface. Other alternatives are also possible (see below)

At the user level, the following example

Iterator for_each_n(random_access_iterator_tag, ExecutionPolicy&& policy, InputIterator first, Size n, Function f) { using executor_type = typename decay_t<ExecutionPolicy>:::executor_type; executor_traits<executor_type>::execute(policy.executor(), [=](auto idx) { f(first[idx]); }, n ); }

seems more cumbersome than something more direct like

Iterator for_each_n(random_access_iterator_tag, ExecutorPolicy&& policy, InputIterator first, Size n, Function f) { execute(policy.executor(), [=](auto idx) { f(first[idx]); }, n); }

The interface for the user could

future_result_type_t<Executor, F> execute(Executor&, F&&, Args...); future_result_type_t<Executor, F> async_execute(Executor&, F&&, Args...);

future_result_type_t<Executor, F> execute_n(Executor&, size_t, F&&, Args...); future_result_type_t<Executor, F> async_execute_n(Executor&, size_t, F&&, Args...);

Note that the interface allows to pass some information to the task to execute. Note that the bulk versions have a different name as these functions do something different. How to combine the index with the Args can be discussed, but I believe that passing the Index as first parameter of the continuation is a good compromise.

However the executor customized interface doesn't needs the Args parameters, as user functions would pack F and Args to make a void(void)/void(size_t) schedulable work.

Another cumbersome example

using executor_type = typename decay_t<ExecutionPolicy>:::executor_type; return executor_traits<executor_type>::make_future_ready(policy.executor());

or

return future_traits<future<int>>::make_ready();

Compare this with a more user friendly

return make_future_ready(policy.executor());

or

return make<future>();

which of course should be equivalent to the previous code fragment.

I'm working on a on-going factories proposal that would allow make<future>();

BTW, the following function is missing from executor_traits as well as make_exceptional_future.

static future<void> make_ready_future(executor_type& ex);

* What do you think of using overload and flat type traits in order to customize the user interface instead of executor_traits as suggested by Eric? E.g. I would expect that rebind, value_type to be generic and placed at the std level. Other traits are more specific like executor_type, execution_category, ...

* Inspired from Boost.Hana and Haskell I have been customizing some type classes following the following pattern pattern. It is quite close to the trait approach, however, I use an additional level of indirection via a tag type trait that allow to dispatch to a common model instead of defining the trait directly.

executor_traits<T> = executors::type_class::instance<executors::type_class::tag<T>>

By default executors::type_class::tag<T> is the same as type<T> instead of the type T itself. This in needed to ensure that the associated tag is copyable and is very cheep to copy.

The main difference with Boost.Hana is that here the tag depends on the type class and in Hana it is a global tag associated to a type (data_type).

I use a namespace for each concept/type class that needs to be customized. E.g for the executor concept we could have

namespace executors { struct type_class { template <class Tag> struct instance;

template <class T> struct tag { using type = type<T>; }; }; }

...

We could have a default definition for executors::type_class::instance<Tag> if we don't need explicit mapping. However, as Hana, I use to define what Hana and Haskel calls Minimum Complete Definition (mcd) (related to lowering in the proposal). In this case, a mcd is based on the definition of ex.async_execute(), so we could have

namespace executors { struct async_execute_mcd {...}; struct type_class { template <class Tag> struct instance : async_execute_mcd {}; } }

Having a common schema to define the traits allows to define other common traits as

concept_instance_t<executors::type_class, Ex>

models<Ex, executors::type_class>

I use to place the operations associated with a type class in the same namespace. This is not a requirement, but helps to avoid name collision.

namespace executors { template <class Ex, class F, class Instance=concept_instance_t<executors::traits, Ex>> auto execute(Ex& ex, F&& f) -> decltype(Instance::execute(ex, forward<F>(f))) { return Instance::execute(ex, forward<F>(f)); } ...

Whether execute would merits to go one level up and move to the parent namespace is subject to discussion, as would be having an alias for

executor_instance<T> = concept_instance_t<executors::type_class,T>

Best, Vicente

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Peter Dimov

14 Oct 14 Oct

2:54 p.m.

New subject: [c++std-parallel-2019] We need a coherent higher level parallelization story for C++ (was [thread] Is there a non-blocking future-destructor?)

Hartmut Kaiser wrote:

...

FWIW, the design decision to let those (and only those) futures block on destruction which are returned from async was one of the really bad decisions made for C++11, however that's just my opinion (others agree, but yet others disagree).

I agree completely. The correct thing to do would have been to introduce a separate class, I'll call it barrier here, although this is perhaps not ideal, which takes care of lifetime issues by blocking in its destructor, like this: X x; Y y; barrier b; auto f1 = async( b, [&]{ x.f( y, 1 ); } ); auto f2 = async( b, [&]{ x.f( y, 2 ); } );

3584

Age (days ago)

3588

Last active (days ago)

List overview

Download

10 comments

5 participants

participants (5)

Hartmut Kaiser
Mikael Olenfalk
Oswin Krause
Peter Dimov
Vicente J. Botet Escriba