Hey list,

These are a few remarks I made during the Compute review phase; unfortunately, I don't have the time to write a full review, and I mostly skimmed over everything, so take it with a grain of salt.

First of all, I really like the general idea of Boost.Compute. That is, I like the idea of having a proper API for OpenCL-capable devices, and I like the idea of having parallel algorithms. That being said, I think it is a missed opportunity to have the two tied together so closely, especially considering N4104, whose combination of different execution policies would allow for a wonderful API over different parallelization backends (where Boost.Compute could be the reference implementation, or some such).

Additionally, I think that the author and other reviewers are running in circles when it comes to synchronization. IMHO, the event alone is enough and perfectly fine. An OpenCL event has interesting similarities to std::future<void>, and the clEnqueue functions to std::async. The command queue, however, I'd consider similar to std::thread (and threads should be considered harmful...). With that being said, an event should be perfectly fine to express synchronization points. Together with executors, it would even fit into the higher-level stdlib parallel algorithms. It would be very nice to get rid of a publicly exposed event class, though, and have one future to rule them all (one can dream...). But having both a future and an event in Boost.Compute is, IMHO, not a good idea.

Cheers, Thomas
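To make the analogy concrete, here is a minimal side-by-side sketch (illustrative only; OpenCL error handling is omitted, and the command queue and buffer passed in are assumed to be valid):

  #include <future>
  #include <CL/cl.h>

  // Host threads: std::async hands back a future we can block on.
  void host_side() {
      std::future<void> f = std::async(std::launch::async, [] { /* work */ });
      f.wait();  // block until the task has finished
  }

  // OpenCL: clEnqueueWriteBuffer hands back an event we can block on.
  void opencl_side(cl_command_queue queue, cl_mem buf,
                   const void* host_ptr, size_t size) {
      cl_event ev;
      clEnqueueWriteBuffer(queue, buf, CL_FALSE /* non-blocking */, 0, size,
                           host_ptr, 0, nullptr, &ev);
      clWaitForEvents(1, &ev);  // block until the copy has finished
      clReleaseEvent(ev);
  }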
On Friday, January 02, 2015 00:03:37 Gruenke, Matt wrote:
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Thomas Heller Sent: Thursday, January 01, 2015 13:56 To: boost@lists.boost.org Subject: [boost] [compute] Some remarks
Additionally, I think that the author and other reviewers are running in circles when it comes to synchronization. IMHO, the event alone is enough and perfectly fine.
It's true that the library could have been designed to rely solely on events. All operations could accept wait_lists and return either an event or a wait_list (for those algorithms with multiple outputs).
Since it wasn't done in that way (and I don't mean to imply any judgment, here; I recognize there are benefits to the current design), we're in a position of looking to plug holes in the existing design to enable full exception safety and reduce the set of usage errors that can occur.
Well, that's exactly what I am trying to say ... The current design of the library completely disregards the research that has been done to support asynchronous operations. We have std::future (which is almost equivalent to an OpenCL event), why not use the same mechanisms here? OpenCL events support dataflow-like statements, which can be expressed with futures just as well. With the when_XXX and wait_XXX functions as proposed, even wait_list becomes unnecessary, and you'd also gain the ability to mix and match various different "futures" (by different futures, I mean futures with different shared_state implementations). This gives you almost perfect composability and exception safety, and it allows constructing algorithms which are tightly coupled, yet able to easily overlap different phases of computation. Anything that is implicitly hidden behind data structures or algorithms seems like a bad idea to me, as you might have to be overly conservative, which goes against the "don't pay for what you don't need" principle.
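A naive sketch of this kind of composition, built only on std::future/std::async; the helper when_all_2 is hypothetical, and it burns a thread just to wait, which real implementations (HPX, the proposed when_all) avoid:

  #include <future>
  #include <utility>

  // Naive when_all for two futures: returns a future that becomes ready
  // once both inputs are ready. Illustration only.
  template <typename A, typename B>
  std::future<std::pair<A, B>> when_all_2(std::future<A> fa, std::future<B> fb) {
      return std::async(std::launch::async,
          [fa = std::move(fa), fb = std::move(fb)]() mutable {
              return std::make_pair(fa.get(), fb.get());  // waits on both
          });
  }

  int main() {
      auto a = std::async(std::launch::async, [] { return 1; });
      auto b = std::async(std::launch::async, [] { return 2.0; });
      auto both = when_all_2(std::move(a), std::move(b));
      auto r = both.get();  // ready only once both producers finished
      (void)r;              // r == (1, 2.0)
  }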
Thomas Heller writes:

Well, that's exactly what I am trying to say ... The current design of the library completely disregards the research that has been done to support asynchronous operations. We have std::future (which is almost equivalent to an OpenCL event), why not use the same mechanisms here?
This is something Joel tries to convince me of but I'm resisting. Could you shed some light on how events are almost equivalent to futures? Futures store the result of the asynchronous computation. Events are markers that can be queried to find out whether an operation finished, and can be blocked on until an operation is finished. The data, however, is stored somewhere else. Futures are in this sense safer abstractions, as they prevent users from accessing results that are not yet finished. That is my understanding of futures; I might be wrong here, please correct me if I am.

So I consider futures and events orthogonal concepts. One can, with some effort and loss of expressiveness, be changed into the other concept and vice versa. But I'm not sure if the code makes sense after the change. Consider these examples:

  future<void> f = copy_async(src, dst);
  fill_async(dst, 42.);

This does not work; a dependency or dataflow graph has to be created between copy and fill, so:

  future<void> f = copy_async(src, dst);
  fill_async(dst, 42., f);

But that is not a future, that is an event. How to write this with futures? I think it should be this, but I might be wrong:

  future<dst::iterator> f = copy_async(src, dst);
  fill_async(f, 42);

Is this correct? Now everything is a future, is it not? Another alternative is to hide futures in the containers/ranges/iterators and let them do the right thing implicitly. This is what NT2 [0] does afaik.

In my library [1] I have feed (equivalent to command_queues) and mark (equivalent to events) types, so I can write code like this:

  device d(0);
  feed f1(d);
  feed f2(d);
  mark m1(f1);
  mark m2(f2);
  wait_for(m1);         // block calling thread
  f1.continue_when(m2); // block feed until other feed reached mark

and I'm trying to get rid of this and use futures. But it makes no sense without making everything a future.

Best Regards, Sebastian
Thomas Heller writes:

Well, that's exactly what I am trying to say ... The current design of the library completely disregards the research that has been done to support asynchronous operations. We have std::future (which is almost equivalent to an OpenCL event), why not use the same mechanisms here?
First of all, I fully support Thomas here. Futures (and the extensions proposed in the 'Concurrency TS') are a wonderful concept allowing asynchronous computation. Those go beyond 'classical' futures, which just represent a result which has not been computed yet. These futures allow for continuation-style coding, as you can attach continuations and compose new futures based on logical operations on others.
This is something Joel tries to convince me of but I'm resisting. Could you shed some light on how events are almost equivalent to futures? Futures store the result of the asynchronous computation. Events are markers that can be queried to find out whether an operation finished, and can be blocked on until an operation is finished. The data, however, is stored somewhere else. Futures are in this sense safer abstractions, as they prevent users from accessing results that are not yet finished. That is my understanding of futures; I might be wrong here, please correct me if I am.

So I consider futures and events orthogonal concepts. One can, with some effort and loss of expressiveness, be changed into the other concept and vice versa. But I'm not sure if the code makes sense after the change. Consider these examples:

  future<void> f = copy_async(src, dst);
  fill_async(dst, 42.);

This does not work; a dependency or dataflow graph has to be created between copy and fill, so:

  future<void> f = copy_async(src, dst);
  fill_async(dst, 42., f);
What about:

  future<void> f = copy_async(src, dst);
  f.then([&dst](future<void>&&) { fill_async(dst, 42.); });

or (assuming await will be available, which almost all of the committee thinks is something we need):

  await copy_async(src, dst);
  fill_async(dst, 42.);

i.e. the code looks 'normal' but is fully asynchronous thanks to await and futures.
But that is not a future, that is an event. How to write this with futures?
I think it should be this but I might be wrong:
  future<dst::iterator> f = copy_async(src, dst);
  fill_async(f, 42);
You're right that an event separates the fact that data is available from the data itself. Well, the OpenCL guys decided that this is the right way of doing things. I really hope that we know better. Just because the underlying OpenCL API exposes the trigger and the data separately does not imply that we have to do the same thing in the API exposed from our libraries. At the same time, and as you already mentioned, future<void> is perfectly well usable for representing even this use case.
Is this correct? Now everything is a future, is it not? Another alternative is to hide futures in the containers/ranges/iterators and let them do the right thing implicitly. This is what NT2 [0] does afaik.
Joel adopted this in NT2 based on ideas from HPX [1], btw.
In my library [1] I have feed (equivalent to command_queues) and mark (equivalent to events) types so I can write code like this:
  device d(0);
  feed f1(d);
  feed f2(d);
  mark m1(f1);
  mark m2(f2);
  wait_for(m1);         // block calling thread
  f1.continue_when(m2); // block feed until other feed reached mark
and I'm trying to get rid of this and use futures. But it makes no sense without making everything a future.
What do you mean by 'making everything a future'? Having all functions return futures? If so - then yes - if you want to make a function asynchronously callable, let it return a future. There is nothing wrong with that (well, except that std::future is utterly bulky and slow, as it is usually tied to std::thread, which in turn usually represents kernel threads - for a proposed solution see my talk at MeetingC++ 2014 [2]).

[1] https://github.com/STEllAR-GROUP/hpx
[2] https://www.youtube.com/watch?v=4OCUEgSNIAY

Regards Hartmut --------------- http://boost-spirit.com http://stellar.cct.lsu.edu
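A trivial sketch of such an asynchronously callable function (illustrative only): the future is the whole interface, and the caller never sees how or where the work runs.

  #include <future>
  #include <string>

  std::future<int> parse_async(std::string s) {
      return std::async(std::launch::async,
                        [s = std::move(s)] { return std::stoi(s); });
  }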
On 03/01/2015 14:15, Hartmut Kaiser wrote:
What about:
  future<void> f = copy_async(src, dst);
  f.then([&dst](future<void>&&) { fill_async(dst, 42.); });
Just to clarify, I believe this is the model that HPX implements. NT2, Joel Falcou's library that Sebastian also mentioned, uses this model. This abstraction can also be built on top of OpenMP 4.0 or TBB.
Keep in mind that Boost.Compute needs synchronization support to facilitate exception safety. I don't know if any type of futures can provide that, but its own futures don't. Ideally, you'd also want to express the data dependencies to the OpenCL C layer, to facilitate out-of-order kernel execution (whether using an out-of-order queue or multiple in-order queues). To that end...

-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Hartmut Kaiser Sent: Saturday, January 03, 2015 8:16 To: boost@lists.boost.org Subject: Re: [boost] [compute] Some remarks
Thomas Heller writes:
Another alternative is to hide futures in the containers/ranges/iterators and let them do the right thing implicitly. This is what NT2 [0] does afaik.
Joel adopted this in NT2 based on ideas from HPX [1], btw.
...this sounds like a good way to go, so long as the embedded events are accessible to the Boost.Compute core, from which it can construct wait_lists to pass into the OpenCL C interface. I believe Kyle is looking at doing something like this (maybe not the wait_lists, but at least making the device memory containers' destructors block on the corresponding events).

Matt
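A sketch of that idea with hypothetical names (not actual Boost.Compute classes), assuming the container tracks the event of the last operation writing to it:

  #include <CL/cl.h>

  // A device buffer that remembers the event of the last operation writing
  // to it; the destructor blocks until that operation completes, so the
  // memory is never released while a kernel might still touch it.
  class tracked_buffer {
      cl_mem mem_;
      cl_event last_write_ = nullptr;
  public:
      explicit tracked_buffer(cl_mem m) : mem_(m) {}
      tracked_buffer(const tracked_buffer&) = delete;
      tracked_buffer& operator=(const tracked_buffer&) = delete;

      void set_last_write(cl_event e) {
          if (last_write_) clReleaseEvent(last_write_);
          clRetainEvent(e);
          last_write_ = e;
      }
      cl_event last_write() const { return last_write_; } // for building wait_lists

      ~tracked_buffer() {
          if (last_write_) {
              clWaitForEvents(1, &last_write_); // block on the pending operation
              clReleaseEvent(last_write_);
          }
          if (mem_) clReleaseMemObject(mem_);
      }
  };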
On 3 Jan 2015 at 7:15, Hartmut Kaiser wrote:
First of all, I fully support Thomas here. Futures (and the extensions proposed in the 'Concurrency TS') are a wonderful concept allowing asynchronous computation. Those go beyond 'classical' futures, which just represent a result which has not been computed yet. These futures allow for continuation-style coding, as you can attach continuations and compose new futures based on logical operations on others.
They are also severely limited and limiting:

1. They tie your code into "future islands" which are fundamentally incommensurate with all code which doesn't use the same future as your code. Try mixing code using boost::future and std::future, for example: it's a nightmare of code that is all too easy to make racy and unmaintainable. If Compute provided a boost::compute::future, it would be yet another future island, and I'm not sure that's wise design.

2. Every time you touch them with a change you unavoidably spend thousands of CPU cycles due to going through the memory allocator and (effectively) the internal shared_ptr. This makes using futures for a single SHA round, for example, a poor design despite how nice and clean it is.

3. They force you to deal with exceptions even where that is not appropriate, and internally most implementations will do one or more internal throw-catches which, if the exception type has a vtable, can be particularly slow.

4. The compiler's optimiser really struggles to do much with the current future design because of all the implicit visibility to other threads. Even a very simple use of future requires hundreds of CPU instructions to be generated as a minimum, none of which can be elided because the compiler can't know the visibility effects on other threads. I'll grant you that an HPX-type design makes this problem much more tractable, because the real problem here is the potential presence of hardware concurrency.

This is why Chris has proposed async_result from ASIO instead, which lets the caller of an async API supply the synchronisation method to be used for that particular call. async_result is superior to futures in all but one extremely important way: async_result cannot traverse an ABI boundary, while futures can.
What do you mean by 'making everything a future'? Having all functions return futures? If so - then yes - if you want to make a function asynchronously callable, let it return a future. There is nothing wrong with that (well, except that std::future is utterly bulky and slow, as it is usually tied to std::thread, which in turn usually represents kernel threads - for a proposed solution see my talk at MeetingC++ 2014 [2]).
For the record, I'd just love it if there were more HPX-type thinking in how C++ concurrency is standardised. However, I have learned with age and experience that people don't care much for whole new ways of thinking and approaching problems. They prefer some small incremental library which can be tacked onto their existing code without much conceptual change. To that end, when facing the limitations of std::future they can see the cost-benefit of boost::future, and can conceptualise replacing std::future with boost::future in their code. So that is a viable mental step for them.

Replacing the entire concurrency engine, and indeed paradigm, in your C++ runtime is, I suspect, too scary for most, even if the code changes are straightforward. It'll be the "bigness" of the concept which scares them off.

To that end, the non-allocating basic_future toolkit I proposed on this list before Christmas I think has the best chance of "fixing" futures. Each programmer can roll their own future type, with optional amounts of interoperability and composition with other future islands. Then a future type lightweight enough for a SHA round is possible, as is some big thick future type providing STL future semantics or composition with many other custom future types. One also gains most of the (static) benefits of ASIO's async_result, but one still has ABI stability.

Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On Sunday, January 04, 2015 06:52:41 Niall Douglas wrote:
On 3 Jan 2015 at 7:15, Hartmut Kaiser wrote:
First of all, I fully support Thomas here. Futures (and the extensions proposed in the 'Concurrency TS') are a wonderful concept allowing asynchronous computation. Those go beyond 'classical' futures, which just represent a result which has not been computed yet. These futures allow for continuation-style coding, as you can attach continuations and compose new futures based on logical operations on others.
They are also severely limited and limiting:
1. They tie your code into "future islands" which are fundamentally incommensurate with all code which doesn't use the same future as your code. Try mixing code using boost::future and std::future, for example: it's a nightmare of code that is all too easy to make racy and unmaintainable. If Compute provided a boost::compute::future, it would be yet another future island, and I'm not sure that's wise design.
I absolutely agree. "Future islands" are a big problem which needs a solution very soon. To some extent the shared state as described in the standard could be the interface to be used by the different islands. What we miss here is a properly defined interface etc. I probably didn't make that clear enough in my initial mail, but I think this unifying future interface should be the way forward, so that different domains can use it to implement their islands. FWIW, we already have that in HPX, and we are currently integrating OpenCL events within our "future island"; this works exceptionally well.
2. Every time you touch them with a change you unavoidably spend thousands of CPU cycles due to going through the memory allocator and (effectively) the internal shared_ptr. This makes using futures for a single SHA round, for example, a poor design despite how nice and clean it is.
I am not sure I fully understand that statement. All I read is that a particular implementation seems to be bad, and you project this onto the general design. I would like to see this SHA future code, though, and experiment with it a bit.
3. They force you to deal with exceptions even where that is not appropriate, and internally most implementations will do one or more internal throw-catches which, if the exception type has a vtable, can be particularly slow.
I think this is a void statement. You always have to deal with exceptions in one way or another ... But yes, exception handling is slow, so what? It's only happening in exceptional circumstances, what's the problem here?
4. The compiler's optimiser really struggles to do much with the current future design because of all the implicit visibility to other threads. Even a very simple use of future requires hundreds of CPU instructions to be generated as a minimum, none of which can be elided because the compiler can't know the visibility effects on other threads. I'll grant you that an HPX-type design makes this problem much more tractable, because the real problem here is the potential presence of hardware concurrency.
Which is still there, even in HPX :P Inter-thread communication is expensive on current architectures regardless of the higher-level abstraction used, so what's the point?
This is why Chris has proposed async_result from ASIO instead, that lets the caller of an async API supply the synchronisation method to be used for that particular call. async_result is superior to futures in all but one extremely important way: async_result cannot traverse an ABI boundary, while futures can.
What's the difference between async_result and a future? I am unable to find that in the ASIO documentation.
What do you mean by 'making everything a future'? Having all functions return futures? If so - then yes - if you want to make a function asynchronously callable, let it return a future. There is nothing wrong with that (well, except that std::future is utterly bulky and slow, as it is usually tied to std::thread, which in turn usually represents kernel threads - for a proposed solution see my talk at MeetingC++ 2014 [2]).

For the record, I'd just love it if there were more HPX-type thinking in how C++ concurrency is standardised.
However, I have learned with age and experience that people don't care much for whole new ways of thinking and approaching problems. They prefer some small incremental library which can be tacked onto their existing code without much conceptual change. To that end, when facing the limitations of std::future they can see the cost-benefit of boost::future, and can conceptualise replacing std::future with boost::future in their code. So that is a viable mental step for them.
Replacing the entire concurrency engine and indeed paradigm in your C++ runtime is, I suspect, too scary for most, even if the code changes are straightforward. It'll be the "bigness" of the concept which scares them off.
Neither Hartmut nor I am proposing to use HPX within Boost. However, we want to release an HPX-enhanced C++ stdlib in the near future to account for this exact deficiency.
To that end, the non-allocating basic_future toolkit I proposed on this list before Christmas I think has the best chance of "fixing" futures. Each programmer can roll their own future type, with optional amounts of interoperability and composition with other future islands. Then a future type lightweight enough for a SHA round is possible, as is some big thick future type providing STL future semantics or composition with many other custom future types. One also gains most of the (static) benefits of ASIO's async_result, but one still has ABI stability.
I missed that. Can you link the source/documentation/proposal once more please?
On 01/04/2015 11:25 AM, Thomas Heller wrote:
What's the difference between async_result and a future? I am unable to find that in the ASIO documentation.
async_result is like a return type trait. In its generalized form in Asio it supports callbacks, but it has been specialized for futures and coroutines. Other mechanisms can be added; for instance, Boost.Fiber comes with an async_result specialization. The important point with that design is that with async_result the user of an asynchronous function gets to decide which mechanism to use. Examples:

  // Callback
  socket.async_send(buffer, [] (error_code, size_t) { });

  // Future
  auto myfuture = socket.async_send(buffer, use_future);

  // Coroutine
  socket.async_send(buffer, yield);
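The initiating-function pattern behind this looks roughly as follows (a simplified sketch against the 2015-era Boost.Asio machinery; my_socket, buffer and start_send are placeholders, and the real implementation is considerably more involved):

  #include <cstddef>
  #include <utility>
  #include <boost/asio/async_result.hpp>
  #include <boost/asio/handler_type.hpp>
  #include <boost/system/error_code.hpp>

  struct buffer {};          // placeholder
  struct my_socket {         // placeholder transport
      template <typename Handler>
      void start_send(buffer, Handler&& h) {
          // demo only: complete immediately on the caller's thread
          std::forward<Handler>(h)(boost::system::error_code(), std::size_t(0));
      }
  };

  // The completion token (callback, use_future, yield, ...) picks both the
  // concrete handler type and the initiating function's return type.
  template <typename CompletionToken>
  typename boost::asio::async_result<
      typename boost::asio::handler_type<
          CompletionToken, void(boost::system::error_code, std::size_t)>::type
  >::type
  async_send(my_socket& s, buffer b, CompletionToken&& token) {
      using handler_t = typename boost::asio::handler_type<
          CompletionToken, void(boost::system::error_code, std::size_t)>::type;
      handler_t handler(std::forward<CompletionToken>(token));
      boost::asio::async_result<handler_t> result(handler); // must precede start
      s.start_send(b, std::move(handler));
      return result.get(); // void for callbacks, std::future<...> for use_future
  }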
On Sunday, January 04, 2015 12:46:26 Bjorn Reese wrote:
On 01/04/2015 11:25 AM, Thomas Heller wrote:
What's the difference between async_result and a future? I am unable to find that in the ASIO documentation.
async_result is like a return type trait. In its generalized form in Asio it supports callbacks, but it has been specialized for futures and coroutines. Other mechanisms can be added; for instance, Boost.Fiber comes with an async_result specialization.
The important point with that design is that with async_result the user of an asynchronous function gets to decide which mechanism to use.
Examples:
  // Callback
  socket.async_send(buffer, [] (error_code, size_t) { });

  // Future
  auto myfuture = socket.async_send(buffer, use_future);

  // Coroutine
  socket.async_send(buffer, yield);
Interesting. Let me try to explain what I am seeing in the future concept first, and then try to translate that to async_result.

For me, the concept behind futures is twofold. You have the future<T> on the *receiving* side, which is, apart from the return type of the async operation, a completely type-erased (as in the type of the asynchronous operation) handle to a future result. Things like promise, packaged_task or whatnot represent the *sending* side of the operation. That means the only difference here is how the future result is being computed (locally, remotely or on an OpenCL device). This mechanism is completely hidden from the user, and currently an implementation detail of the various different libraries (with HPX being the exception here). And I believe this is a good thing.

async_result seems to let the user decide how she would like to have the result produced, at least it looks like it ... after looking at the implementation, it's "just" a trait from ASIO's async model to whichever mechanism you prefer. That could be done with different future islands just as well. The important part, for me, *how* the result is being produced, is not covered. In addition, it doesn't solve the problem of how to compose different future-like types or async concepts.

Cheers, Thomas
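To make the sending/receiving split concrete, here is a sketch of adapting an OpenCL event (the sending side) into a std::future<void> (the receiving side); error handling is omitted, and this is not how any of the libraries discussed here actually do it:

  #include <future>
  #include <CL/cl.h>

  // The event is the sending side; the promise/future pair re-exposes it as
  // a receiving-side handle. The promise is deleted inside the callback.
  static void CL_CALLBACK on_complete(cl_event, cl_int, void* user) {
      auto* prom = static_cast<std::promise<void>*>(user);
      prom->set_value();
      delete prom;
  }

  std::future<void> future_from_event(cl_event ev) {
      auto* prom = new std::promise<void>();
      std::future<void> f = prom->get_future();
      clSetEventCallback(ev, CL_COMPLETE, on_complete, prom);
      return f;
  }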
On Sunday, January 04, 2015 13:37:22 you wrote:
On Sunday, January 04, 2015 12:46:26 Bjorn Reese wrote:
On 01/04/2015 11:25 AM, Thomas Heller wrote:
What's the difference between async_result and a future? I am unable to find that in the ASIO documentation.
async_result is like a return type trait. In its generalized form in Asio it supports callbacks, but it has been specialized for futures and coroutines. Other mechanisms can be added; for instance, Boost.Fiber comes with an async_result specialization.
The important point with that design is that with async_result the user of an asynchronous function gets to decide which mechanism to use.
Examples:

  // Callback
  socket.async_send(buffer, [] (error_code, size_t) { });

  // Future
  auto myfuture = socket.async_send(buffer, use_future);

  // Coroutine
  socket.async_send(buffer, yield);
Interesting. Let me try to explain what I am seeing in the future concept first, and then try to translate that to async_result. For me, the concept behind futures is twofold. You have the future<T> on the *receiving* side, which is, apart from the return type of the async operation, a completely type-erased (as in the type of the asynchronous operation) handle to a future result. Things like promise, packaged_task or whatnot represent the *sending* side of the operation. That means the only difference here is how the future result is being computed (locally, remotely or on an OpenCL device). This mechanism is completely hidden from the user, and currently an implementation detail of the various different libraries (with HPX being the exception here). And I believe this is a good thing. async_result seems to let the user decide how she would like to have the result produced, at least it looks like it ... after looking at the implementation, it's "just" a trait from ASIO's async model to whichever mechanism you prefer. That could be done with different future islands just as well. The important part, for me, *how* the result is being produced, is not covered. In addition, it doesn't solve the problem of how to compose different future-like types or async concepts.
One more addition: with executors, the future concept is enhanced even further with respect to *how*, *where* and *when* tasks get executed; an executor can implement its own shared state, but the composability with other futures is still provided.
Cheers, Thomas
Thomas Heller writes:
Interesting. Let me try to explain what I am seeing in the future concept first, and then try to translate that to async_result: for me, the concept behind futures is twofold, you have the future<T> on the *receiving* side which is, apart from the return type of the async operation, a completely type-erased (as in type of asynchronous operation) handle to a future result. Things like promise, packaged_task or whatnot represent the *sending* side of the operation.
I'm really happy about this discussion here. Can you clarify what you mean by a future being type erased? You don't mean to say a future is a future<any>, do you?
On 04/01/2015 16:55, Sebastian Schaetz wrote:
Thomas Heller writes:

Interesting. Let me try to explain what I am seeing in the future concept first, and then try to translate that to async_result: for me, the concept behind futures is twofold, you have the future<T> on the *receiving* side which is, apart from the return type of the async operation, a completely type-erased (as in type of asynchronous operation) handle to a future result. Things like promise, packaged_task or whatnot represent the *sending* side of the operation.
I'm really happy about this discussion here. Can you clarify what you mean by a future being type erased? You don't mean to say a future is a future<any>, do you?
Presumably that the type of the future does not contain the type of the callable function object it contains, thus requiring dynamic memory allocation and indirection.
On 5 Jan 2015 at 0:01, Mathias Gaunard wrote:
I'm really happy about this discussion here. Can you clarify what you mean by a future being type erased? You don't mean to say a future is a future<any>, do you?
Presumably that the type of the future does not contain the type of the callable function object it contains, thus requiring dynamic memory allocation and indirection.
An interesting mind-bender is if one makes ASIO's async_result vtabled, i.e. a listener class. That would make async_result ABI stable, and eliminate most of the rationale for improving futures, arguably even the present approach by Microsoft and the committee on how to do resumable functions. I am personally surprised that Chris hasn't proposed this yet in one of his N-papers proposing the ASIO way of doing async instead of the current approach by the committee :)

Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
I'm really happy about this discussion here. Can you clarify what you mean by a future being type erased? You don't mean to say a future is a future<any>, do you?
Presumably that the type of the future does not contain the type of the callable function object it contains, thus requiring dynamic memory allocation and indirection.
An interesting mind binder is if one makes ASIO's async_result vtabled i.e. a listener class.
That would make async_result ABI stable, and eliminate most of the rationale for improving futures, arguably even the present approach by Microsoft and the committee on how to do resumable functions.
I am personally surprised that Chris hasn't proposed this yet in one of his N-papers proposing the ASIO way of doing async instead of the current approach by the committee :)
Again, I don't see the 'current way' and 'Chris' way' as contradicting. They are orthogonal and address different things. Regards Hartmut --------------- http://boost-spirit.com http://stellar.cct.lsu.edu
On Monday, January 05, 2015 00:01:57 Mathias Gaunard wrote:
On 04/01/2015 16:55, Sebastian Schaetz wrote:
Thomas Heller writes:

Interesting. Let me try to explain what I am seeing in the future concept first, and then try to translate that to async_result: for me, the concept behind futures is twofold, you have the future<T> on the *receiving* side which is, apart from the return type of the async operation, a completely type-erased (as in type of asynchronous operation) handle to a future result. Things like promise, packaged_task or whatnot represent the *sending* side of the operation.
I'm really happy about this discussion here. Can you clarify what you mean by a future being type erased? You don't mean to say a future is a future<any>, do you?
Presumably that the type of the future does not contain the type of the callable function object it contains, thus requiring dynamic memory allocation and indirection.
Exactly. Very similar to what std::function does. Dynamic memory allocation could be eliminated with small object optimization though.
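A minimal sketch of that erasure (illustrative; a real shared state would also hold the result and the synchronization):

  #include <memory>
  #include <utility>

  // The continuation's concrete type disappears behind a virtual interface;
  // this is why attaching a continuation typically allocates.
  struct continuation_base {
      virtual ~continuation_base() = default;
      virtual void run() = 0;
  };

  template <typename F>
  struct continuation_impl final : continuation_base {
      F f;
      explicit continuation_impl(F fn) : f(std::move(fn)) {}
      void run() override { f(); }
  };

  template <typename F>
  std::unique_ptr<continuation_base> erase(F f) {
      // a small-object buffer inside the shared state could avoid this
      // heap allocation for small F, as suggested above
      return std::make_unique<continuation_impl<F>>(std::move(f));
  }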
On 04/01/15 11:25, Thomas Heller wrote:
On Sunday, January 04, 2015 06:52:41 Niall Douglas wrote:
On 3 Jan 2015 at 7:15, Hartmut Kaiser wrote:

First of all, I fully support Thomas here. Futures (and the extensions proposed in the 'Concurrency TS') are a wonderful concept allowing asynchronous computation. Those go beyond 'classical' futures, which just represent a result which has not been computed yet. These futures allow for continuation-style coding, as you can attach continuations and compose new futures based on logical operations on others.

They are also severely limited and limiting:

1. They tie your code into "future islands" which are fundamentally incommensurate with all code which doesn't use the same future as your code. Try mixing code using boost::future and std::future, for example: it's a nightmare of code that is all too easy to make racy and unmaintainable. If Compute provided a boost::compute::future, it would be yet another future island, and I'm not sure that's wise design.

I absolutely agree. "Future islands" are a big problem which needs a solution very soon. To some extent the shared state as described in the standard could be the interface to be used by the different islands. What we miss here is a properly defined interface etc. I probably didn't make that clear enough in my initial mail, but I think this unifying future interface should be the way forward, so that different domains can use it to implement their islands.

Hi Thomas. Can you share your ideas of what this unifying future interface could be? Are you thinking of dynamic or parametric (static) polymorphism? What is the minimal interface of a Future that allows building efficient higher-level abstractions on top of it?

FWIW, we already have that in HPX, and we are currently integrating OpenCL events within our "future island"; this works exceptionally well.

I've no doubt you have reached a good integration on your island. The main issue is how to add new 'Futures' without being forced to modify existing code or to use the internals of a specific island, while still taking advantage of higher-level mechanisms based on 'Generic Futures'.

Best, Vicente
On Sunday, January 04, 2015 17:03:48 Vicente J. Botet Escriba wrote:
On 04/01/15 11:25, Thomas Heller wrote:
On Sunday, January 04, 2015 06:52:41 Niall Douglas wrote:
On 3 Jan 2015 at 7:15, Hartmut Kaiser wrote:
First of all, I fully support Thomas here. Futures (and the extensions proposed in the 'Concurrency TS') are a wonderful concept allowing asynchronous computation. Those go beyond 'classical' futures, which just represent a result which has not been computed yet. These futures allow for continuation-style coding, as you can attach continuations and compose new futures based on logical operations on others.
They are also severely limited and limiting:
1. They tie your code into "future islands" which are fundamentally incommensurate with all code which doesn't use the same future as your code. Try mixing code using boost::future and std::future, for example: it's a nightmare of code that is all too easy to make racy and unmaintainable. If Compute provided a boost::compute::future, it would be yet another future island, and I'm not sure that's wise design.
I absolutely agree. "Future islands" are a big problem which needs a solution very soon. To some extent the shared state as described in the standard could be the interface to be used by the different islands. What we miss here is a properly defined interface etc. I probably didn't make that clear enough in my initial mail, but I think this unifying future interface should be the way forward, so that different domains can use it to implement their islands.

Hi Thomas. Can you share your ideas of what this unifying future interface could be? Are you thinking of dynamic or parametric (static) polymorphism? What is the minimal interface of a Future that allows building efficient higher-level abstractions on top of it?
FWIW, we already have that in HPX and we are currently integrating OpenCL events within our "future island", this works exceptionally well.
I've no doubt you have reached a good integration on your island. The main issue is how to add new 'Futures' without being forced to modify existing code or to use the internals of a specific island, while still taking advantage of higher-level mechanisms based on 'Generic Futures'.
Let's try it. Let's try to summarize the landscape of what we currently have and formulate a list of requirements. There are various C++ standard proposals and a TS dealing with this issue; the C++ standard I am referring to in the following is the current working draft N4296 [1]. What we currently see is a fragmentation of the landscape into different future islands.

So, what we have in the standard is the basic specification of a shared state which is used as a communication channel between an asynchronous return object and an asynchronous provider (30.6.4). The standard furthermore defines three providers (promise, packaged_task and async) and two return objects (future and shared_future). What's particularly interesting here is that 30.6.4 doesn't speak about threads or synchronization between threads.

For me, the reason why we have the different future islands is that the shared state doesn't have a specified API and customization points. If that were the case, the async return objects could hook onto that API to generate a unified future and shared_future, while the providers would have every freedom to generate asynchronous results. The only open question is what this shared state should look like. A specific future island might have problems using the already existing providers like promise, packaged_task and async because they don't want to use the stdlib-provided thread and thread synchronization primitives, for one reason or another. This might be because they are unsuitable (HPX and Boost.Fiber) or because they adapt a 3rd-party library like OpenCL which provides its own shared state implementation (an OpenCL event is kinda like a shared state!).

The big question now is: what could such a shared_state API look like? To be honest, I have no idea. What I believe, though, is that the reference counting specification should be absolutely required, for various reasons like shared_future or keeping the communication channel alive in the event of an asynchronous task.

The providers like async, when_XXX and future<T>::then should be considered in this discussion as well, because it is unclear which type of shared_state they should create. This can be solved via executors. Yet, I am not sure if the current state of the executor proposal is able to cope with my ideas here.

Please don't hesitate to complete the list or point out obvious mistakes. Please also note that some of the things stated above are kept vague because I don't have a solution for them or because I think they require further discussion.

[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf
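As a strawman only (a sketch, not text from any proposal), such an interface might start out like this, with each island supplying its own blocking and continuation dispatch behind it:

  #include <exception>
  #include <functional>

  // Minimal, reference-counted shared-state interface a generic future<T>
  // could hook onto; stdlib threads, HPX, or an OpenCL event wrapper would
  // each provide their own implementation.
  struct shared_state_base {
      virtual ~shared_state_base() = default;
      virtual void wait() = 0;                                    // island-specific blocking
      virtual bool is_ready() const = 0;
      virtual void set_continuation(std::function<void()> c) = 0; // island-specific dispatch
      virtual void add_ref() = 0;                                 // explicit reference
      virtual void release() = 0;                                 // counting, as required above
  };

  template <typename T>
  struct shared_state : shared_state_base {
      virtual T& get() = 0;                       // value, or rethrows the stored exception
      virtual void set_value(T v) = 0;
      virtual void set_exception(std::exception_ptr e) = 0;
  };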
On Monday, January 05, 2015 15:47:12 you wrote:
On Sunday, January 04, 2015 17:03:48 Vicente J. Botet Escriba wrote:
On 04/01/15 11:25, Thomas Heller wrote:
On Sunday, January 04, 2015 06:52:41 Niall Douglas wrote:
On 3 Jan 2015 at 7:15, Hartmut Kaiser wrote:
First of all, I fully support Thomas here. Futures (and the extensions proposed in the 'Concurrency TS') are a wonderful concept allowing asynchronous computation. Those go beyond 'classical' futures, which just represent a result which has not been computed yet. These futures allow for continuation-style coding, as you can attach continuations and compose new futures based on logical operations on others.
They are also severely limited and limiting:
1. They tie your code into "future islands" which are fundamentally incommensurate with all code which doesn't use the same future as your code. Try mixing code using boost::future and std::future, for example: it's a nightmare of code that is all too easy to make racy and unmaintainable. If Compute provided a boost::compute::future, it would be yet another future island, and I'm not sure that's wise design.
I absolutely agree. "Future islands" are a big problem which needs a solution very soon. To some extent the shared state as described in the standard could be the interface to be used by the different islands. What we miss here is a properly defined interface etc. I probably didn't make that clear enough in my initial mail, but I think this unifying future interface should be the way forward, so that different domains can use it to implement their islands.
Hi Thomas. Can you share your ideas of what this unifying future interface could be? Are you thinking of dynamic or parametric (static) polymorphism? What is the minimal interface of a Future that allows building efficient higher-level abstractions on top of it?
FWIW, we already have that in HPX and we are currently integrating OpenCL events within our "future island", this works exceptionally well.
I've no doubt you have reached a good integration on your island. The main issue is how to add new 'Futures' without being forced to modify existing code or to use the internals of a specific island, while still taking advantage of higher-level mechanisms based on 'Generic Futures'.
Replying to myself, as I forgot some things...
Let's try it. Let's try to summarize the landscape of what we currently have and formulate a list of requirements. There are various C++ standard proposals and a TS dealing with this issue; the C++ standard I am referring to in the following is the current working draft N4296 [1]. What we currently see is a fragmentation of the landscape into different future islands. So, what we have in the standard is the basic specification of a shared state which is used as a communication channel between an asynchronous return object and an asynchronous provider (30.6.4). The standard furthermore defines three providers (promise, packaged_task and async) and two return objects (future and shared_future). What's particularly interesting here is that 30.6.4 doesn't speak about threads or synchronization between threads. For me, the reason why we have the different future islands is that the shared state doesn't have a specified API and customization points. If that were the case, the async return objects could hook onto that API to generate a unified future and shared_future, while the providers would have every freedom to generate asynchronous results. The only open question is what this shared state should look like. A specific future island might have problems using the already existing providers like promise, packaged_task and async because they don't want to use the stdlib-provided thread and thread synchronization primitives, for one reason or another. This might be because they are unsuitable (HPX and Boost.Fiber) or because they adapt a 3rd-party library like OpenCL which provides its own shared state implementation (an OpenCL event is kinda like a shared state!).

The big question now is: what could such a shared_state API look like? To be honest, I have no idea. What I believe, though, is that the reference counting specification should be absolutely required, for various reasons like shared_future or keeping the communication channel alive in the event of an asynchronous task.
We currently have something like this implemented in HPX, though it's not publicly exposed or documented [2]. However, there is one thing that this implementation didn't solve so far, and that's the ability to retrieve the specific type of the shared_state once it's turned into a future. That's needed for an OpenCL backend to use the OpenCL event in the enqueueNDRange functions, to make them useful in the OpenCL context again ...
The providers like async, when_XXX and future<T>::then should be considered in this discussion as well because it is unclear which type of shared_state they should create. This can be solved via executors. Yet, I am not sure if the current state of the executor proposal is able to cope with my ideas here.
So, I forgot about the requirements I think we need (in no specific order):

- Asynchronous providers need the ability to spawn and synchronize their asynchronous operations as they wish.
- Asynchronous providers, even of different types, need to be composable.
- The ability to switch between different asynchronous return objects (for example shared_future to future), where it makes sense.
- Generic asynchronous providers like resumable functions, async and dataflow need the ability to take an execution policy, so that different execution engines can be chosen accordingly.
- Asynchronous return object composition needs either some kind of conflict resolution or an associated context, so that the return objects know which kind of shared state they need to provide.
- The ability to avoid exceptions being thrown.

I hope that's all now ...
Please don't hesitate to complete the list or point to obvious mistakes. Please also note that some of things stated above are kept vague because I don't have a solution to it or I think they require further discussion.
[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf
[2]: https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/lcos/detail/future_data...
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Thomas Heller Sent: January 05, 2015 10:30 To: boost@lists.boost.org Subject: Re: [boost] Futures
However, there is one thing that this implementation didn't solve so far, and that's the ability to retrieve the specific type of the shared_state once it's turned into a future. That's needed for an OpenCL backend to use the OpenCL event in the enqueueNDRange functions, to make them useful in the OpenCL context again ...
If you're talking about passing a future as a parameter of an operation, so it can be included as part of a wait_list, then I fully agree. What any good OpenCL wrapper should support is explicit expression of dependencies to the OpenCL driver, rather than implicitly enforcing data dependencies through host-based synchronization. Even if this is only possible by deriving from a standard future type, of some sort, I think that would still be preferable to introducing its own completely custom type.

Matt
On 05.01.2015 19:11, "Gruenke, Matt" wrote:

-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Thomas Heller Sent: January 05, 2015 10:30 To: boost@lists.boost.org Subject: Re: [boost] Futures

However, there is one thing that this implementation didn't solve so far, and that's the ability to retrieve the specific type of the shared_state once it's turned into a future. That's needed for an OpenCL backend to use the OpenCL event in the enqueueNDRange functions, to make them useful in the OpenCL context again ...

If you're talking about passing a future as a parameter of an operation, so it can be included as part of a wait_list, then I fully agree. What any good OpenCL wrapper should support is explicit expression of dependencies to the OpenCL driver, rather than implicitly enforcing data dependencies through host-based synchronization.

Yes and no ;) The idea is to construct the wait list behind the scenes in the OpenCL wrapper, to avoid host-based synchronization on the OpenCL event. On the other end, however, the unified interface would additionally allow synchronizing on other future types, for example data coming over a network. This would be an extremely powerful tool to have available. I believe we have a nice and efficient proof of concept already available in HPX. Unfortunately, the pushed OpenCL wrapper (hpxcl) doesn't fully use the uniform future-based interface yet (we are currently ironing out the last conceptual problems to also support remote OpenCL devices), but the essence is already there.
Even if this is only possible by deriving from a standard future type, of some sort, I think that would still be preferable to introducing its own completely custom type.
On 4 Jan 2015 at 11:25, Thomas Heller wrote:
I absolutely agree. "Future islands" are a big problem which needs a solution very soon. To some extent the shared state as described in the standard could be the interface to be used by the different islands. What we miss here is a properly defined interface etc. I probably didn't make that clear enough in my initial mail, but I think this unifying future interface should be the way forward, so that different domains can use it to implement their islands. FWIW, we already have that in HPX, and we are currently integrating OpenCL events within our "future island"; this works exceptionally well.
I personally think that any notion of any shared state in futures is one of the big design mistakes. Instead of "future as a shared_ptr", think "future as a pipe".
2. Every time you touch them with a change you unavoidably spend thousands of CPU cycles due to going through the memory allocator and (effectively) the internal shared_ptr. This makes using futures for a single SHA round, for example, a poor design despite how nice and clean it is.
I am not sure I fully understand that statement. All I read is that a particular implementation seems to be bad, and you project this onto the general design. I would like to see this SHA future code, though, and experiment with it a bit.
Have a look at https://github.com/BoostGSoC13/boost.afio/blob/content_hashing_merge/boost/afio/hash_engine.hpp. The best I could get it to is 17 cycles a byte, with the scheduling (mostly future setup and teardown) consuming 2 cycles a byte, or a 13% overhead, which I feel is unacceptable. The forthcoming hardware-offloaded SHA in ARM and Intel CPUs might do 2 cycles a byte. In this situation the use of futures halves performance, which is completely unacceptable.
3. They force you to deal with exceptions even where that is not appropriate, and internally most implementations will do one or more internal throw-catches which, if the exception type has a vtable, can be particularly slow.
I think this is a void statement. You always have to deal with exceptions in one way or another ... But yes, exception handling is slow, so what? It's only happening in exceptional circumstances, what's the problem here?
No, it isn't. Current futures require the compiler to generate the code for handling exception throws irrespective of whether an exception could ever happen or not. As a relative weight to something like a SHA round, which is fundamentally noexcept, this isn't a trivial overhead, especially when it's completely unnecessary.
This is why Chris has proposed async_result from ASIO instead, that lets the caller of an async API supply the synchronisation method to be used for that particular call. async_result is superior to futures in all but one extremely important way: async_result cannot traverse an ABI boundary, while futures can.
What's the difference between async_result and a future? I am unable to find that in the ASIO documentation.
As Bjorn mentioned, an async_result is a per-API policy for how to indicate the completion of an asynchronous operation. It could be as simple as an atomic boolean.
Replacing the entire concurrency engine and indeed paradigm in your C++ runtime is, I suspect, too scary for most, even if the code changes are straightforward. It'll be the "bigness" of the concept which scares them off.
Neither Hartmut nor I am proposing to use HPX within Boost. However, we want to release an HPX-enhanced C++ stdlib in the near future to account for this exact deficiency.
With respect, nobody wants or needs yet another STL. We already have three, and that is already enough of a maintenance headache. If you can persuade one of the big three to fully adopt your enhancements, then I am all ears.
To that end, the non-allocating basic_future toolkit I proposed on this list before Christmas I think has the best chance of "fixing" futures. Each programmer can roll their own future type, with optional amounts of interoperability and composition with other future islands. Then a future type lightweight enough for a SHA round is possible, as is some big thick future type providing STL future semantics or composition with many other custom future types. One also gains most of the (static) benefits of ASIO's async_result, but one still has ABI stability.
I missed that. Can you link the source/documentation/proposal once more please?
Try http://comments.gmane.org/gmane.comp.lib.boost.devel/255022. The key insight of that proposal is the notion of static composition of continuations as the core design. One then composes, at compile time, a sequence of continuations which implement any combination and variety of future you like, including the STL ones and the proposed Concurrency TS ones. You will note how the functional static continuations are effectively monadic, and therefore these elementary future promises are actually a library-based awaitable, resumable, monadic toolkit which could be used to write coroutine-based Hana or Expected monadic sequences which can be arbitrarily paused, resumed, or transported across threads.

Universal composition of any kind of future with any other kind is possible when they share the same underlying kernel wait object. I intend to use my proposed pthreads permit object, which is a portable userspace pthreads event object, as that universal kernel wait object. If widely adopted, it may persuade the AWG to admit permit objects into POSIX threads for standardisation; that way C and C++ code can all use interoperable wait composition. Indeed, if POSIX threads already had the permit object, then OpenCL would have used it instead of making their custom event object, and we could then easily construct a std::future and boost::future for Compute. Sadly, the AWG don't see this sort of consequence, or rather I suspect they don't hugely care.

Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
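To illustrate just the compile-time composition idea, a toy sketch (synchronous and single-threaded; nothing like the actual proposal, but it shows how then() can compose continuations statically with no allocation or type erasure):

  #include <utility>

  // Each then() bakes the continuation into the type of the returned
  // future, so the whole chain is one statically composed callable:
  // no allocation, no virtual dispatch, fully inlinable.
  template <typename F>
  struct static_future {
      F compute;
      template <typename C>
      auto then(C cont) {
          auto next = [f = std::move(compute), c = std::move(cont)]() mutable {
              return c(f());  // run prior stage, feed its result onward
          };
          return static_future<decltype(next)>{std::move(next)};
      }
      auto get() { return compute(); }  // toy: runs the chain synchronously
  };

  template <typename F>
  static_future<F> make_static_future(F f) { return {std::move(f)}; }

  // usage: make_static_future([]{ return 21; })
  //            .then([](int x) { return x * 2; })
  //            .get() == 42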
On Monday, January 05, 2015 10:27:33 Niall Douglas wrote:
On 4 Jan 2015 at 11:25, Thomas Heller wrote:
I absolutely agree. "Future islands" are a big problem which needs a solution very soon. To some extent the shared state as described in the standard could be the interface to be used by the different islands. What we miss here is a properly defined interface, etc. I probably didn't make that clear enough in my initial mail, but I think this unifying future interface should be the way forward so that different domains can use it to implement their islands. FWIW, we already have that in HPX and we are currently integrating OpenCL events within our "future island"; this works exceptionally well.
I personally think that any notion of any shared state in futures is one of the big design mistakes. Instead of "future as a shared_ptr", think "future as a pipe".
std::future is more like a unique_ptr, std::shared_future is the shared_ptr equivalent. In the "future as a pipe" idea, the future is merely the receiving end.
2. Every time you touch them with a change, you unavoidably spend thousands of CPU cycles due to going through the memory allocator and (effectively) the internal shared_ptr. This makes using futures for a single SHA round, for example, a poor design despite how nice and clean it is.
I am not sure I fully understand that statement. All I read is that a particular implementation seems to be bad and you project this onto the general design. I would like to see this SHA future code, though, and experiment with it a bit.
Have a look at https://github.com/BoostGSoC13/boost.afio/blob/content_hashing_merge/boost/afio/hash_engine.hpp.
The best I could get it to is 17 cycles a byte, with the scheduling (mostly future setup and teardown) consuming 2 cycles a byte, or a 13% overhead which I feel is unacceptable.
The forthcoming hardware offloaded SHA in ARM and Intel CPUs might do 2 cycles a byte. In this situation the use of futures halves performance which is completely unacceptable.
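As an aside, for anyone wanting to experiment with numbers of this kind, a trivial harness along these lines (invented here; not Niall's actual benchmark) measures the promise/future setup/teardown round trip being debated:

    #include <chrono>
    #include <future>
    #include <iostream>

    // Times a full promise/future round trip: construction, shared state
    // allocation, set_value, get, and teardown.
    int main()
    {
        const int iterations = 1000000;
        long long sum = 0;
        auto t0 = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < iterations; ++i)
        {
            std::promise<int> p;
            std::future<int> f = p.get_future();
            p.set_value(i);
            sum += f.get();
        }
        auto t1 = std::chrono::high_resolution_clock::now();
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        std::cout << "avg ns per round trip: " << double(ns) / iterations
                  << " (checksum " << sum << ")\n";
    }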
3. They force you to deal with exceptions even where that is not appropriate, and internally most implementations will do one or more internal throw-catches which, if the exception type has a vtable, can be particularly slow.
I think this is a vacuous statement. You always have to deal with exceptions in one way or another... But yes, exception handling is slow; so what? It only happens in exceptional circumstances. What's the problem here?
No it isn't. Current futures require the compiler to generate the code for handling exception throws irrespective of whether they could ever happen or not. As a relative weight to something like a SHA round, which is fundamentally noexcept, this isn't a trivial overhead, especially when it's completely unnecessary.
Ok. Hands down: What's the associated overhead you are talking about? Do you have exact numbers?
This is why Chris has proposed async_result from ASIO instead, that lets the caller of an async API supply the synchronisation method to be used for that particular call. async_result is superior to futures in all but one extremely important way: async_result cannot traverse an ABI boundary, while futures can.
What's the difference between async_result and a future? I am unable to find that in the ASIO documentation.
As Bjorn mentioned, an async_result is a per-API policy for how to indicate the completion of an asynchronous operation. It could be as simple as an atomic boolean.
The problem with async_result (as mentioned in a different post) is that it merely takes care of "transporting" from the ASIO future island to another one. It can just as well be adapted to any other future-based system.
Replacing the entire concurrency engine and indeed paradigm in your C++ runtime is, I suspect, too scary for most, even if the code changes are straightforward. It'll be the "bigness" of the concept which scares them off.
Neither I nor Hartmut is proposing to use HPX within Boost. However, we want to release an HPX-enhanced C++ stdlib in the near future to address this exact deficiency.
With respect, nobody wants nor needs yet another STL. We already have three, and those already pose enough of a maintenance headache.
If you can persuade one of the big three to fully adopt your enhancements then I am all ears.
To that end, the non-allocating basic_future toolkit I proposed on this list before Christmas I think has the best chance of "fixing" futures. Each programmer can roll their own future type, with optional amounts of interoperability and composure with other future islands. Then a future type lightweight enough for a SHA round is possible, as is some big thick future type providing STL future semantics or composure with many other custom future types. One also gains most of the (static) benefits of ASIO's async_result, but one still has ABI stability.
I missed that. Can you link the source/documentation/proposal once more please?
Try http://comments.gmane.org/gmane.comp.lib.boost.devel/255022. The key insight of that proposal is the notion of static composition of continuations as the core design. One then composes, at compile-time, a sequence of continuations which implement any combination and variety of future you like, including the STL ones and the proposed Concurrency TS ones. You will note how the functional static continuations are effectively monadic, and therefore these elementary future promises are actually a library based awaitable resumable monadic toolkit which could be used to write coroutine based Hana or Expected monadic sequences which can be arbitrarily paused, resumed, or transported across threads.
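To make "static composition of continuations" concrete, here is a toy sketch (invented for illustration; not the proposal's actual code): each .then() produces a new type embedding the previous stage by value, so the whole chain is a single stack object with no shared state and no allocation:

    #include <iostream>
    #include <utility>

    template <typename F>
    struct stage
    {
        F f;
        auto operator()() { return f(); }
        // Chaining produces a *new type* wrapping this stage by value, so the
        // continuation graph is fixed at compile time.
        template <typename G>
        auto then(G g)
        {
            auto chained = [prev = std::move(f), g = std::move(g)]() mutable {
                return g(prev());
            };
            return stage<decltype(chained)>{std::move(chained)};
        }
    };

    template <typename F>
    stage<F> make_stage(F f) { return {std::move(f)}; }

    int main()
    {
        auto pipeline = make_stage([] { return 5; })
                            .then([](int v) { return v * 2; })
                            .then([](int v) { return v + 1; });
        std::cout << pipeline() << "\n";  // 11; typically folds to a constant
    }

Because the chain's structure is encoded in the type, the optimiser can inline straight through it, which is the property that would make futures cheap enough for something like a SHA round.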
This looks indeed promising. I think we should further investigate how this could be used when dealing with truly asynchronous and concurrently executed tasks.
Universal composure of any kind of future with any other kind is possible when they share the same underlying kernel wait object. I intend to use my proposed pthreads permit object which is a portable userspace pthreads event object as that universal kernel wait object. If widely adopted, it may persuade the AWG to admit permit objects into POSIX threads for standardisation; that way C and C++ code can all use interoperable wait composure.
Indeed, if POSIX threads already had the permit object, then OpenCL would have used it instead of making their custom event object, and we could then easily construct a std::future and boost::future for Compute. Sadly, the AWG don't see this sort of consequence, or rather I suspect they don't hugely care.
You make the assumption that OpenCL events merely exist on the host. They could just as well contain device-specific information which is then used directly on the device (no POSIX there). BTW, this is just one example where your assumption about kernel-level synchronization is wrong. Another scenario is coroutine-like systems such as HPX, where you have different synchronization primitives (Boost.Fiber would be another example). And this is exactly where the challenge lies: trying to find a way to unify those different synchronization mechanisms. That way, we could have a unified future interface. The things you have proposed so far can be a step in that direction but certainly don't cover all the necessary requirements.
Niall
I absolutely agree. "Future islands" are a big problem which needs a solution very soon. To some extent the shared state as described in the standard could be the interface to be used by the different islands. What we miss here is a properly defined interface, etc. I probably didn't make that clear enough in my initial mail, but I think this unifying future interface should be the way forward so that different domains can use it to implement their islands. FWIW, we already have that in HPX and we are currently integrating OpenCL events within our "future island"; this works exceptionally well.
I personally think that any notion of any shared state in futures is one of the big design mistakes. Instead of "future as a shared_ptr", think "future as a pipe".
A future is not a 'pipe'; it's just the receiving end of a pipe, which can be used once.
I missed that. Can you link the source/documentation/proposal once more please?
Try http://comments.gmane.org/gmane.comp.lib.boost.devel/255022. The key insight of that proposal is the notion of static composition of continuations as the core design. One then composes, at compile-time, a sequence of continuations which implement any combination and variety of future you like, including the STL ones and the proposed Concurrency TS ones. You will note how the functional static continuations are effectively monadic, and therefore these elementary future promises are actually a library based awaitable resumable monadic toolkit which could be used to write coroutine based Hana or Expected monadic sequences which can be arbitrarily paused, resumed, or transported across threads.
The power of the proposed model lies in dynamic composition of asynchronous operations, not static composition. Do I misunderstand something? Static composition could help for things like NT2, though.
Universal composure of any kind of future with any other kind is possible when they share the same underlying kernel wait object. I intend to use my proposed pthreads permit object which is a portable userspace pthreads event object as that universal kernel wait object. If widely adopted, it may persuade the AWG to admit permit objects into POSIX threads for standardisation; that way C and C++ code can all use interoperable wait composure.
That's exactly the issue! You will not be able to make all synchronization use the same kernel objects. HPX uses its own non-kernel objects for that, for instance. Using kernel objects makes things slow.

Regards, Hartmut
First of all, I fully support Thomas here. Futures (and the extensions proposed in the 'Concurrency TS') are a wonderful concept allowing asynchronous computation. Those go beyond 'classical' futures, which just represent a result which has not been computed yet. These futures allow for continuation-style coding, as you can attach continuations and compose new futures based on logical operations on others.
They are also severely limited and limiting:
It does not help to call them limited or limiting. Let's extend them if this is the case.
1. They tie your code into "future islands" which are fundamentally incommensurate with all code which doesn't use the same future as your code. Try mixing code using boost::future and std::future, for example: it's a nightmare of too-easy-to-be-racy, unmaintainable mess code. If Compute provided a boost::compute::future, it would be yet another future island, and I'm not sure that's wise design.
The easiest way to deal with this is to introduce a Future concept and implement everything in terms of it. A solid set of traits/concepts-lite should cover that. A good example is the proposed await 2.0 (see N4134: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4134.pdf), btw. The proposed await keyword can be adapted to handle arbitrary future types using a user supplied trait. We tried that with hpx::future and it seems to work fine (once they fix a compiler bug which prevented us from doing large scale tests).
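As a rough illustration of that adaptation point, here is a sketch using the C++20 coroutine machinery that N4134's await eventually became; toy_future is entirely invented, standing in for hpx::future or any foreign future type with a then():

    #include <coroutine>
    #include <iostream>

    // Invented stand-in for a foreign future type; only is_ready()/then()/get()
    // are assumed. It completes inline to keep the sketch runnable.
    struct toy_future
    {
        int value;
        bool is_ready() const { return true; }
        template <typename F> void then(F f) { f(*this); }
        int get() const { return value; }
    };

    // The user-supplied adaptation: teach await how to test, suspend on, and
    // resume from this future type.
    auto operator co_await(toy_future f)
    {
        struct awaiter
        {
            toy_future f;
            bool await_ready() { return f.is_ready(); }
            void await_suspend(std::coroutine_handle<> h)
            {
                f.then([h](toy_future) { h.resume(); });
            }
            int await_resume() { return f.get(); }
        };
        return awaiter{f};
    }

    // Minimal fire-and-forget coroutine type.
    struct task
    {
        struct promise_type
        {
            task get_return_object() { return {}; }
            std::suspend_never initial_suspend() { return {}; }
            std::suspend_never final_suspend() noexcept { return {}; }
            void return_void() {}
            void unhandled_exception() {}
        };
    };

    task demo()
    {
        int v = co_await toy_future{42};
        std::cout << v << "\n";
    }

    int main() { demo(); }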
2. Every time you touch them with a change, you unavoidably spend thousands of CPU cycles due to going through the memory allocator and (effectively) the internal shared_ptr. This makes using futures for a single SHA round, for example, a poor design despite how nice and clean it is.
As long as the overheads of managing the future itself are much smaller than the overheads introduced by the underlying threading system we're fine. And for std::future and boost::future this is definitely the case as both are tied to kernel-threads. In HPX this is a bigger problem as the overheads introduced by futures are comparable with those of the underlying threading system (sub-microsecond). However in our experience this is solvable (things like special allocators for the shared state and using intrusive_ptr for it come to mind).
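A minimal sketch of that intrusive_ptr direction (names invented; not HPX's actual shared state): the reference count lives inside the state object itself, so there is no separate control-block allocation, and the state objects could additionally come from a pooled allocator:

    #include <atomic>
    #include <boost/intrusive_ptr.hpp>
    #include <iostream>

    template <typename T>
    struct shared_state
    {
        std::atomic<int> refcount{0};
        T value{};
        bool ready = false;
    };

    // Hooks found by boost::intrusive_ptr via argument-dependent lookup.
    template <typename T>
    void intrusive_ptr_add_ref(shared_state<T>* p)
    {
        p->refcount.fetch_add(1, std::memory_order_relaxed);
    }
    template <typename T>
    void intrusive_ptr_release(shared_state<T>* p)
    {
        if (p->refcount.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete p;
    }

    int main()
    {
        boost::intrusive_ptr<shared_state<int>> promise_side(new shared_state<int>);
        boost::intrusive_ptr<shared_state<int>> future_side = promise_side;
        promise_side->value = 42;    // promise side publishes
        promise_side->ready = true;  // (no cross-thread synchronisation shown)
        std::cout << future_side->value << "\n";
    }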
3. They force you to deal with exceptions even where that is not appropriate, and internally most implementations will do one or more internal throw-catches which, if the exception type has a vtable, can be particularly slow.
The implementations will throw only if there is an error. This is a non-issue for the non-exceptional case. And I personally don't care if the exceptional case is slow (it involves things like logging anyway, etc.).
4. The compiler's optimiser really struggles to do much with the current future design because of all the implicit visibility to other threads. Even a very simple use of a future requires hundreds of CPU instructions to be generated as a minimum, none of which can be elided because the compiler can't know the visibility effects on other threads. I'll grant you that an HPX-type design makes this problem much more tractable, because the real problem here is the potential presence of hardware concurrency.
This is why Chris has proposed async_result from ASIO instead, that lets the caller of an async API supply the synchronisation method to be used for that particular call. async_result is superior to futures in all but one extremely important way: async_result cannot traverse an ABI boundary, while futures can.
Also, as Thomas stated in a different mail, async_result and futures are orthogonal. While futures represent the result of the asynchronous operation itself and are just one possible way of delivering it back to the user, async_result is a means for the user to decide how he/she would like the asynchronous result to be delivered. So futures and async_result are not mutually exclusive; what's the issue?
What do you mean by 'making everything a future'? Having all functions return futures? If so - then yes - if you want to make a function asynchronously callable, let it return a future. There is nothing wrong with that (well, except that std::future is utterly bulky and slow, as it is usually tied to std::thread, which in turn usually represents kernel threads - for a proposed solution see my talk at MeetingC++ 2014 [2]).
For the record, I'd just love if there were more HPX type thinking in how C++ concurrency is standardised.
However, I have learned with age and experience that people don't care much for whole new ways of thinking and approaching problems. They prefer some small incremental library which can be tacked onto their existing code without much conceptual change. To that end, when facing the limitations of std::future they can see the cost-benefit of boost::future, and can conceptualise replacing std::future with boost::future in their code. So that is a viable mental step for them.
Replacing the entire concurrency engine and indeed paradigm in your C++ runtime is, I suspect, too scary for most, even if the code changes are straightforward. It'll be the "bigness" of the concept which scares them off.
What if the whole std library was based on something like HPX? In this case the user wouldn't have to care about this anymore, right?
To that end, the non-allocating basic_future toolkit I proposed on this list before Christmas I think has the best chance of "fixing" futures. Each programmer can roll their own future type, with optional amounts of interoperability and composure with other future islands. Then a future type lightweight enough for a SHA round is possible, as is some big thick future type providing STL future semantics or composure with many other custom future types. One also gains most of the (static) benefits of ASIO's async_result, but one still has ABI stability.
Non-allocating futures are a step in the right direction. But even those require solving some of the problems you mentioned. Otherwise they will make the issue of having future islands just a bit bigger...

Regards, Hartmut
On 4 Jan 2015 at 9:52, Hartmut Kaiser wrote:
1. They tie your code into "future islands" which are fundamentally incommensurate with all code which doesn't use the same future as your code. Try mixing code using boost::future and std::future, for example: it's a nightmare of too-easy-to-be-racy, unmaintainable mess code. If Compute provided a boost::compute::future, it would be yet another future island, and I'm not sure that's wise design.
The easiest way to deal with this is to introduce a Future concept and implement everything in terms of it. A solid set of traits/concepts-lite should cover that.
I don't think it's that easy because really it comes down to commonality of kernel wait object, or rather, whether one has access to the true underlying kernel wait object or not. For example, right now can boost::wait_all() ever consume std::futures? I suspect not because the HANDLE on Windows or the futex on Linux is rather hard to get at.
A good example is the proposed await 2.0 (see N4134: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4134.pdf), btw. The proposed await keyword can be adapted to handle arbitrary future types using a user supplied trait. We tried that with hpx::future and it seems to work fine (once they fix a compiler bug which prevented us from doing large scale tests).
N4134 presents excellent progress, apart from one major thing I disliked about it: I deeply dislike magic compiler tricks which fold the allocation of shared state by a future promise onto the stack. Far, far better to fix the present future promise API to stop requiring shared state and therefore memory allocation at all.
2. Every time you touch them with a change, you unavoidably spend thousands of CPU cycles due to going through the memory allocator and (effectively) the internal shared_ptr. This makes using futures for a single SHA round, for example, a poor design despite how nice and clean it is.
As long as the overheads of managing the future itself are much smaller than the overheads introduced by the underlying threading system we're fine. And for std::future and boost::future this is definitely the case as both are tied to kernel-threads. In HPX this is a bigger problem as the overheads introduced by futures are comparable with those of the underlying threading system (sub-microsecond). However in our experience this is solvable (things like special allocators for the shared state and using intrusive_ptr for it come to mind).
I think a one-size-fits-all future is a fundamentally flawed approach.
3. They force you to deal with exceptions even where that is not appropriate, and internally most implementations will do one or more internal throw-catches which, if the exception type has a vtable, can be particularly slow.
The implementations will throw only if there is an error. This is a non-issue for the non-exceptional case. And I personally don't care if the exceptional case is slow (it involves things like logging anyway, etc.).
That's not the problem. The problem is that the compiler cannot know if no exception will ever be generated and therefore has to generate the opcodes anyway. What I'm really asking for is a "noexcept future" such that this sequence:
promise<int> p;
auto f(p.get_future());
p.set_value(5);
return f.get();
... can be optimised by the compiler into:
_Z5test1v: # @_Z5test1v
.cfi_startproc
# BB#0: # %_ZN7promiseIiJEED2Ev.exit2
movl $5, %eax
ret
Obviously this is an unrealistic use case, but my point is that the compiler should be capable of such a reduction, because the correct design of future promise wouldn't get in the way.
My earlier proposal for non-allocating future promise doesn't even transport a value, it's so basic (hence "basic_future" and "basic_promise"). It pushes that responsibility onto something like expected.
Replacing the entire concurrency engine and indeed paradigm in your C++ runtime is, I suspect, too scary for most, even if the code changes are straightforward. It'll be the "bigness" of the concept which scares them off.
What if the whole std library was based on something like HPX? In this case the user wouldn't have to care about this anymore, right?
That works for me. I just don't want yet another STL implementation to support.
To that end, the non-allocating basic_future toolkit I proposed on this list before Christmas I think has the best chance of "fixing" futures. Each programmer can roll their own future type, with optional amounts of interoperability and composure with other future islands. Then a future type lightweight enough for a SHA round is possible, as is some big thick future type providing STL future semantics or composure with many other custom future types. One also gains most of the (static) benefits of ASIO's async_result, but one still has ABI stability.
Non-allocating futures are a step in the right direction. But even those require solving some of the problems you mentioned. Otherwise they will make the issue of having future islands just a bit bigger...
Eliminating future islands is, I suspect, not something the C++ community can entirely do alone. We are, as a minimum, going to have to petition POSIX for improved runtime support. We probably ought to have our ducks in a row before that though.

Niall
On Monday, January 05, 2015 11:01:53 Niall Douglas wrote:
On 4 Jan 2015 at 9:52, Hartmut Kaiser wrote:
1. They tie your code into "future islands" which are fundamentally incommensurate with all code which doesn't use the same future as your code. Try mixing code using boost::future and std::future, for example: it's a nightmare of too-easy-to-be-racy, unmaintainable mess code. If Compute provided a boost::compute::future, it would be yet another future island, and I'm not sure that's wise design.
The easiest way to deal with this is to introduce a Future concept and implement everything in terms of it. A solid set of traits/concepts-lite should cover that.
I don't think it's that easy because really it comes down to commonality of kernel wait object, or rather, whether one has access to the true underlying kernel wait object or not.
You make the assumption that you only ever synchronize on kernel-space objects. This is neither required nor necessary.
For example, right now can boost::wait_all() ever consume std::futures? I suspect not because the HANDLE on Windows or the futex on Linux is rather hard to get at.
In which scenario do you have both a Windows HANDLE and a Linux futex? <snip>
Replacing the entire concurrency engine and indeed paradigm in your C++ runtime is, I suspect, too scary for most, even if the code changes are straightforward. It'll be the "bigness" of the concept which scares them off.
What if the whole std library was based on something like HPX? In this case the user wouldn't have to care about this anymore, right?
That works for me. I just don't want yet another STL implementation to support.
Luckily, you don't have to. In the case of HPX, we'd only need to replace a particular subsystem of, say, libc++.
To that end, the non-allocating basic_future toolkit I proposed on this list before Christmas I think has the best chance of "fixing" futures. Each programmer can roll their own future type, with optional amounts of interoperability and composure with other future islands. Then a future type lightweight enough for a SHA round is possible, as is some big thick future type providing STL future semantics or composure with many other custom future types. One also gains most of the (static) benefits of ASIO's async_result, but one still has ABI stability.
Non-allocating futures are a step in the right direction. But even those require solving some of the problems you mentioned. Otherwise they will make the issue of having future islands just a bit bigger...
Eliminating future islands is, I suspect, not something the C++ community can entirely do alone. We are, as a minimum, going to have to petition POSIX for improved runtime support. We probably ought to have our ducks in a row before that though.
Again, the assumption that you need kernel-based synchronization, which does not hold.
Niall
On 5 Jan 2015 at 12:49, Thomas Heller wrote:
I don't think it's that easy because really it comes down to commonality of kernel wait object, or rather, whether one has access to the true underlying kernel wait object or not.
You make the assumption that you only ever synchronize on kernel-space objects. This is neither required nor necessary.
I make the assumption that one _eventually_ synchronises on kernel wait objects, and I also assume that you usually need the ability to fall back onto a kernel wait in most potential wait scenarios (e.g. if no coroutine work is pending, and there is nothing better to do but sleep now). One could, I suppose, simply call yield() all the time, but that is battery murder for portable devices.

What is missing on POSIX is a portable universal kernel wait object used by everything in the system. It is correct to claim you can easily roll your own with a condition variable and an atomic; the problem comes in when one library (e.g. OpenCL) has one kernel wait object and another library has a slightly different one, and the two cannot be readily composed into a single wait_for_all() or wait_for_any() which accepts all wait object types, including non-kernel wait object types.

Windows does have such a universal kernel wait object (the event object). And on POSIX you could inefficiently emulate a universal kernel wait object using a pipe at the cost of two file descriptors per object, though directly using a futex on Linux would be cheaper.
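For reference, the "condition variable and an atomic" event in question looks something like the following sketch (invented code). The composition problem is that each library's private variant of such a class exposes no common handle that a single wait_for_any() could block on:

    #include <atomic>
    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <thread>

    class event
    {
        std::mutex m;
        std::condition_variable cv;
        std::atomic<bool> signalled{false};
    public:
        void set()
        {
            { std::lock_guard<std::mutex> lk(m); signalled.store(true); }
            cv.notify_all();
        }
        void wait()
        {
            if (signalled.load()) return;  // fast path: no kernel involvement
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [this] { return signalled.load(); });
        }
    };

    int main()
    {
        event e;
        std::thread t([&] { e.set(); });
        e.wait();
        std::cout << "signalled\n";
        t.join();
    }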
On 5 Jan 2015 at 13:49, Thomas Heller wrote:

No it isn't. Current futures require the compiler to generate the code for handling exception throws irrespective of whether they could ever happen or not. As a relative weight to something like a SHA round, which is fundamentally noexcept, this isn't a trivial overhead, especially when it's completely unnecessary.
Ok. Hands down: What's the associated overhead you are talking about? Do you have exact numbers?
I gave you exact numbers: a 13% overhead for a SHA256 round.
The problem with async_result (as mentioned in a different post) is that it merely takes care of "transporting" from the ASIO future island to another one. It can just as well be adapted to any other future-based system.
Try http://comments.gmane.org/gmane.comp.lib.boost.devel/255022. The key insight of that proposal is the notion of static composition of continuations as the core design. One then composes, at compile-time, a sequence of continuations which implement any combination and variety of future you like, including the STL ones and the proposed
Absolutely. Which is precisely why it's a very viable alternative to fiddling with futures. Most programmers couldn't give a toss about whether futures do this or that; they do care when they have to jump through hoops because library A is in a different future island to library B. Chris' async_result approach makes that go away right now, not in 2019 or later. It's a very valid riposte to the Concurrency TS, and unlike the Concurrency TS his approach is portable and is already standard practice, rather than an invention of standards mostly by Microsoft.
Concurrency TS ones. You will note how the functional static continuations are effectively monadic, and therefore these elementary future promises are actually a library based awaitable resumable monadic toolkit which could be used to write coroutine based Hana or Expected monadic sequences which can be arbitrarily paused, resumed, or transported across threads.
This looks indeed promising. I think we should further investigate how this could be used when dealing with truly asynchronous and concurrently executed tasks.
Universal composure of any kind of future with any other kind is possible when they share the same underlying kernel wait object. I intend to use my proposed pthreads permit object which is a portable
For me it's a question of free time. This is stuff I do for only a few hours per week because this time is unfunded (happy to discount my hourly rate for anyone wanting to speed these up!), and right now my priority queue is:
1. Release BindLib based AFIO to stable branch (ETA: end of January).
2. Get BindLib up to Boost quality, and submit for Boost review (ETA: March/April).
3. C++ Now 2015 presentation (May).
4a. Non-allocating lightweight future promises extending Expected (from June onwards).
4b. Google Summer of Code mentoring of concurrent_unordered_map so it can be finished and submitted into Boost.
That's the best I can do given this is unfunded time.
userspace pthreads event object as that universal kernel wait object. If widely adopted, it may persuade the AWG to admit permit objects into POSIX threads for standardisation; that way C and C++ code can all use interoperable wait composure.
Indeed, if POSIX threads already had the permit object, then OpenCL would have used it instead of making their custom event object, and we could then easily construct a std::future and boost::future for Compute. Sadly, the AWG don't see this sort of consequence, or rather I suspect they don't hugely care.
You make the assumption that OpenCL events merely exist on the host.
No, it's more that I'm limiting the discussion to host-only, and indeed kernel threading only. I might add that I took care in my pthreads permit object design that it works as expected without a kernel being present, so it can be used during machine bootstrap; indeed, you can create a pthreads permit object which only spins and yields. That object design is entirely capable of working correctly under coroutines too, or on a GPU. It's a C API abstraction of some ability for one strand to signal another strand; how that is actually implemented underneath is a separate matter.
They could just as well contain device-specific information which is then used directly on the device (no POSIX there). BTW, this is just one example where your assumption about kernel-level synchronization is wrong. Another scenario is coroutine-like systems such as HPX, where you have different synchronization primitives (Boost.Fiber would be another example). And this is exactly where the challenge lies: trying to find a way to unify those different synchronization mechanisms. That way, we could have a unified future interface. The things you have proposed so far can be a step in that direction but certainly don't cover all the necessary requirements.
Actually this is the exact basis for my argument regarding many future types, and creating a library which is a factory for future types. In C++, in a proper design, we only pay for what we use, so a future suitable for a SHA round needs to be exceptionally lightweight, and probably can't copy-compose at all but can move-compose (this is where a newly created future can atomically destroy its immediately preceding future, and therefore a wait_for_all() on an array of such lightweight futures works as expected). Meanwhile a future which can be used across processes concurrently would necessarily be a far heavier and larger object. The same applies to coroutine parallelism, or HPX, or WinRT. They all get families of future types best suited for the task at hand, and if the programmer needs bridges across future islands then they pay for such a facility. The cost is, as Hartmut says, a multiplication of future islands, but I believe that is inevitable anyway, so one might as well do it right from the beginning.

I might add that BindLib lets the library end user choose what kind of future the external API of the library uses. Indeed BindLib based AFIO lets you choose between std::future and boost::future, and moreover you can use both configurations of AFIO in the same translation unit and it "just works". I could very easily - almost trivially - add support for an hpx::future in there, though AFIO by design needs kernel threads because they are the only way of generating parallelism in non-microkernel operating system kernels (indeed, the whole point of AFIO is to abstract that detail away for end users).

This is why I'd like to ship BindLib sooner rather than later. I believe it could represent a potential enormous leap forward for the quality and usability of C++ 11 requiring Boost libraries.

Niall
On Tuesday, January 06, 2015 09:13:57 Niall Douglas wrote:
On 5 Jan 2015 at 12:49, Thomas Heller wrote:
I don't think it's that easy because really it comes down to commonality of kernel wait object, or rather, whether one has access to the true underlying kernel wait object or not.
You make the assumption that you only ever synchronize on kernel-space objects. This is neither required nor necessary.
I make the assumption that one _eventually_ synchronises on kernel wait objects, and I also assume that you usually need the ability to fall back onto a kernel wait in most potential wait scenarios (e.g. if no coroutine work is pending, and there is nothing better to do but sleep now). One could I suppose simply call yield() all the time, but that is battery murder for portable devices.
That's, IMHO, an implementation detail of one specific future island or, to be more precise, of the way the tasks are scheduled. This has nothing to do with how you suspend a user-level task, for example.
What is missing on POSIX is a portable universal kernel wait object used by everything in the system. It is correct to claim you can easily roll your own with a condition variable and an atomic; the problem comes in when one library (e.g. OpenCL) has one kernel wait object and another library has a slightly different one, and the two cannot be readily composed into a single wait_for_all() or wait_for_any() which accepts all wait object types, including non-kernel wait object types.
Exactly; this could easily be achieved by defining an appropriate API for the shared state of asynchronous operations. The wait functions would then just use the async result objects, which in turn use the wait functionality as implemented in the shared state. A portable, universal kernel wait object is not really necessary for that. Not everyone wants to pay for the cost of a kernel transition. This is an implementation detail of a specific future island, IMHO. Aside from that, I don't want to limit myself to POSIX. <snip>
On 5 Jan 2015 at 13:49, Thomas Heller wrote:
No it isn't. Current futures require the compiler to generate the code for handling exception throws irrespective of whether they could ever happen or not. As a relative weight to something like a SHA round, which is fundamentally noexcept, this isn't a trivial overhead, especially when it's completely unnecessary.
Ok. Hands down: What's the associated overhead you are talking about? Do you have exact numbers?
I gave you exact numbers: a 13% overhead for a SHA256 round.
To quote your earlier mail: "The best I could get it to is 17 cycles a byte, with the scheduling (mostly future setup and teardown) consuming 2 cycles a byte, or a 13% overhead which I feel is unacceptable." So how much of this "mostly future setup and teardown" is related to exception handling? Please read http://www.open-std.org/Jtc1/sc22/wg21/docs/TR18015.pdf from page 32 onwards. I was under the impression that we left the "exceptions are slow" discussion way behind us :/ <snip>
1. Release BindLib based AFIO to stable branch (ETA: end of January).
2. Get BindLib up to Boost quality, and submit for Boost review (ETA: March/April).
Just a minor, very unrelated remark: I find the name "BindLib" very confusing.
<snip>
I might add that BindLib lets the library end user choose what kind of future the external API of the library uses. Indeed BindLib based AFIO lets you choose between std::future and boost::future, and moreover you can use both configurations of AFIO in the same translation unit and it "just works". I could very easily - almost trivially - add support for an hpx::future in there, though AFIO by design needs kernel threads because they are the only way of generating parallelism in non-microkernel operating system kernels (indeed, the whole point of AFIO is to abstract that detail away for end users).
*shiver* I wouldn't want to maintain such a library. This sounds very dangerous and limiting. Note that both boost::future and hpx::future are far more capable than the current std::future, with different performance characteristics.
On 7 Jan 2015 at 12:40, Thomas Heller wrote:
What is missing on POSIX is a portable universal kernel wait object used by everything in the system. It is correct to claim you can easily roll your own with a condition variable and an atomic; the problem comes in when one library (e.g. OpenCL) has one kernel wait object and another library has a slightly different one, and the two cannot be readily composed into a single wait_for_all() or wait_for_any() which accepts all wait object types, including non-kernel wait object types.
Exactly; this could easily be achieved by defining an appropriate API for the shared state of asynchronous operations. The wait functions would then just use the async result objects, which in turn use the wait functionality as implemented in the shared state.
You still seem to be assuming the existence of a shared state in wait objects :( I suppose it depends on how you define a shared state, but for that non-allocating design of mine the "notification target" (a better name) is the promise if get_future() has never been called, and the future if get_future() has ever been called. The notification target is kept as an atomic pointer: if it is set, it points at a future somewhere; if it is null, then either the promise is broken or the target is the promise.
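A heavily stripped-down sketch of the shape this suggests (mine, not the actual proposal): promise and future point at each other and re-aim the peer pointer whenever one of them moves, so no heap allocation is ever needed. To stay short, this version is not safe against concurrent moves; the real design keeps the pointers atomic and locks both ends before an update:

    #include <iostream>

    struct basic_promise;

    struct basic_future
    {
        basic_promise* peer = nullptr;
        int value = 0;
        bool ready = false;
        basic_future() = default;
        basic_future(basic_future&& o) noexcept;
        ~basic_future();
        int get() const { return value; }  // a real future would wait if !ready
    };

    struct basic_promise
    {
        basic_future* peer = nullptr;
        basic_future get_future()
        {
            basic_future f;
            f.peer = this;
            peer = &f;  // the move constructor re-aims this if f relocates
            return f;
        }
        void set_value(int v)
        {
            if (peer) { peer->value = v; peer->ready = true; }
        }
        ~basic_promise() { if (peer) peer->peer = nullptr; }  // broken promise
    };

    basic_future::basic_future(basic_future&& o) noexcept
        : peer(o.peer), value(o.value), ready(o.ready)
    {
        o.peer = nullptr;
        if (peer) peer->peer = this;  // tell the promise where we live now
    }
    basic_future::~basic_future() { if (peer) peer->peer = nullptr; }

    int main()
    {
        basic_promise p;
        basic_future f = p.get_future();
        p.set_value(5);
        std::cout << f.get() << "\n";
    }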
A portable, universal kernel wait object is not really necessary for that.
I think a portable, universal C API kernel wait object is very necessary if C++ is to style itself as a first tier systems programming language. We keep trivialising C compatibility, and we should not.
Not everyone wants to pay for the cost of a kernel transition.
You appear to assume a kernel transition is required. My POSIX permit object can CAS-lock spin up to a certain limit before even considering acquiring a kernel wait object at all, which, I might add, preferentially comes from a user-side recycle list where possible. So if the wait period is very short, no kernel transition is required; indeed you don't even call malloc. That said, its design is highly limited to doing what it does because it has to make hard-coded conservative assumptions about its surrounding environment. It can't support coroutines for example, and the fairness implementation does make it quite slow compared to a CAS lock because it can't know if fairness is important or not, so it must assume it is. Still, this is a price you need to pay if you want a C API which cannot take template specialisations.
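Sketched in C++ rather than the permit object's C API, and with invented parameters, that spin-then-block behaviour looks something like:

    #include <atomic>
    #include <condition_variable>
    #include <mutex>
    #include <thread>

    class spin_then_wait_permit
    {
        std::atomic<bool> granted{false};
        std::mutex m;
        std::condition_variable cv;
    public:
        void grant()
        {
            { std::lock_guard<std::mutex> lk(m);
              granted.store(true, std::memory_order_release); }
            cv.notify_one();
        }
        void wait(int spin_limit = 1000)
        {
            // Phase 1: pure user-space spin; short waits never touch the kernel.
            for (int n = 0; n < spin_limit; ++n)
            {
                if (granted.load(std::memory_order_acquire)) return;
                std::this_thread::yield();
            }
            // Phase 2: kernel wait, reached only if the spin phase gives up.
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [this] { return granted.load(std::memory_order_acquire); });
        }
    };

    int main()
    {
        spin_then_wait_permit p;
        std::thread t([&] { p.grant(); });
        p.wait();
        t.join();
    }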
This is an implementation detail of a specific future island, IMHO. Aside from that, I don't want to limit myself to POSIX.
My POSIX permit object also works perfectly on Windows using the Windows condition variable API. And on Boost.Thread incidentally, I patch in the Boost.Thread condition_variable implementation. That gains me the thread cancellation emulation support in Boost.Thread and makes the boost::permit<> class fairly trivial to implement.
Ok. Hands down: What's the associated overhead you are talking about? Do you have exact numbers?
I gave you exact numbers: a 13% overhead for a SHA256 round.
To quote your earlier mail: "The best I could get it to is 17 cycles a byte, with the scheduling (mostly future setup and teardown) consuming 2 cycles a byte, or a 13% overhead which I feel is unacceptable."
So how much of this "mostly future setup and teardown" is related to exception handling? Please read http://www.open-std.org/Jtc1/sc22/wg21/docs/TR18015.pdf from page 32 onwards. I was under the impression that we left the "exceptions are slow" discussion way behind us :/
I didn't claim that. I claimed that the compiler can't optimise out the generation of exception handling boilerplate in the present design of futures, and I personally find that unfortunate. The CPU will end up skipping over most of the generated opcodes, and without much overhead if it has a branch predictor, but it is still an unfortunate outcome when futures could be capable of noexcept. Then the compiler could generate just a few opcodes in an ideal case when compiling a use of a future. With regard to the 13% overhead above, almost all of that overhead was the mandatory malloc/free cycle in present future implementations.
<snip>
1. Release BindLib based AFIO to stable branch (ETA: end of January).
2. Get BindLib up to Boost quality, and submit for Boost review (ETA: March/April).
Just a minor, very unrelated remark: I find the name "BindLib" very confusing.
The library is a toolkit for locally binding libraries into namespaces :). It means that library A can be strongly bound to vX of library B, while library C can be strongly bound to vY of library B, all in the same translation unit. This was hard to do in C++ until C++ 11, and it's still a non-trivial effort, though BindLib takes away a lot of the manual labour.
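A tiny sketch of what that means in practice (all names invented): two consumers in one translation unit, each strongly bound to a different version of the same dependency via a local namespace alias:

    #include <iostream>

    namespace dep
    {
        namespace v1 { struct widget { static const int api = 1; }; }
        inline namespace v2 { struct widget { static const int api = 2; }; }
    }

    namespace lib_a  // strongly bound to dep v1
    {
        namespace dep = ::dep::v1;
        int probe() { return dep::widget::api; }
    }

    namespace lib_b  // strongly bound to dep v2
    {
        namespace dep = ::dep::v2;
        int probe() { return dep::widget::api; }
    }

    int main()
    {
        std::cout << lib_a::probe() << lib_b::probe() << "\n";  // prints 12
    }

As described, BindLib automates generating aliases of this kind; the sketch shows only the underlying language mechanism.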
I might add that BindLib lets the library end user choose what kind of future the external API of the library uses. Indeed BindLib based AFIO lets you choose between std::future and boost::future, and moreover you can use both configurations of AFIO in the same translation unit and it "just works". I could very easily - almost trivially - add support for a hpx::future in there, though AFIO by design needs kernel threads because it's the only way of generating parallelism in non-microkernel operating system kernels (indeed, the whole point of AFIO is to abstract that detail away for end users).
*shiver* I wouldn't want to maintain such a library. This sounds very dangerous and limiting. Note that both boost::future and hpx::future are far more capable than the current std::future, with different performance characteristics.
A lot of people expressed that opinion before I started BindLib - they said the result would be unmaintainable and unstable, and by implication that the whole idea was unwise. I thought they were wrong, and now I know they are wrong. Future implementations, indeed entire threading implementations, are quite substitutable for one another when they share a common API, and can even coexist in the same translation unit surprisingly well.

One of the unit tests for AFIO compiles a monster executable consisting of five separate builds of the full test suite, each with a differing threading, filesystem and networking library, all compiled in a single, repeatedly re-included, all-header translation unit. It takes some minutes for the compiler to generate a binary. The unit tests, effectively looped five times but with totally different underlying library dependency implementations, all pass all green.

You might think it took me a herculean effort to implement that. It actually took me about fifteen hours. People underestimate how substitutable STL threading implementations are: if your code can already accept any of the Dinkumware vs SGI vs Apple STL implementations, it's a very small additional step past that.

Niall
On 12 January 2015 15:30, Niall Douglas wrote:
On 7 Jan 2015 at 12:40, Thomas Heller wrote:
Just a minor very unrelated remark. I find the name "BindLib" very confusing.
The library is a toolkit for locally binding libraries into namespaces :).
I've not come across the word 'Bind' used this way in C++, and of course it is used for other things. Would 'Alias' or 'NamespaceAlias' work?

Best regards, Gareth
On 12 Jan 2015 at 16:19, Sylvester-Bradley, Gareth wrote:
On 12 January 2015 15:30, Niall Douglas wrote:
On 7 Jan 2015 at 12:40, Thomas Heller wrote:
Just a minor very unrelated remark. I find the name "BindLib" very confusing.
The library is a toolkit for locally binding libraries into namespaces :).
I've not come across the word 'Bind' used this way in C++, and of course it is used for other things.
Would 'Alias' or 'NamespaceAlias' work?
Those unfortunately have even more confusing alternative meanings in C++. AliasLib makes me think of pointer aliasing. NamespaceAliasLib is too long, I think, and besides, you're not aliasing namespaces, you're aliasing types.

I do agree about BindLib being a poor name. It is actually on its third choice of name now :( but to date, it's the least terrible of the reasonably descriptive library names I have thought of.

Do bear in mind it doesn't just mount libraries into namespaces; it also provides an emulation of Boost.Test using CATCH, an emulation of Boost.Config using C++ 17 feature detection, and some preprocessor metaprogramming to have the compiler auto-generate correct bind namespaces on inline and non-inline namespace compilers (e.g. VS2013).

Some other names I have thought of are TypedefLib and LocalCppBindLib; others were BoostStandaloneLib and ModularBoostLib. All are obtuse or too long. Any better names gratefully received.

Niall
On January 12, 2015 5:32:23 PM EST, Niall Douglas wrote:
On 12 Jan 2015 at 16:19, Sylvester-Bradley, Gareth wrote:
On 12 January 2015 15:30, Niall Douglas wrote:
On 7 Jan 2015 at 12:40, Thomas Heller wrote:
Just a minor very unrelated remark. I find the name "BindLib" very confusing.
The library is a toolkit for locally binding libraries into namespaces :).
I've not come across the word 'Bind' used this way in C++, and of course it is used for other things.
Would 'Alias' or 'NamespaceAlias' work?
Those unfortunately have even more confusing alternative meanings in C++.
I do agree about BindLib being a poor name. It is actually onto its third choice of name now :( but to date, it's the least terrible of the reasonably descriptive library names I have thought of.
Any better names gratefully received.
Name[space]Mapping ___ Rob (Sent from my portable computation engine)
On 12 Jan 2015 at 20:24, Rob Stewart wrote:
I do agree about BindLib being a poor name. It is actually onto its third choice of name now :( but to date, it's the least terrible of the reasonably descriptive library names I have thought of.
Any better names gratefully received.
Name[space]Mapping
NameMappingLib or just plain NameMapping? How about APIMapping? BindLib also provides API version management via preprocessor metaprogramming. Would even APIBind be better?

Niall
On Tue, Jan 13, 2015 at 8:38 AM, Niall Douglas wrote:
On 12 Jan 2015 at 20:24, Rob Stewart wrote:
I do agree about BindLib being a poor name. It is actually onto its third choice of name now :( but to date, it's the least terrible of the reasonably descriptive library names I have thought of.
Any better names gratefully received.
Name[space]Mapping
NameMappingLib or just plain NameMapping?
How about APIMapping? BindLib also provides API version management via preprocessor metaprogramming.
Would even APIBind be better?
Niall
I like APIBind. Any of the above are good: anything from { API, Name, Lib } x { Bind, Mapping }. Boost.Using is too generic. Tony
Hi Niall, On 12 January 2015 22:32, Niall Douglas wrote:
On 12 Jan 2015 at 16:19, Sylvester-Bradley, Gareth wrote:
On 12 January 2015 15:30, Niall Douglas wrote:
On 7 Jan 2015 at 12:40, Thomas Heller wrote:
Just a minor very unrelated remark. I find the name "BindLib" very confusing.
The library is a toolkit for locally binding libraries into namespaces :).
I've not come across the word 'Bind' used this way in C++, and of course it is used for other things.
Would 'Alias' or 'NamespaceAlias' work?
Those unfortunately have even more confusing alternative meanings in C++. AliasLib makes me think pointer aliasing. NamespaceAliasLib is too long I think, and besides you're not aliasing namespaces, you're aliasing types.
FWIW, the "Lib" suffix doesn't follow the usual naming convention for Boost libraries [http://www.boost.org/development/requirements.html] which would make it Boost.NamespaceAlias, the Boost NamespaceAlias library, or NamespaceAlias, a Boost library by Niall Douglas. However, I can see the point about the meanings of both those suggestions.
I do agree about BindLib being a poor name. It is actually onto its third choice of name now :( but to date, it's the least terrible of the reasonably descriptive library names I have thought of.
Yes, naming is the hardest problem. :-) Not sure what you'll think of this, but how about Boost.Using? Or Boost.NameAlias?
Do bear in mind it doesn't just mount libraries into namespaces, it also provides an emulation of Boost.Test using CATCH,
That's interesting, since I've just had to do that, or rather a shim that can be switched from one to the other, myself. I'll have a look at your repo; interested to see how you solved the BOOST_AUTO_TEST_CASE_TEMPLATE mapping to Catch.
...
Best regards, Gareth
On 13 January 2015 09:02, Gareth Sylvester-Bradley wrote:
FWIW, the "Lib" suffix doesn't follow the usual naming convention for Boost libraries [http://www.boost.org/development/requirements.html] which would make it Boost.NamespaceAlias, the Boost NamespaceAlias library, or NamespaceAlias, a Boost library by Niall Douglas.
I've just read this back and realise BindLib is intended to convey that it's for "binding libraries", rather than being a "binding" library... but maybe I'm not the only one who didn't get that immediately. Moving Lib(rary) to the front might help... Boost.LibAlias? Gareth ************************************************************************ The information contained in this message or any of its attachments may be confidential and is intended for the exclusive use of the addressee(s). Any disclosure, reproduction, distribution or other dissemination or use of this communication is strictly prohibited without the express permission of the sender. The views expressed in this email are those of the individual and not necessarily those of Sony or Sony affiliated companies. Sony email is for business use only. This email and any response may be monitored by Sony to be in compliance with Sony's global policies and standards
On 13 Jan 2015 at 9:01, Sylvester-Bradley, Gareth wrote:
Those unfortunately have even more confusing alternative meanings in C++. AliasLib makes me think pointer aliasing. NamespaceAliasLib is too long I think, and besides you're not aliasing namespaces, you're aliasing types.
FWIW, the "Lib" suffix doesn't follow the usual naming convention for Boost libraries [http://www.boost.org/development/requirements.html] which would make it Boost.NamespaceAlias, the Boost NamespaceAlias library, or NamespaceAlias, a Boost library by Niall Douglas.
However, I can see the point about the meanings of both those suggestions.
Thing is, it is a library for mounting other libraries into yet another library's local namespace. I figured the Lib ought to be in there somewhere.
I do agree about BindLib being a poor name. It is actually onto its third choice of name now :( but to date, it's the least terrible of the reasonably descriptive library names I have thought of.
Yes, naming is the hardest problem. :-) Not sure what you'll think of this, but how about Boost.Using? Or Boost.NameAlias?
Boost.Using ... Yes, I think I like that a lot. Thank you.
Do bear in mind it doesn't just mount libraries into namespaces, it also provides an emulation of Boost.Test using CATCH,
That's interesting, since I've just had to do that, or rather a shim that can be switched from one to the other, myself.
Yep, exactly my need too. A modular Boost library not requiring Boost needs a substitutable unit testing framework.
I'll have a look at your repo; interested to see how you solved the BOOST_AUTO_TEST_CASE_TEMPLATE mapping to Catch.
Easy: I didn't. AFIO and Spinlock, like probably most Boost.Test users, only use maybe 5% of Boost.Test's capabilities: essentially auto test casing, requires and checks. So that's what BindLib, soon to become Using, emulates. For most users, Boost.Test is way overkill; that is probably why people get so frustrated when small bugs remain unfixed in release builds for so long. Essentially they don't care about most of Boost.Test or the bigger picture for the library, so small unfixed problems really bug them. For me personally, AFIO simply undef's Boost.Test internal macros and redefs them to working fixed versions. I don't see the issues others do with that, but then my test framework isn't tied deeply into Boost.Test.
I've just read this back and realise BindLib is intended to convey that it's for "binding libraries", rather than being a "binding" library... but maybe I'm not the only one who didn't get that immediately.
People don't yet get the concept of mapping APIs from one location into many others at all. It is still very new to C++, of course. I think the biggest technical problem preventing it passing community review is the state of the libclang based bindings generator. It needs a whole load more work, but my problem is that it is already good enough that the minor errors in its output are easily hand repaired :(

Niall
On 14/01/2015 03:13, Niall Douglas wrote:
On 13 Jan 2015 at 9:01, Sylvester-Bradley, Gareth wrote:
I do agree about BindLib being a poor name. It is actually onto its third choice of name now :( but to date, it's the least terrible of the reasonably descriptive library names I have thought of.
Yes, naming is the hardest problem. :-) Not sure what you'll think of this, but how about Boost.Using? Or Boost.NameAlias?
Boost.Using ...
Yes, I think I like that a lot. Thank you.
I'm not sure that's a good choice either, for all that it sounds cool. Namespace boost::using would be confusing if it weren't illegal. "Using Boost.Using" in documentation would be confusing. (Even more so if "Boost" were omitted.) I'm sure some people would think, on seeing it in an index or table of contents somewhere, that it referred to general usage instructions for Boost as a whole, rather than the name of a particular library.
On Mon, Jan 12, 2015 at 3:30 PM, Niall Douglas wrote:
On 7 Jan 2015 at 12:40, Thomas Heller wrote:
[snip]
Exactly; this could easily be achieved by defining an appropriate API for the shared state of asynchronous operations. The wait functions would then just use the async result objects, which in turn use the wait functionality as implemented in the shared state.
You still seem to be assuming the existence of a shared state in wait objects :(
I suppose it depends on how you define a shared state, but for that non-allocating design of mine the "notification target" (a better name) is the promise if get_future() has never been called, and the future if get_future() has ever been called. The notification target is kept as an atomic pointer: if it is set, it points at a future somewhere; if it is null, then either the promise is broken or the target is the promise.
Hi Niall, I have been following the thread with interest, and I wanted to know more about your non-allocating future/promise pair. As far as I understand, your future and promise have a pointer to each other and they update the other side every time they are moved, right? My question is: as you need to do the remote update with an atomic operation (an exchange in the best case), and you usually perform at least a few moves (when composing futures, for example), wouldn't a fast allocator outperform this solution?
A portable, universal kernel wait object is not really necessary for that.
I think a portable, universal C API kernel wait object is very necessary if C++ is to style itself as a first tier systems programming language.
For what it's worth, I'm working on a proof-of-concept future/promise pair that is wait strategy agnostic. The only functions that need to know about the wait strategy are the future::wait{,_for,_until,_any,_all} family and of course future::get, in case it needs to call wait. In fact the wait functions are parametrized on the wait strategy (be it a futex, condition variable, POSIX fd, POSIX semaphore, coroutine yield, etc.) and the wait object can be stack allocated. If I get everything right, all other functions, in particular promise::set_value and future::then, should be lock-free (or wait free, depending on the underlying hardware). The shared state should also have a nice minimal API. The idea is fairly obvious in retrospect; I hope to be able to share some code soon. -- gpd
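To make the shape of that concrete, here is a minimal compilable sketch of a wait-strategy-parametrised wait (all names are mine, not Giovanni's actual code; how the producer finds the stack-allocated strategy object to call notify() is the hard part and is deliberately omitted):

    #include <atomic>
    #include <condition_variable>
    #include <mutex>

    // One possible wait strategy: a plain condition variable. A futex, POSIX fd,
    // semaphore or coroutine yield would expose the same two operations.
    struct condvar_strategy {
        std::mutex m;
        std::condition_variable cv;
        template <class Pred>
        void wait(Pred ready) {
            std::unique_lock<std::mutex> l(m);
            cv.wait(l, ready);
        }
        void notify() { cv.notify_all(); }
    };

    template <class T>
    struct toy_future {
        std::atomic<bool> ready{false};
        // Only wait() knows about the strategy, and the strategy object can
        // live on the caller's stack. Publishing its address to the producer
        // race-free is what the real design has to get right.
        template <class WaitStrategy>
        void wait(WaitStrategy& ws) {
            ws.wait([this] { return ready.load(std::memory_order_acquire); });
        }
    };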
On 12 Jan 2015 at 21:50, Giovanni Piero Deretta wrote:
I have been following the thread with interest, and I wanted to know more about your non-allocating future/promise pair. As far as I understand, your future and promise have a pointer to each other and they update the other side every time they are moved, right?
Exactly right.
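For readers following along, here is a toy single-threaded illustration of the mutual-pointer scheme (names mine, no atomics or error handling; not Niall's actual code):

    #include <utility>

    template <class T> struct promise;

    template <class T>
    struct future {
        promise<T>* peer = nullptr;
        T value{};
        bool ready = false;

        future() = default;
        future(future&& o) noexcept
            : peer(o.peer), value(std::move(o.value)), ready(o.ready) {
            o.peer = nullptr;
            if (peer) peer->peer = this;   // tell the promise we moved
        }
        ~future();
    };

    template <class T>
    struct promise {
        future<T>* peer = nullptr;

        promise() = default;
        promise(promise&& o) noexcept : peer(o.peer) {
            o.peer = nullptr;
            if (peer) peer->peer = this;   // tell the future we moved
        }
        future<T> get_future() {           // first call only, in this sketch
            future<T> f;
            f.peer = this;
            peer = &f;
            return f;                      // the move ctor re-links on the way out
        }
        void set_value(T v) {
            if (peer) { peer->value = std::move(v); peer->ready = true; }
        }
        ~promise() { if (peer) peer->peer = nullptr; }  // future sees a broken promise
    };

    template <class T>
    future<T>::~future() { if (peer) peer->peer = nullptr; }

Because every move re-points the peer, set_value() still finds the future even after the future has been moved around.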
My question is, as you need to do the remote update with an atomic operation (exchange in the best case), and you usually perform at least a few moves (when composing futures for example), wouldn't a fast allocator outperform this solution?
Firstly, I found a separate CAS lock each per future and promise is considerably faster than trying to be any more clever. When updating, you lock both objects with back off before the update. Secondly, no, this approach is far faster than a fast allocator, at least on Intel. The reason why is that promises and futures are very, very rarely contended on the same cache line between threads, so the CAS locking and updating almost never spins or contends. It's pretty much full speed ahead. The problem with specialised allocators is that firstly Boost.Thread's futures don't support allocators, and secondly even if they did, as soon as you bring global memory effects into the picture you constrain the compiler optimiser considerably. For example, make_ready_future() with the test code I wrote is implemented very naively as:

    promise<T> p;
    future<T> f(p.get_future());
    p.set_value(v);
    return f;

    ...

    make_ready_future(5).get();

... which the compiler collapses into:

    movl $5, %eax
    ret

Any use of an allocator can't let the compiler do that for you, because touching global memory means the compiler has to assume an unknown read. This doesn't mean a custom make_ready_future() couldn't produce an equally optimised outcome, but for me personally the ability of the compiler to collapse opcode output suggests a good design here. I would also assume that when allowed to collapse opcodes, the compiler can also do alias folding etc. which the use of an allocator may inhibit.
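The two-sided locking described above might look roughly like this (a sketch under my naming, not the actual implementation):

    #include <atomic>
    #include <thread>

    struct cas_lock {
        std::atomic<bool> locked{false};
        bool try_lock() { return !locked.exchange(true, std::memory_order_acquire); }
        void lock() {
            int spins = 0;
            while (!try_lock())
                if (++spins > 1000) { std::this_thread::yield(); spins = 0; }  // back off
        }
        void unlock() { locked.store(false, std::memory_order_release); }
    };

    // Take both sides before updating the future<->promise link. Acquiring the
    // second lock with try_lock() plus back-off avoids deadlock when both ends
    // race to update each other.
    inline void lock_both(cas_lock& a, cas_lock& b) {
        for (;;) {
            a.lock();
            if (b.try_lock()) return;
            a.unlock();
            std::this_thread::yield();
        }
    }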
A portable, universal kernel wait object is not really necessary for that.
I think a portable, universal C API kernel wait object is very necessary if C++ is to style itself as a first tier systems programming language.
For what it's worth, I'm working on a proof-of-concept future/promise pair that is wait strategy agnostic. The only functions that need to know about the wait strategy are the future::wait{,_for,_until,_any,_all} family and of course future::get, in case it needs to call wait. In fact the wait functions are parametrized on the wait strategy (be it a futex, condition variable, POSIX fd, POSIX semaphore, coroutine yield, etc.) and the wait object can be stack allocated.
If I get everything right, all other functions, in particular promise::set_value and future::then should be lock-free (or wait free, depending on the underlying hardware).
The shared state should also have a nice minimal API.
The idea is fairly obvious in retrospect, I hope to be able to share some code soon.
I look forward to seeing some test code! Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On Mon, Jan 12, 2015 at 4:50 PM, Giovanni Piero Deretta
Hi Neal,
I have been following the thread with interest, and I wanted to know more about your non-allocating future/promise pair. As far as I understand, your future and promise have a pointer to each other and they update the other side every time they are moved, right?
That's how I did it. I think Niall did similar. A fairly complete (if untested) implementation fits on a single slide: slide 154 of https://github.com/boostcon/cppnow_presentations_2013/blob/master/mon/future... from C++Now 2013 https://www.youtube.com/watch?v=QkkyiRtmQ5M Tony
On 01/12/2015 04:30 PM, Niall Douglas wrote:
On 7 Jan 2015 at 12:40, Thomas Heller wrote:
What is missing on POSIX is a portable universal kernel wait object used by everything in the system. It is correct to claim you can easily roll your own with a condition variable and an atomic, the problem comes in when one library (e.g. OpenCL) has one kernel wait object and another library has a slightly different one, and the two cannot be readily composed into a single wait_for_all() or wait_for_any() which accepts all wait object types, including non-kernel wait object types.
Exactly; this could be easily achieved by defining an appropriate API for the shared state of asynchronous operations. The wait functions would then just use the async result objects, which in turn use the wait functionality as implemented in the shared state.
You still seem to be assuming the existence of a shared state in wait objects :(
I absolutely do. Because I think it is inevitable. More on that below.
I suppose it depends on how you define a shared state, but in that non-allocating design of mine the "notification target" (a better name) is the promise if get_future() has never been called, and the future if get_future() has ever been called. The notification target is kept in an atomic pointer: if it is set, it points at a future somewhere; if it is null, then either the promise is broken or the target is the promise.
Well, I think we have quite some misunderstanding here. I was using the nomenclature as it is used in the standard. The standard talks about asynchronous return objects (future and shared_future) and asynchronous providers (promise, packaged_task, async). So yes, the future is a notification target, and a promise is just one means to notify the asynchronous return object (or notification target, as you call it). The shared state is therefore the communication channel connecting those two. Non-allocating futures (and by implication non reference counted shared states) have a major problem. I am basing my observation on your proposal of basic_future and basic_promise (with my assumption of the nomenclature as in the standard). The problem is dangling pointers: https://gist.github.com/sithhell/260796afcf11364eaf26 I can see that problem 1 could be fixed easily by updating the pointer the promise points to ... but what about the second problem?
A portable, universal kernel wait object is not really necessary for that.
I think a portable, universal C API kernel wait object is very necessary if C++ is to style itself as a first tier systems programming language.
We keep trivialising C compatibility, and we should not.
No one is trivialising C compatibility. You can call any C code from C++.
Not everyone wants to pay for the cost of a kernel transition.
You appear to assume a kernel transition is required.
My POSIX permit object can CAS lock spin up to a certain limit before even considering acquiring a kernel wait object at all, which I might add preferentially comes from a user side recycle list where possible. So if the wait period is very short, no kernel transition is required; indeed you don't even call malloc.
That said, its design is highly limited to doing what it does because it has to make hard coded conservative assumptions about its surrounding environment. It can't support coroutines for example, and the fairness implementation does make it quite slow compared to a CAS lock because it can't know if fairness is important or not, so it must assume it is. Still, this is a price you need to pay if you want a C API which cannot take template specialisations.
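In outline, the acquire path being described is the classic spin-then-block shape (a sketch only; SPIN_LIMIT and acquire_via_kernel_wait() are placeholders for the permit object's internals, not real API):

    #include <atomic>
    #include <thread>

    enum { SPIN_LIMIT = 4096 };  // tuning knob, not a real constant from the library

    bool acquire(std::atomic<int>& permit) {
        // Fast path: spin on a CAS for a bounded number of attempts.
        for (int i = 0; i < SPIN_LIMIT; ++i) {
            int granted = 1;
            if (permit.compare_exchange_weak(granted, 0,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed))
                return true;             // no kernel transition, no malloc
            std::this_thread::yield();
        }
        // Slow path only: fetch a kernel wait object, preferably from a
        // user-side recycle list, and sleep on it.
        extern bool acquire_via_kernel_wait(std::atomic<int>&);  // placeholder
        return acquire_via_kernel_wait(permit);
    }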
So it is limited but generic? That doesn't make sense.
This is an implementation detail of a specific future island, IMHO. Aside from that, I don't want to limit myself to POSIX.
My POSIX permit object also works perfectly on Windows using the Windows condition variable API. And on Boost.Thread incidentally, I patch in the Boost.Thread condition_variable implementation. That gains me the thread cancellation emulation support in Boost.Thread and makes the boost::permit<> class fairly trivial to implement.
Please decide yourself what you want to call it ... is it POSIX now or platform independent? Is it C or C++? boost::permit<> looks pretty C++-ish to me. -- Thomas Heller Friedrich-Alexander-Universität Erlangen-Nürnberg Department Informatik - Lehrstuhl Rechnerarchitektur Martensstr. 3 91058 Erlangen Tel.: 09131/85-27018 Fax: 09131/85-27912 Email: thomas.heller@cs.fau.de
Let's see if I've captured everything:

- we want to get a value from (likely) another THREAD, and WAIT if it is not yet ready UNTIL it is ready (whatever THREAD, WAIT, UNTIL mean)
- a future (boost:: or std:: or hpx:: etc) is a (high-level!) wrapping of that concept
- to be inter-operable we need the lower level concepts (in ALL-CAPS) exposed

THREAD: we actually don't need to completely define what "another" THREAD is, we only need a partial definition of our OWN THREAD, ie the *thread of execution* of the current code, because any code (that we assume at some point runs) runs in some "thread of execution" - whether user mode thread, kernel thread, HPX, whatever... What we really need is access to the _controller_ of that thread of execution - ie typically the scheduler, but not a full scheduler interface. In recent C++ proposals this has been called the "execution agent", although I've been pushing for EXECUTION CONTEXT - the C++ object/interface that controls the context of the execution of the current set of instructions.

WAIT: The running code needs to be able to get at its EXECUTION CONTEXT in order to be able to ask it to WAIT. For a given EXECUTION CONTEXT, WAIT means stop "running" - stop using CPU, etc. No further PROGRESS (which turns out to be a hard term to define, but the committee is working on that).

UNTIL: Some *other* EXECUTION CONTEXT needs to be able to tell the aforementioned EXECUTION CONTEXT to RESUME.

Note that I've left off the actual value inside the future, and the "waitable" object (kernel or otherwise). The value is orthogonal - the value is just a memory location (unless we abstract that away!) and can be set via atomic operations or whatever. Once the value is set, you then call RESUME on the execution agent. From a value point-of-view, what you want is just an ObservableValue, ie when the value is changed, call this callable (function, lambda, whatever). Given an ObservableValue and an ExecutionAgent you can build a promise. Tony
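A rough sketch of how an ObservableValue and an EXECUTION CONTEXT might compose into a promise (all names here are hypothetical, and the race between registering the observer and setting the value is deliberately ignored):

    #include <functional>
    #include <utility>

    // The EXECUTION CONTEXT interface: just WAIT and RESUME.
    struct execution_context {
        virtual void wait() = 0;     // stop making PROGRESS
        virtual void resume() = 0;   // allow PROGRESS again
        virtual ~execution_context() = default;
    };

    // An ObservableValue: a memory location plus "call this when it changes".
    template <class T>
    struct observable_value {
        T value{};
        bool has_value = false;
        std::function<void()> on_set;   // registered observer
        void set(T v) {
            value = std::move(v);
            has_value = true;
            if (on_set) on_set();
        }
    };

    // A promise is then just: set the value, and let the observer RESUME the
    // waiting EXECUTION CONTEXT.
    template <class T>
    struct toy_promise {
        observable_value<T>* slot;
        void set_value(T v) { slot->set(std::move(v)); }
    };

The consumer side registers slot.on_set = [&]{ ctx.resume(); }; and calls ctx.wait() while has_value is still false.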
On 13 Jan 2015 at 12:57, Gottlob Frege wrote:
Let's see if I've captured everything:
Let me rewrite your rewrite ...
- we want to get a value from (likely) another THREAD, and WAIT if it is not yet ready UNTIL it is ready, (whatever THREAD WAIT UNTIL mean) - a future (boost:: or std:: or hpx:: etc) is a (high-level!) wrapping of that concept - to be inter-operable we need the lower level concepts (in ALL-CAPS) exposed
THREAD: we actually don't need to completely define what "another" THREAD is, we only need a partial definition of our OWN THREAD. ie the *thread of execution* of the current code, because any code (that we assume at some point runs) runs in some "thread of execution" - whether user mode thread, kernel thread, HPX, whatever...
If you replaced the word THREAD with EXECUTION CONTEXT, where that means any of the following:

1. Kernel thread (stackful, 1:1 mapping, probably with some real hardware concurrency, cooperative context switching preferred but if too long elapses you get preempted by a timer)

2. Process thread (stackful, M:N mapping onto kernel threads, cooperative context switching though the cooperative part may be done for you by a local process runtime e.g. Fiber, ASIO, HPX, WinRT)

3. Functional/monadic call sequence (stateful though stackless, because a functional call sequence is always constexpr and must only propagate state changes forwards, it can be safely suspended at any time without issue e.g. Hana, Expected, future-promise incidentally)

4. Stackless coroutine (an elaboration of item 5, but you get to write your code in a slightly less fragmented way which may aid maintainability e.g. Coroutine, ASIO's duff device macros implementing these)

5. Event callback handler (stackless, state is usually passed via a single void * or equivalent (std::function), almost always a local process runtime calls you when something happens e.g. NT Kernel APCs, POSIX signals, POSIX AIO, ASIO)

This may explain why I am so keen that the compiler can optimise a future-promise down to single digit opcodes. For EXECUTION CONTEXTS 3 and 5 the compiler's optimiser is fully and partially available respectively, and can collapse whole sections of code.
What we really need is access to the _controller_ of that thread of execution - ie typically the scheduler, but not a full scheduler interface. In recent C++ proposals this has been called the "execution agent", although I've been pushing for EXECUTION CONTEXT - the C++ object/interface that controls the context of the execution of the current set of instructions.
This is what the Executors concept was supposed to provide. I've always felt that ASIO is THE standard C++ executor and indeed event handling and dispatch framework, so go standardise on that instead of reinventing wheels no one wants nor uses.
WAIT: The running code needs to be able to get at its EXECUTION CONTEXT in order to be able to ask it WAIT. For a given EXECUTION CONTEXT, WAIT means stop "running" - stop using CPU, etc. No further PROGRESS (which turns out to be a hard term to define, but the committee is working on that)
The key really is PROGRESS rather than wait. A future wait() or get() halts PROGRESS of the calling EXECUTION CONTEXT until the PROGRESS of some other EXECUTION CONTEXT calls promise.set_value() or set_exception(). PROGRESS applies equally to all five ways of doing tasklets above.
UNTIL: Some *other* EXECUTION CONTEXT needs to be able to tell the aforementioned EXECUTION CONTEXT to RESUME.
Personally I'd add ABORT here too. Some ASIO operations which ought to count as tasklets can be aborted, indeed so can kernel threads on POSIX. I am unsure if i/o operations ought to be included in the five types of EXECUTION CONTEXT above.
The value is orthogonal - the value is just a memory location (unless we abstract that away!) and can be set via atomic operations or whatever. Once the value is set, you then call RESUME on the execution agent. From a value point-of-view, what you want is just an ObservableValue. ie when value is changed, call this callable (function, lambda, whatever). Given an ObservableValue, and an ExecutionAgent you can build a promise.
Sounds good. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On Wed, Jan 14, 2015 at 7:02 AM, Niall Douglas
On 13 Jan 2015 at 12:57, Gottlob Frege wrote:
Let's see if I've captured everything:
Let me rewrite your rewrite ...
- we want to get a value from (likely) another THREAD, and WAIT if it is not yet ready UNTIL it is ready, (whatever THREAD WAIT UNTIL mean) - a future (boost:: or std:: or hpx:: etc) is a (high-level!) wrapping of that concept - to be inter-operable we need the lower level concepts (in ALL-CAPS) exposed
THREAD: we actually don't need to completely define what "another" THREAD is, we only need a partial definition of our OWN THREAD. ie the *thread of execution* of the current code, because any code (that we assume at some point runs) runs in some "thread of execution" - whether user mode thread, kernel thread, HPX, whatever...
If you replaced the word THREAD with EXECUTION CONTEXT,
Yes, re-reading what I wrote, I think that was my point (whether planned or not) - we don't need to think about threads, but execution contexts.
where that means any of the following:
1. Kernel thread (stackful, 1:1 mapping, probably with some real hardware concurrency, cooperative context switching preferred but if too long elapses you get preempted by a timer)
2. Process thread (stackful, M:N mapping onto kernel threads, cooperative context switching though the cooperative part may be done for you by a local process runtime e.g. Fiber, ASIO, HPX, WinRT)
3. Functional/monadic call sequence (stateful though stackless, because functional call sequence is always constexpr and must only propagate state changes forwards, it can be safely suspended at any time without issue e.g. Hana, Expected, future-promise incidentally)
4. Stackless coroutine (an elaboration of item 5, but you get to write your code in a slightly less fragmented way which may aid maintainability e.g. Coroutine, ASIO's duff device macros implementing these)
5. Event callback handler (stackless, state is usually passed via a single void * or equivalent (std::function), almost always a local process runtime calls you when something happens e.g. NT Kernel APCs, POSIX signals, POSIX AIO, ASIO)
yep, that's the idea.
This may explain why I am so keen that the compiler can optimise a future-promise down to single digit opcodes. For EXECUTION CONTEXT 3 and 5 the compiler's optimiser is fully and partially available, and can collapse whole sections of code.
What we really need is access to the _controller_ of that thread of execution - ie typically the scheduler, but not a full scheduler interface. In recent C++ proposals this has been called the "execution agent", although I've been pushing for EXECUTION CONTEXT - the C++ object/interface that controls the context of the execution of the current set of instructions.
This is what the Executors concept was supposed to provide. I've always felt that ASIO is THE standard C++ executor and indeed event handling and dispatch framework, so go standardise on that instead of reinventing wheels no one wants nor uses.
WAIT: The running code needs to be able to get at its EXECUTION CONTEXT in order to be able to ask it WAIT. For a given EXECUTION CONTEXT, WAIT means stop "running" - stop using CPU, etc. No further PROGRESS (which turns out to be a hard term to define, but the committee is working on that)
The key really is PROGRESS rather than wait. A future wait() or get() halts PROGRESS of the calling EXECUTION CONTEXT
sure but how does it halt - via some kernel/system/library/... mechanism, or does it call ExecutionContext->WAIT()?
until the PROGRESS of some other EXECUTION CONTEXT calls promise.set_value() or set_exception().
until some other code does whatever it wants, setting whatever it wants, and calls ExecutionContext->RESUME(), ie separate setting the value from resuming the execution. (The tricky part: while still maintaining atomicity where necessary.)
PROGRESS applies equally to all five ways of doing tasklets above.
UNTIL: Some *other* EXECUTION CONTEXT needs to be able to tell the aforementioned EXECUTION CONTEXT to RESUME.
Personally I'd add ABORT here too. Some ASIO operations which ought to count as tasklets can be aborted, indeed so can kernel threads on POSIX. I am unsure if i/o operations ought to be included in the five types of EXECUTION CONTEXT above.
The value is orthogonal - the value is just a memory location (unless we abstract that away!) and can be set via atomic operations or whatever. Once the value is set, you then call RESUME on the execution agent. From a value point-of-view, what you want is just an ObservableValue. ie when value is changed, call this callable (function, lambda, whatever). Given an ObservableValue, and an ExecutionAgent you can build a promise.
Sounds good.
Niall
-- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 14 Jan 2015 at 12:04, Gottlob Frege wrote:
The key really is PROGRESS rather than wait. A future wait() or get() halts PROGRESS of the calling EXECUTION CONTEXT
sure but how does it halt - via some kernel/system/library/... mechanism, or does it call ExecutionContext->WAIT()?
You may have noticed my functional composure based futures test code a few weeks ago. These work by taking some arbitrary sequence of callable types (held in a tuple).
until the PROGRESS of some other EXECUTION CONTEXT calls promise.set_value() or set_exception().
until some other code does whatever it wants, setting whatever it wants, and calls ExecutionContext->RESUME()
ie separate the setting the value from the resuming of the execution.
(The tricky part: while still maintaining atomicity where necessary)
The functionally composed promise futures are probably too far out for the C++ community - I suspect even if I built them no one would use them. I am grateful to the discussions here for making me realise that, though. Instead in AFIO I'll implement a non-allocating afio::future<T> and move onwards. No point flogging a dead horse here; besides, I've had some surprising recent success with a test implementation of a fast (32784, 32768) SECDED error correcting code in software - I can process 500Mb/sec, pretty amazing considering it's all bit work - but to avoid pipeline stalls I need to combine the implementation with a crypto hash (probably Blake2), and that has probably ruled out my need for much faster futures, which were originally demanded by my SIMD 4-SHA256 engine. Given that CPUs next year will do SHA256 in hardware, that renders the 4-SHA256 engine obsolete. Funny really. I wrote that 4-SHA256 engine whilst I was still working with you in BlackBerry. It's been a long road from then till now. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 13 Jan 2015 at 16:52, Thomas Heller wrote:
Non-allocating futures (and by implication non reference counted shared states) have a major problem. I am basing my observation on your proposal of basic_future and basic_promise (with my assumption of the nomenclature as in the standard). The problem is dangling pointers: https://gist.github.com/sithhell/260796afcf11364eaf26
I think you used v2 of the test code? I didn't bother with the move constructor implementation as that had been proved in v1. The point of v2 was to test the functional composure method of building up future and promise type implementations - basically, would the compiler optimiser cope? As you may have seen in the comments in v2, the answer was "not bad, though with bugs".
I can see that problem 1 could be fixed easily by updating the pointer the promise points to ... but what about the second problem?
In v1, and I would assume in any final implementation, both promise<T> and future<T> keep inline storage for a T. In problem 2, the promise only takes a copy of the set value if a future has never been retrieved; otherwise the promise always sends set values straight to the linked future. So in problem 2, if the thread finishes before you get the value from the future, it is not a problem, because the promise sent to the thread has already delivered the set value to its linked future, and the future now holds the value. The promise, after a future is taken from it, is no more than a dumb pointer to the future.
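Spelled out, the set_value() path being described is something like this (a sketch with my names, not the v1 code):

    #include <utility>

    template <class T>
    struct future {
        T value{};
        bool ready = false;
        void receive(T v) { value = std::move(v); ready = true; }
    };

    template <class T>
    struct promise {
        future<T>* linked = nullptr;  // set once get_future() has been called
        T storage{};                  // inline storage, used before that
        bool have_value = false;

        void set_value(T v) {
            if (linked)
                linked->receive(std::move(v));  // straight to the linked future
            else {
                storage = std::move(v);         // future not retrieved yet:
                have_value = true;              // keep the value locally
            }
        }
        // get_future() would hand 'storage' over if have_value is already true.
    };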
A portable, universal kernel wait object is not really necessary for that.
I think a portable, universal C API kernel wait object is very necessary if C++ is to style itself as a first tier systems programming language.
We keep trivialising C compatibility, and we should not.
No one is trivialising C compatibility. You can call any C code from C++.
I think I was clear that I meant the other way round. C code needing to compose waiting on C++ code to do something with other operations is a major pain. Ask anyone working in the Python C runtime and trying to work with C++ code.
That said, its design is highly limited to doing what it does because it has to make hard coded conservative assumptions about its surrounding environment. It can't support coroutines for example, and the fairness implementation does make it quite slow compared to a CAS lock because it can't know if fairness is important or not, so it must assume it is. Still, this is a price you need to pay if you want a C API which cannot take template specialisations.
So it is limited but generic? That doesn't make sense.
All C API threading primitives are the same. Right now, for example, the Linux glibc pthread mutex has a spin count set for CPUs which existed ten years ago, and is too short for modern CPUs. It is unfortunately non-adjustable from the outside, and produces suboptimal performance on modern CPUs. Limited and generic, as you said.
This is an implementation detail of a specific future island, IMHO. Aside from that, I don't want to limit myself to POSIX.
My POSIX permit object also works perfectly on Windows using the Windows condition variable API. And on Boost.Thread incidentally, I patch in the Boost.Thread condition_variable implementation. That gains me the thread cancellation emulation support in Boost.Thread and makes the boost::permit<> class fairly trivial to implement.
Please decide yourself what you want to call it ... is it POSIX now or platform independent?
It follows the POSIX pthreads API and naming conventions, but it compiles and works on the platforms I indicated. Windows also provides POSIX pthreads primitives, though not following the API and naming conventions. See http://msdn.microsoft.com/en-us/library/windows/desktop/ms682052%28v=vs.85%29.aspx.
Is it C or C++? boost::permit<> looks pretty C++-ish to me.
That is a thin wrap of pthread_permit_XXX(). Try https://github.com/ned14/c11-permit-object/blob/master/pthread_permit.h for a reasonably up to date version. I have a version here which does the lazy condvar allocation, but it isn't as tested as I'd like. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On January 14, 2015 7:20:43 AM EST, Niall Douglas
On 13 Jan 2015 at 16:52, Thomas Heller wrote:
We keep trivialising C compatibility, and we should not.
No one is trivialising C compatibility. You can call any C code from C++.
I think I was clear that I meant the other way round. C code needing to compose waiting on C++ code to do something with other operations is a major pain. Ask anyone working in the Python C runtime and trying to work with C++ code.
FWIW, that wasn't clear to me before now. ___ Rob (Sent from my portable computation engine)
On 14 Jan 2015 at 15:48, Rob Stewart wrote:
On January 14, 2015 7:20:43 AM EST, Niall Douglas wrote:
On 13 Jan 2015 at 16:52, Thomas Heller wrote:
We keep trivialising C compatibility, and we should not.
No one is trivialising C compatibility. You can call any C code from C++.
I think I was clear that I meant the other way round. C code needing to compose waiting on C++ code to do something with other operations is a major pain. Ask anyone working in the Python C runtime and trying to work with C++ code.
FWIW, that wasn't clear to me before now.
Oh. Okay. Sorry then. To be honest, C code only needs the ability to compose waits, that's the frustrating part because C++ is all up itself with no regard to others. For example, if a promise-future could toggle the signalled state of a file descriptor, that would enable C code to run a select() composure where the C code waits on "something to happen", which includes a C++ future becoming set. FYI the pthreads permit object I wrote has the optional facility to signal a fd when it goes signalled, so a permit object based future would be very useful to C code. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
Hey Niall, On 13:33 Thu 15 Jan, Niall Douglas wrote:
To be honest, C code only needs the ability to compose waits, that's the frustrating part because C++ is all up itself with no regard to others.
I realize the value of being able to call C++ functions from C code. I'm skeptical though about designing C++ libraries so that they can export features which simply don't exist in C, if that complicates the design.
For example, if a promise-future could toggle the signalled state of a file descriptor, that would enable C code to run a select() composure where the C code waits on "something to happen", which includes a C++ future becoming set.
Wouldn't it be an acceptable workaround to have a wrapper for futures, written in C++, which can signal an FD once the future is ready? Pulling such functionality into the API of C++ futures doesn't look clean to me, especially since there are loads of use cases where a context switch is prohibitively slow. Another workaround would be to require users to compile the C function which needs to compose the waits with a C compiler.
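Such a wrapper is easy to sketch, at the cost of one helper thread per wrapped future (the helper name is mine; write() is plain POSIX, so the read end of a pipe or an eventfd can sit in a C-side select()/poll() set):

    #include <future>
    #include <unistd.h>   // write()

    // Wrap a future so that one byte is written to 'fd' when it becomes ready.
    template <class T>
    std::future<T> notify_fd_when_ready(std::future<T> f, int fd) {
        return std::async(std::launch::async, [f = std::move(f), fd]() mutable {
            T v = f.get();               // blocks this helper thread, not the caller
            char byte = 1;
            (void)write(fd, &byte, 1);   // wake the select() loop
            return v;
        });
    }

Of course, this buys C-side composability with exactly the thread and context-switch overhead being objected to here.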
FYI the pthreads permit object I wrote has the optional facility to signal a fd when it goes signalled, so a permit object based future would be very useful to C code.
Again, this depends on the use case. In HPC the time a system call takes is considered millennia. Consider the case when hardware is programmed by mapping memory directly into the address space of a user level program. Cheers -Andreas -- ========================================================== Andreas Schäfer HPC and Grid Computing Chair of Computer Science 3 Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany +49 9131 85-27910 PGP/GPG key via keyserver http://www.libgeodecomp.org ========================================================== (\___/) (+'.'+) (")_(") This is Bunny. Copy and paste Bunny into your signature to help him gain world domination!
On 15 Jan 2015 at 15:10, Andreas Schäfer wrote:
To be honest, C code only needs the ability to compose waits, that's the frustrating part because C++ is all up itself with no regard to others.
I realize the value of being able to call C++ functions from C code. I'm skeptical though about designing C++ libraries so that they can export features which simply don't exist in C, if that complicates the design.
Calling C++ code from C is straightforward enough, and isn't the hard part here. Well, especially if the C++ code is noexcept at least. The problem is rather like "future islands", except that all of C++ is a giant future island to C code.
For example, if a promise-future could toggle the signalled state of a file descriptor, that would enable C code to run a select() composure where the C code waits on "something to happen", which includes a C++ future becoming set.
Wouldn't it be an acceptable workaround to have a wrapper for futures, written in C++, which can signal an FD once the future is ready? Pulling such functionality into the API of C++ futures doesn't look clean to me, especially since there are loads of use cases where a context switch is prohibitively slow.
Another workaround would be to require users to compile the C function which needs to compose the waits with a C compiler.
Remember that almost certainly the C code is working with a set of libraries it does not own nor control e.g. the Python runtime. If the C code would like to sleep the process until something happens, right now you fire off worker threads which wait on each of the third party libraries and the sole purpose of the worker threads is to signal your unified wait composure implementation when the third party library unblocks. This sucks. I think the future.then() facility in the Concurrency TS at least allows C code to hook future state changes with some C callback, so for example it could write a byte to some fd to get a select() call to wake. The only real problem is the potential variance in when the continuation gets called, but I'd assume the programmer writing the C hook can figure that out. Again though, it's unfortunate there isn't a universal wait object in C also used by C++. That pthreads permit object of mine allows composure i.e. wait_for_all() and wait_for_any(), albeit with O(N) scaling unfortunately, but then at least C code could assemble a list of event states to watch for change and sleep until change occurs. You can of course do this with file descriptors, but as you mentioned that is very slow compared to a pure userspace wait object. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
1. They tie your code into "future islands" which are fundamentally incommensurate with all code which doesn't use the same future as your code. Try mixing code using boost::future and std::future for example, it's a nightmare of too easy to be racy and unmaintainable mess code. If Compute provided a boost::compute::future, it would be yet another new future island, and I'm not sure that's wise design.
The easiest way to deal with this is to introduce a Future concept and implement everything in terms of it. A solid set of traits/concepts-lite should cover that.
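For illustration, such a trait might look like this (a sketch only, not from any proposal; the names are mine):

    #include <chrono>
    #include <future>

    template <class F> struct future_traits;  // each "island" specialises this

    template <class T>
    struct future_traits<std::future<T>> {
        using value_type = T;
        static void wait(std::future<T>& f) { f.wait(); }
        static bool is_ready(std::future<T>& f) {
            return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
        }
    };

    // A generic wait_all() then needs nothing but the trait:
    template <class... Fs>
    void wait_all(Fs&... fs) {
        int unused[] = { (future_traits<Fs>::wait(fs), 0)..., 0 };
        (void)unused;
    }

Generic algorithms written against the trait never name a concrete future type.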
I don't think it's that easy because really it comes down to commonality of kernel wait object, or rather, whether one has access to the true underlying kernel wait object or not.
Relying on the kernel to do threading is plain slow - at least with what we usually have today.
For example, right now can boost::wait_all() ever consume std::futures? I suspect not because the HANDLE on Windows or the futex on Linux is rather hard to get at.
You don't need to - as long as the interfaces exposed by that concept are powerful enough to handle things, that is.
A good example is the proposed await 2.0 (see N4134: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4134.pdf), btw. The proposed await keyword can be adapted to handle arbitrary future types using a user supplied trait. We tried that with hpx::future and it seems to work fine (once they fix a compiler bug which prevented us from doing large scale tests).
N4134 presents excellent progress, apart from one major thing I disliked about it: I deeply dislike magic compiler tricks which fold the allocation of shared state by a future promise onto the stack.
You don't have to do that, it's just one possible way of implementing the trait.
Far, far better to fix the present future promise API to stop requiring shared state and therefore memory allocation at all.
Not sure you can implement a shared_future without a shared state.
2. Every time you touch them with change you unavoidably spend thousands of CPU cycles due to going through the memory allocator and (effectively) the internal shared_ptr. This makes using futures for a single SHA round, for example, a poor design despite how nice and clean it is.
As long as the overheads of managing the future itself are much smaller than the overheads introduced by the underlying threading system we're fine. And for std::future and boost::future this is definitely the case as both are tied to kernel-threads. In HPX this is a bigger problem as the overheads introduced by futures are comparable with those of the underlying threading system (sub-microsecond). However in our experience this is solvable (things like special allocators for the shared state and using intrusive_ptr for it come to mind).
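For instance, an intrusively counted shared state avoids the separate control block a shared_ptr-based one would allocate (sketch; boost::intrusive_ptr and its add_ref/release hooks are real, everything else is my naming):

    #include <atomic>
    #include <boost/intrusive_ptr.hpp>

    template <class T>
    struct shared_state {
        std::atomic<unsigned> refs{0};
        // value, error and continuation storage elided
    };

    template <class T>
    void intrusive_ptr_add_ref(shared_state<T>* s) {
        s->refs.fetch_add(1, std::memory_order_relaxed);
    }
    template <class T>
    void intrusive_ptr_release(shared_state<T>* s) {
        if (s->refs.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete s;
    }

    // future<T> and promise<T> then each hold a
    // boost::intrusive_ptr<shared_state<T>>, ideally allocated from a pool.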
I think a one size fits all future is a fundamentally flawed approach.
Did I say 'one size fits all'? I don't think so.
3. They force you to deal with exceptions even where that is not appropriate, and internally most implementations will do one or more internal throw-catches which if the exception type has a vtable, can be particularly slow.
The implementations will throw only if there is an error. This is a no- issue for the non-exceptional case. And I personally don't care if the exceptional case is slow (involves things like logging anyways, etc.).
That's not the problem. The problem is that the compiler cannot know if no exception will ever be generated and therefore has to generate the opcodes anyway. What I'm really asking for is a "noexcept future" such that this sequence:
    promise<int> p;
    auto f(p.get_future());
    p.set_value(5);
    return f.get();
... can be optimised by the compiler into:
    _Z5test1v:                       # @_Z5test1v
        .cfi_startproc
    # BB#0:                          # %_ZN7promiseIiJEED2Ev.exit2
        movl $5, %eax
        ret
Obviously this is an unrealistic use case, but my point is that the compiler should be capable of such a reduction because the correct design of future promise wouldn't get in the way.
My earlier proposal for non-allocating future promise doesn't even transport a value, it's so basic (hence "basic_future" and "basic_promise"). It pushes that responsibility onto something like expected<T, E>, where if the programmer chooses, E can be a std::exception_ptr, in which case you get exception capable futures. Or E can be a std::error_code, in which case the future is now noexcept and the compiler can completely elide all related exception handling machinery. Obviously one could use an expected<expected<T, error_code>, exception_ptr> too. In fact, this is exactly what I need for AFIO and is why I started down this rabbit hole at all. Personally speaking, I have better things to be doing than working on futures, but right now they are a major showstopper for me, particularly for efficient SHA hashing.
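To illustrate the noexcept flavour with a toy stand-in for expected<T, E> (this is not any proposed type, just a minimal value-or-error carrier):

    #include <system_error>
    #include <utility>

    // A future transporting result<T> never needs to throw from get() to
    // report the operation's failure, so the compiler can elide the exception
    // handling machinery on that path.
    template <class T>
    struct result {
        T value{};
        std::error_code error{};
        bool ok() const noexcept { return !error; }
    };

    template <class T>
    result<T> make_result(T v) { return result<T>{std::move(v), {}}; }

Callers then branch on ok() instead of catching.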
Sure, let's do that. It will make futures more efficient; however it does not solve your initial concern of 'future islands', as it introduces yet another future type into the mix.
To that end, the non-allocating basic_future toolkit I proposed on this list before Christmas I think has the best chance of "fixing" futures. Each programmer can roll their own future type, with optional amounts of interoperability and composure with other future islands. Then a future type lightweight enough for a SHA round is possible, as is some big thick future type providing STL future semantics or composure with many other custom future types. One also gains most of the (static) benefits of ASIO's async_result, but one still has ABI stability.
Non-allocating futures are a step in the right direction. But even those require to solve some of the problems you mentioned. Otherwise they will make the issue of having future-islands just a bit bigger...
Eliminating future islands is, I suspect, not something the C++ community can entirely do alone. We are, as a minimum, going to have to petition POSIX for improved runtime support. We probably ought to have our ducks in a row before that though.
*shiver* I'd rather embrace future islands as we will not be able to solve this. Let's use the facilities available to us (C++!) and solve it inside the language/library. Regards Hartmut --------------- http://boost-spirit.com http://stellar.cct.lsu.edu
On 5 Jan 2015 at 7:52, Hartmut Kaiser wrote:
I don't think it's that easy because really it comes down to commonality of kernel wait object, or rather, whether one has access to the true underlying kernel wait object or not.
Relying on the kernel to do threading is plain slow - at least with what we usually have today.
It's also often the only way to do parallelism in the kernel if yours is not microkernel, so if you need that then you're stuck. File i/o in particular (or rather, the kernel file page cache) is often optimised for kernel threading. Also as I mentioned in the other email to Thomas it's usually the case that all waits need to have the potential to go to a kernel wait, even if rarely used.
For example, right now can boost::wait_all() ever consume std::futures? I suspect not because the HANDLE on Windows or the futex on Linux is rather hard to get at.
You don't need to - as long as the interfaces exposed by that concept are powerful enough to handle things, that is.
With C code as well? A large chunk of newly written systems code is still in C. And I don't expect that to change. C++ already spends too much time in its compile time ivory tower and not attending to its main use case, systems programming.
Far, far better to fix the present future promise API to stop requiring shared state and therefore memory allocation at all.
Not sure you can implement a shared_future without a shared state.
shared_future is a particularly broken design. What should have been implemented is this: promise<T> ...
Sure, let's do that. It will make futures more efficient, however does not solve you initial concern of 'future islands' as it introduces yet another future type into the mix.
Correct. My proposal is a "future island factory".
Eliminating future islands is, I suspect, not something the C++ community can entirely do alone. We are, as a minimum, going to have to petition POSIX for improved runtime support. We probably ought to have our ducks in a row before that though.
*shiver*
I'd rather embrace future islands as we will not be able to solve this. Let's use the facilities available to us (C++!) and solve it inside the language/library.
POSIX is modifiable. We basically need a use case to persuade the libc maintainers to incorporate the pthreads permit object into their libc's, and after that the AWG cannot prevent its eventual standardisation. Just to be clear, the pthreads permit object would be the absolutely lowest layer object. Most threading implementations in C++ ought to never need to reach that object most of the time.
I am personally surprised that Chris hasn't proposed this yet in one of this N-papers proposing the ASIO way of doing async instead of the current approach by the committee :)
Again, I don't see the 'current way' and 'Chris' way' as contradicting. They are orthogonal and address different things.
As a way of sending values between threads, yes they are orthogonal. As a way of end user code getting a handle to some async operation it can poll or wait upon, they are commensurate, and Chris' way has the big advantage of already being standard practice in ASIO, being portable, and being available now and not in 2019 or later. Most C++ programmers just want something which solves future islands or isn't crazy inefficient for small grained operations right now, and Chris' way delivers that right now. I can't speak for Chris, but his N-papers to WG21 read to me as him effectively saying that the approach currently taken by the Concurrency TS is fundamentally unwise given that the battle tested ASIO approach is already standard practice, doesn't require magic compiler support, is known to be highly flexible and efficient and as ASIO is going to become the Networking TS, it is implied is a superior approach for concurrency in C++. For the record, I don't agree with that assessment, but he makes the especially good point that the present Concurrency TS is currently appearing to ignore the exigencies imposed by the likely Networking TS. The two idioms need to be reconciled properly into something which hangs well together, else we're going to see a dog's breakfast of concurrency support in C++ with threading and networking taking dimorphic concurrency paradigms. Obviously Hartmut you're on the appropriate committees and so are privy to details I am not, so it may be you are already working on this reconciliation, if so I applaud it. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 05/01/15 12:01, Niall Douglas wrote:
On 4 Jan 2015 at 9:52, Hartmut Kaiser wrote:
1. They tie your code into "future islands" which are fundamentally incommensurate with all code which doesn't use the same future as your code. Try mixing code using boost::future and std::future for example, it's a nightmare of too easy to be racy and unmaintainable mess code. If Compute provided a boost::compute::future, it would be yet another new future island, and I'm not sure that's wise design.

The easiest way to deal with this is to introduce a Future concept and implement everything in terms of it. A solid set of traits/concepts-lite should cover that.

I don't think it's that easy because really it comes down to commonality of kernel wait object, or rather, whether one has access to the true underlying kernel wait object or not.
For example, right now can boost::wait_all() ever consume std::futures? I suspect not, because the HANDLE on Windows or the futex on Linux is rather hard to get at.

The current implementation doesn't accept std::futures, but there is no reason it cannot accept other futures. All that is needed is the Future::wait() interface.
wait_for_any is different. The Boost thread implementation uses a list of condition_variables to notify when the future becomes ready. Having a generic future<T>::notify_when_ready(condition_variable) will surely help. Best, Vicente
On 5 Jan 2015 at 22:41, Vicente J. Botet Escriba wrote:
For example, right now can boost::wait_all() ever consume std::futures? I suspect not, because the HANDLE on Windows or the futex on Linux is rather hard to get at. The current implementation doesn't accept std::futures, but there is no reason it cannot accept other futures. All that is needed is the Future::wait() interface.
wait_for_any is different. The Boost thread implementation uses a list of condition_variables to notify when the future becomes ready. Having a generic future<T>::notify_when_ready(condition_variable) will surely help.
I would *far* prefer a notify_when_ready(callable), not least because condition_variables are lost wakeup prone. But then you're effectively making futures into ASIO async_result. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 5 Jan 2015 at 22:41, Vicente J. Botet Escriba wrote:
For example, right now can boost::wait_all() ever consume std::futures? I suspect not, because the HANDLE on Windows or the futex on Linux is rather hard to get at. The current implementation doesn't accept std::futures, but there is no reason it cannot accept other futures. All that is needed is the Future::wait() interface.
wait_for_any is different. The Boost thread implementation uses a list of condition_variables to notify when the future becomes ready. Having a generic future<T>::notify_when_ready(condition_variable) will surely help.

On 06/01/15 11:23, Niall Douglas wrote:

I would *far* prefer a notify_when_ready(callable), not least because condition_variables are lost wakeup prone.

It is much easier to store a condition_variable than a Callable. In addition it ensures that the future value provider will not block until the callback finishes, and it makes the user code thread safe, as the code is executed on the thread of its choice.
But then you're effectively making futures into ASIO async_result.
I'm not a fan of async_result, as the way the function is used depends on a specific parameter. IMO, we need different functions when the user must follows a different protocol. Best, Vicente
On 06/01/15 20:22, Vicente J. Botet Escriba wrote:
On 06/01/15 11:23, Niall Douglas wrote:
On 5 Jan 2015 at 22:41, Vicente J. Botet Escriba wrote:
For example, right now can boost::wait_all() ever consume std::futures? I suspect not, because the HANDLE on Windows or the futex on Linux is rather hard to get at. The current implementation doesn't accept std::futures, but there is no reason it cannot accept other futures. All that is needed is the Future::wait() interface.

wait_for_any is different. The Boost thread implementation uses a list of condition_variables to notify when the future becomes ready. Having a generic future<T>::notify_when_ready(condition_variable) will surely help. I would *far* prefer a notify_when_ready(callable), not least because condition_variables are lost wakeup prone. It is much easier to store a condition_variable than a Callable. In addition it ensures that the future value provider will not block until the callback finishes, and it makes the user code thread safe, as the code is executed on the thread of its choice. But then you're effectively making futures into ASIO async_result.
I'm not a fan of async_result, as the way the function is used depends on a specific parameter. IMO, we need different functions when the user must follows a different protocol.
After some more thoughts, the callable interface is more open.

The condition_variable interface could be something like:

    // Return an unlocked LockableFutureHandle able to tell if the Future
    // is ready once the lock has been locked.
    LockableFutureHandle Future::notify_when_ready(condition_variable&);

    // pre-condition: this is not locked
    void LockableFutureHandle::lock();

    // pre-condition: this is locked
    void LockableFutureHandle::unlock();

    // pre-condition: this is locked
    bool LockableFutureHandle::is_ready();

This seems a little bit intrusive, and is needed as the user needs to check which future is ready.

Using a Callable (void()) like e.g.

    template <class Callable>
    void Future::when_ready(Callable&&);

that doesn't consume the future, the user is able to store in the callable's closure the mutex, the condition_variable and the index. When called, it can store the index as the one that is ready and notify the condition variable.

An alternative could be to have some kind of lockable condition variable wrapping an index. The wait function could return the index.

    void Future::notify_when_ready(LockableConditionVariable<std::size_t>&);

The user would call it as follows:

    LockableConditionVariable<std::size_t> lcvi(i);
    f.notify_when_ready(lcvi);

and will later call wait to get the index:

    std::size_t index = lcvi.wait();

Best, Vicente
On 06/01/15 20:22, Vicente J. Botet Escriba wrote:
On 06/01/15 11:23, Niall Douglas wrote:
On 5 Jan 2015 at 22:41, Vicente J. Botet Escriba wrote:
For example, right now can boost::wait_all() ever consume std::futures? I suspect not, because the HANDLE on Windows or the futex on Linux is rather hard to get at. The current implementation doesn't accept std::futures, but there is no reason it cannot accept other futures. All that is needed is the Future::wait() interface.

wait_for_any is different. The Boost thread implementation uses a list of condition_variables to notify when the future becomes ready. Having a generic future<T>::notify_when_ready(condition_variable) will surely help. I would *far* prefer a notify_when_ready(callable), not least because condition_variables are lost wakeup prone. It is much easier to store a condition_variable than a Callable. In addition it ensures that the future value provider will not block until the callback finishes, and it makes the user code thread safe, as the code is executed on the thread of its choice. But then you're effectively making futures into ASIO async_result.
I'm not a fan of async_result, as the way the function is used depends on a specific parameter. IMO, we need different functions when the user must follows a different protocol.
After some more thoughts, the callable interface is more open.
The condition_variable interface could be something like
// Return an unlocked LockableFutureHandle able to tell if the Future
// is ready once the lock has been locked.
LockableFutureHandle Future::notify_when_ready(condition_variable&);

// pre-condition: this is not locked
void LockableFutureHandle::lock();

// pre-condition: this is locked
void LockableFutureHandle::unlock();

// pre-condition: this is locked
bool LockableFutureHandle::is_ready();
This seems a little bit intrusive, and is needed as the user needs to check which future is ready.
Using a Callable (void()) like e.g.
template <class Callable> void Future::when_ready(Callable&&);
that doesn't consume the future, the user is able to store in the callable's closure the mutex, the condition_variable and the index. When called, it can store the index as the one that is ready and notify the condition variable.
An alternative could be to have some kind of lockable condition variable wrapping an index. The wait function could return the index.
void Future::notify_when_ready(LockableConditionVariable<std::size_t>&);
The user would call it as follows
LockableConditionVariable<std::size_t> lcvi(i); f.notify_when_ready(lcvi);
and will later call wait to get the index.
std::size_t index = lcvi.wait();
Whatever you do, I'd suggest not to tie the API to any of the kernel-based synchronization objects. Regards Hartmut --------------- http://boost-spirit.com http://stellar.cct.lsu.edu
On 08/01/15 02:36, Hartmut Kaiser wrote:
On 06/01/15 20:22, Vicente J. Botet Escriba wrote:
On 06/01/15 11:23, Niall Douglas wrote:
On 5 Jan 2015 at 22:41, Vicente J. Botet Escriba wrote:
For example, right now can boost::wait_all() ever consume std::futures? I suspect not, because the HANDLE on Windows or the futex on Linux is rather hard to get at. The current implementation doesn't accept std::futures, but there is no reason it cannot accept other futures. All that is needed is the Future::wait() interface.

wait_for_any is different. The Boost thread implementation uses a list of condition_variables to notify when the future becomes ready. Having a generic future<T>::notify_when_ready(condition_variable) will surely help. I would *far* prefer a notify_when_ready(callable), not least because condition_variables are lost wakeup prone. After some more thoughts, the callable interface is more open.
The condition_variable interface could be something like
// Return an unlocked LockableFutureHandle able to tell if the Future
// is ready once the lock has been locked.
LockableFutureHandle Future::notify_when_ready(condition_variable&);
<snip>
template <class Callable> void Future::when_ready(Callable&&);
<snip>
void Future::notify_when_ready(LockableConditionVariable<std::size_t>&);
<snip> Whatever you do, I'd suggest not to tie the API to any of the kernel-based synchronization objects.
Is a condition_variable a kernel synchronization object in HPX? Best, Vicente
On 08.01.2015 03:21, "Vicente J. Botet Escriba" <vicente.botet@wanadoo.fr> wrote:
On 08/01/15 02:36, Hartmut Kaiser wrote:
<snip>
Whatever you do, I'd suggest not to tie the API to any of the kernel-based synchronization objects.
Is a condition_variable a kernel synchronization object in HPX?
No it's not. Just a side note: we even try to avoid ASIO as much as possible due to its kernel based synchronization (a context switch into the kernel is just too much for a low latency, high bandwidth network).
Best, Vicente
On 8 Jan 2015 at 6:43, Thomas Heller wrote:
Is a condition_variable a kernel synchronization object in HPX?
No it's not.
Just a side note: we even try to avoid asio as much as possible due to it's kernel based synchronization (a context switch into the kernel is just too much for a low latency, high bandwidth network).
Technically speaking, ASIO's design highly avoids kernel synchronisation, and on Windows does a very good job of also avoiding context switching. I agree the POSIX implementation of io_service is not great, though that's a pure quality of implementation issue rather than a design problem. Chris is aware of this problem. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 13/01/2015 04:48, Niall Douglas wrote:
On 8 Jan 2015 at 6:43, Thomas Heller wrote:
Is a condition_variable a kernel synchronization object in HPX?
No it's not.
Just a side note: we even try to avoid asio as much as possible due to it's kernel based synchronization (a context switch into the kernel is just too much for a low latency, high bandwidth network).
Technically speaking, ASIO's design highly avoids kernel synchronisation, and on Windows does a very good job of also avoiding context switching.
It still has excessive use of mutual exclusion locks for some workloads. Replacing these with lock-free data structures causes noticeable improvements.
On 7 Jan 2015 at 19:56, Vicente J. Botet Escriba wrote:
Using a Callable (void()) like e.g.
template <class Callable> void Future::when_ready(Callable&&);
I don't like futures having to store a list of callables to call when set_value() or set_exception() is called. It means futures must allocate memory. I think only the wait() call should allocate memory, where possible. After all, when you call wait() on a list of futures, you expect it to be an expensive operation. Also, instead of when_ready() it should be when_set(), to emphasise that it will be called in the thread setting the state. How about an API which asks a future to atomically insert itself into a chain of linked-list pointers maintained by the future returned from the wait() call? That lets the source future be destroyed without its shared state lingering, or indeed without having any shared state at all, and the unlink operation ought to be very fast. The destination future coming out of a wait_for_all() or wait_for_any() can, I think, never avoid being a big fat future, especially if it accepts heterogeneous future types. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
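A minimal sketch of that intrusive scheme (hypothetical names, not Niall's actual code): the composite waiter owns the nodes, so the source futures carry no shared state and nothing is allocated on the source side; a wait_for_any would walk the list to find the node that signalled.

#include <atomic>
#include <condition_variable>
#include <mutex>

// One node per source future; storage lives in the waiting composite
// (e.g. on its stack), not in the source future's shared state.
struct wait_node
{
    wait_node* next = nullptr;
    std::atomic<bool> signalled{false};
};

class composite_waiter
{
    std::atomic<wait_node*> head_{nullptr};
    std::atomic<int> pending_{0};
    std::mutex m_;
    std::condition_variable cv_;

public:
    // A source future links its node in: a lock-free push onto the list.
    void link(wait_node& n)
    {
        pending_.fetch_add(1, std::memory_order_relaxed);
        n.next = head_.load(std::memory_order_relaxed);
        while (!head_.compare_exchange_weak(n.next, &n,
                   std::memory_order_release, std::memory_order_relaxed))
            ;  // compare_exchange_weak refreshes n.next on failure
    }

    // Called by the thread setting a source future's state.
    void notify(wait_node& n)
    {
        n.signalled.store(true, std::memory_order_release);
        if (pending_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            std::lock_guard<std::mutex> lk(m_);  // avoids a lost wakeup
            cv_.notify_all();
        }
    }

    // wait_for_all semantics: block until every linked node has signalled.
    void wait()
    {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] {
            return pending_.load(std::memory_order_acquire) == 0;
        });
    }
};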
On 4/01/2015 19:52, Niall Douglas wrote:
On 3 Jan 2015 at 7:15, Hartmut Kaiser wrote:
First of all, I fully support Thomas here. Futures (and the extensions proposed in the 'Concurrency TS') are a wonderful concept allowing asynchronous computation. Those go beyond 'classical' futures, which just represent a result that has not been computed yet. These futures allow for continuation-style coding, as you can attach continuations and compose new futures based on logical operations on others.
They are also severely limited and limiting:
1. They tie your code into "future islands" which are fundamentally incommensurate with all code which doesn't use the same future as your code. Try mixing code using boost::future and std::future, for example: it's a nightmare of all-too-easily racy, unmaintainable mess code. If Compute provided a boost::compute::future, it would be yet another future island, and I'm not sure that's wise design.
Is there somewhere I can read up some more on this issue? Perhaps I have not put enough thought into it, but it seems like it should be trivial to create a wrapper to convert a std::future into a boost::future or the reverse. (It's slightly less trivial if you want to use a mixture of futures and shared_futures, but it should still be manageable.) Of course there are likely to be some performance costs to doing so, so it's not something you want to be doing in tight loops, but then I'm not convinced async in general is something you should be doing in tight loops. If it's only happening at the intersection point between libraries then it ought to be ok, although of course it depends on how much overlap is required by the application.
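A sketch of such a wrapper, assuming Boost.Thread (to_boost_future is a hypothetical helper, not an existing API; the shared_ptr keeps the task copyable, and the conversion burns a thread, which is part of the performance cost mentioned above):

#define BOOST_THREAD_VERSION 4
#include <boost/thread/future.hpp>
#include <future>
#include <memory>
#include <utility>

// Adapt a std::future<T> into a boost::future<T> by spawning a task
// that blocks on the source future. Simple and correct, but one thread
// per conversion -- fine at library boundaries, not in tight loops.
template <class T>
boost::future<T> to_boost_future(std::future<T> f)
{
    auto src = std::make_shared<std::future<T>>(std::move(f));
    return boost::async(boost::launch::async,
                        [src] { return src->get(); });
}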
On 16 Jan 2015 at 18:07, Gavin Lambert wrote:
1. They tie your code into "future islands" which are fundamentally incommensurate with all code which doesn't use the same future as your code. Try mixing code using boost::future and std::future, for example: it's a nightmare of all-too-easily racy, unmaintainable mess code. If Compute provided a boost::compute::future, it would be yet another future island, and I'm not sure that's wise design.
Is there somewhere I can read up some more on this issue?
I can't think of anywhere except this thread really. I guess anyone who has tried mashing together different types of futures knows what a future island is immediately.
Perhaps I have not put enough thought into it, but it seems like it should be trivial to create a wrapper to convert a std::future into a boost::future or the reverse. (It's slightly less trivial if you want to use a mixture of futures and shared_futures, but it should still be manageable.)
Indeed, a wrapper is trivial. The problem is where library A uses boost::future and library B uses std::future and your code has to work with both. In case you think that's probably okay for open source libraries, imagine mixing precompiled binaries compiled against different MSVCRT.DLLs, where "std::future" isn't the same thing. A codebase I currently work with uses both types of future in the same codebase. We use std::async to work around it a lot, unfortunately. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 03/01/15 14:15, Hartmut Kaiser wrote:
Thomas Heller writes: Well, that's exactly what I am trying to say ... The current design of the library completely disregards the research that has been done to support asynchronous operations. We have std::future (which is almost equivalent to an OpenCL event), why not use the same mechanisms here? First of all, I fully support Thomas here. Futures (and the extensions proposed in the 'Concurrency TS') are a wonderful concept allowing asynchronous computation. Those go beyond 'classical' futures, which just represent a result that has not been computed yet. These futures allow for continuation-style coding, as you can attach continuations and compose new futures based on logical operations on others. +1 This is something Joel tries to convince me of but I'm resisting. Could you shed some light on how events are almost equivalent to futures? Futures store the result of the asynchronous computation. Events are markers that can be queried to find out whether an operation has finished, and can be blocked on until it has. The data, however, is stored somewhere else. Futures are in this sense safer abstractions, as they prevent users from accessing results that are not yet finished. That is my understanding of futures; I might be wrong here, please correct me if I am.
So I consider futures and events orthogonal concepts. One can be, with some effort and loss of expressiveness, changed to the other concept and vice versa. But I'm not sure if the code makes sense after the change. Consider these examples:
future<void> f = copy_async(src, dst); fill_async(dst, 42.);
This does not work, a dependency or dataflow graph has to be created between copy and fill, so:
future<void> f = copy_async(src, dst); fill_async(dst, 42., f); What about:
future<void> f = copy_async(src, dst); f.then([&](future<void>&&) { fill_async(dst, 42.); });
or (assuming await will be available, which almost all of the committee thinks is something we need):
await copy_async(src, dst); fill_async(dst, 42.);
i.e. the code looks 'normal' but is fully asynchronous thanks to await and futures.
But that is not a future, that is an event. How to write this with futures?
I think it should be this but I might be wrong:
future<dst::iterator> f = copy_async(src, dst); fill_async(f, 42); You're right that an event separates the fact that data is available from the data itself. Well, the OpenCL guys decided that this is the right way of doing things; I really hope that we know better. Just because the underlying OpenCL API exposes the trigger and the data separately does not imply that we have to do the same thing in the API exposed from our libraries. At the same time, and as you already mentioned, future<void> is perfectly well usable for representing even this use case.
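For concreteness, the future<void> chaining could look like this (a sketch only: copy_async and fill_async are hypothetical stand-ins implemented with boost::async on host threads, not any real library's API):

#define BOOST_THREAD_VERSION 4
#include <boost/thread/future.hpp>
#include <algorithm>
#include <vector>

using buffer = std::vector<double>;

// Hypothetical async algorithms standing in for the device operations.
boost::future<void> copy_async(const buffer& src, buffer& dst)
{
    return boost::async(boost::launch::async, [&src, &dst] {
        std::copy(src.begin(), src.end(), dst.begin());
    });
}

boost::future<void> fill_async(buffer& dst, double v)
{
    return boost::async(boost::launch::async, [&dst, v] {
        std::fill(dst.begin(), dst.end(), v);
    });
}

int main()
{
    buffer src(1024, 1.0), dst(1024);

    // The dependency is carried by the future: the fill starts only
    // after the copy's future becomes ready, and exceptions propagate
    // through get().
    copy_async(src, dst)
        .then([&dst](boost::future<void> f) {
            f.get();                      // rethrows if the copy failed
            fill_async(dst, 42.0).get();  // chain the fill
        })
        .get();
}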
As std::copy returns the OutputIterator, copy_async should return a future<OutputIterator>. But this iterator is not dst.
future<dst::iterator> it = copy_async(src, dst); it.then([&](future<dst::iterator>&&) { fill_async(dst, 42.); });
An alternative could be a duplicate_async algorithm that returns a future for a copy of the elements in a newly constructed Container.
future<Container> dst = duplicate_async<Container>(src); dataflow(fill_async, dst, 42);
With await this would become
auto dstf = await duplicate_async<Container>(src);
dataflow(fill_async, dstf, 42);
Would it be possible to implement this duplicate_async with OpenCL?
Would this interface be much more inefficient?
If this seems too expensive, another alternative could be to also add the dst parameter to the duplicate_async function.
future<Container> dstf = duplicate_async(src, dst); dataflow(fill_async, dstf, 42); BTW Hartmut, do you plan to propose the dataflow function to the standard?
[1] https://github.com/STEllAR-GROUP/hpx [2] https://www.youtube.com/watch?v=4OCUEgSNIAY
Thanks for the link to the presentation. Happy new year to all of you, Vicente
On Sunday, January 04, 2015 11:26:15 Vicente J. Botet Escriba wrote:
<snip>
future<Container> dstf = duplicate_async(src, dst); dataflow(fill_async, dstf, 42); BTW Hartmut, do you plan to propose the dataflow function to the standard?
We thought about it, and it would absolutely make sense as a way to make std::async complete. The implementation of dataflow is almost exactly the one from the await proposal. That being said, it's a poor man's await; await is way more elegant and to be preferred.
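Roughly, the equivalence looks like this (a much-simplified sketch, C++14, restricted to a single non-void future dependency; the real dataflow accepts any mix of futures and values and does not block a thread):

#define BOOST_THREAD_VERSION 4
#include <boost/thread/future.hpp>

// Poor man's dataflow: defer invoking f until the future dependency is
// ready, then pass its value along with any plain arguments. This just
// shows that dataflow is a continuation in disguise.
template <class F, class T, class... Args>
auto dataflow(F f, boost::future<T> dep, Args... args)
{
    return dep.then([f, args...](boost::future<T> d) {
        return f(d.get(), args...);
    });
}

// Usage: auto r = dataflow([](int v, int a) { return v + a; },
//                          boost::make_ready_future(1), 2);  // r.get() == 3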
On 3 Jan 2015 at 12:20, Sebastian Schaetz wrote:
This is something Joel tries to convince me of but I'm resisting. Could you shed some light on how events are almost equivalent to futures? Futures store the result of the asynchronous computation.
A shared_future is to a future as a Win32 manual-reset event is to a Win32 auto-reset event. The value transport in a future is of course illusory; simply think purely in terms of future<void>. The key part, and where futures strongly resemble events, is in the signal wait-release part. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
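In code, using only standard <future> (a small illustrative sketch): a promise<void>/future<void> pair releases a single waiter, while a shared_future<void> releases every waiter once set, much like a manual-reset event.

#include <future>
#include <thread>
#include <vector>

int main()
{
    std::promise<void> go;
    // share() turns the one-shot future into a broadcast: once set,
    // any number of waiters are released -- manual-reset behaviour.
    std::shared_future<void> signal = go.get_future().share();

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back([signal] {
            signal.wait();  // all workers block until the event is "set"
        });

    go.set_value();         // set the event once; every waiter wakes
    for (auto& t : workers) t.join();
}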
participants (15)
- Andreas Schäfer
- Bjorn Reese
- Gavin Lambert
- Giovanni Piero Deretta
- Gottlob Frege
- Gruenke, Matt
- Hartmut Kaiser
- Mathias Gaunard
- Niall Douglas
- Paul A. Bristow
- Rob Stewart
- Sebastian Schaetz
- Sylvester-Bradley, Gareth
- Thomas Heller
- Vicente J. Botet Escriba