On 5 Jan 2015 at 12:49, Thomas Heller wrote:
I don't think it's that easy because really it comes down to commonality of kernel wait object, or rather, whether one has access to the true underlying kernel wait object or not.
You make the assumption that you only ever synchronize on kernel space objects. This is neither required nor necessary.
I make the assumption that one _eventually_ synchronises on kernel wait objects, and I also assume that you usually need the ability to fall back onto a kernel wait in most potential wait scenarios (e.g. if no coroutine work is pending, and there is nothing better to do but sleep now). One could, I suppose, simply call yield() all the time, but that is battery murder for portable devices.

What is missing on POSIX is a portable universal kernel wait object used by everything in the system. It is correct to claim you can easily roll your own with a condition variable and an atomic; the problem comes in when one library (e.g. OpenCL) has one kernel wait object and another library has a slightly different one, and the two cannot be readily composed into a single wait_for_all() or wait_for_any() which accepts all wait object types, including non-kernel wait object types. Windows does have such a universal kernel wait object (the event object). On POSIX you could inefficiently emulate a universal kernel wait object using a pipe, at the cost of two file descriptors per object, though directly using a futex on Linux would be cheaper.

On 5 Jan 2015 at 13:49, Thomas Heller wrote:
No it isn't. Current futures require the compiler to generate the code for handling exception throws irrespective of whether a throw could ever actually happen. Relative to something like a SHA round, which is fundamentally noexcept, this isn't a trivial overhead, especially when it's completely unnecessary.
Ok. Hands down: What's the associated overhead you are talking about? Do you have exact numbers?
I gave you exact numbers: a 13% overhead for a SHA256 round.
The problem with async_result (as mentioned in a different post) is that it merely takes care of "transporting" from the ASIO future island to another one. It can be just as well be adapted to any other future based system.
Absolutely. Which is precisely why it's a very viable alternative to fiddling with futures. Most programmers couldn't give a toss about whether futures do this or that; they do care when they have to jump through hoops because library A is in a different future island to library B. Chris' async_result approach makes that go away right now, not in 2019 or later. It's a very valid riposte to the Concurrency TS, and unlike the Concurrency TS his approach is portable and is already standard practice instead of an invention of standards by mostly Microsoft.

Try http://comments.gmane.org/gmane.comp.lib.boost.devel/255022. The key insight of that proposal is the notion of static composition of continuations as the core design. One then composes, at compile time, a sequence of continuations which implement any combination and variety of future you like, including the STL ones and the proposed Concurrency TS ones. You will note how the functional static continuations are effectively monadic, and therefore these elementary future-promises are actually a library-based awaitable resumable monadic toolkit which could be used to write coroutine-based Hana or Expected monadic sequences which can be arbitrarily paused, resumed, or transported across threads.
This looks indeed promising. I think we should further investigate how this could be used when dealing with truly asynchronous and concurrently executed tasks.
For me it's a question of free time. This is stuff I do for only a few hours per week because this time is unfunded (happy to discount my hourly rate for anyone wanting to speed these up!), and right now my priority queue is:

1. Release BindLib based AFIO to stable branch (ETA: end of January).
2. Get BindLib up to Boost quality, and submit for Boost review (ETA: March/April).
3. C++ Now 2015 presentation (May).
4a. Non-allocating lightweight future-promises extending Expected (from June onwards).
4b. Google Summer of Code mentoring of concurrent_unordered_map so it can be finished and submitted into Boost.

That's the best I can do given this is unfunded time.

Universal composure of any kind of future with any other kind is possible when they share the same underlying kernel wait object. I intend to use my proposed pthreads permit object, which is a portable userspace pthreads event object, as that universal kernel wait object. If widely adopted, it may persuade the AWG to admit permit objects into POSIX threads for standardisation; that way C and C++ code can all use interoperable wait composure.
Indeed, if POSIX threads already had the permit object, then OpenCL would have used it instead of making their custom event object, and we could then easily construct a std::future and boost::future for Compute. Sadly, the AWG don't see this sort of consequence, or rather I suspect they don't hugely care.
You make the assumption that OpenCL merely exists on the host.
No, it's more I'm limiting the discussion to host-only and indeed kernel threading only. I might add that I took care in my pthreads permit object design that it works as expected without a kernel being present so it can be used during machine bootstrap, indeed you can create a pthreads permit object which only spins and yields. That object design is entirely capable of working correctly under coroutines too, or on a GPU. It's a C API abstraction of some ability for one strand to signal another strand, how that is actually implemented underneath is a separate matter.
They could just as well contain device side specific information which is then used directly on the device (no POSIX there). BTW, this is just one example where your assumption about kernel level synchronization is wrong. Another scenario is coroutine-like systems such as HPX, where you have different synchronization primitives (Boost.Fiber would be another example of that). And this is exactly where the challenge is: trying to find a way to unify those different synchronization mechanisms. That way, we could have a unified future interface. The things you proposed so far can be a step in that direction but certainly don't include all necessary requirements.
Actually this is the exact basis for my argument regarding many future types, and creating a library which is a factory for future types. In C++, in a proper design, we only pay for what we use, so a future suitable for a SHA round needs to be exceptionally lightweight, and probably can't copy-compose at all but can move-compose (this is where a newly created future can atomically destroy its immediately preceding future, and therefore a wait_for_all() on an array of such lightweight futures works as expected). Meanwhile a future which can be used across processes concurrently would necessarily be a far heavier and larger object. The same applies to coroutine parallelism, or HPX, or WinRT. They all get families of future types best suited to the task at hand, and if the programmer needs bridges across future islands then they pay for such a facility. The cost is, as Hartmut says, a multiplication of future islands, but I believe that is inevitable anyway, so one might as well do it right from the beginning.

I might add that BindLib lets the library end user choose what kind of future the external API of the library uses. Indeed BindLib based AFIO lets you choose between std::future and boost::future, and moreover you can use both configurations of AFIO in the same translation unit and it "just works". I could very easily - almost trivially - add support for an hpx::future in there, though AFIO by design needs kernel threads because they are the only way of generating parallelism in non-microkernel operating system kernels (indeed, the whole point of AFIO is to abstract that detail away for end users).

This is why I'd like to ship BindLib sooner rather than later. I believe it could represent an enormous leap forward for the quality and usability of C++ 11 requiring Boost libraries.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/