Non-allocating constexpr functional continuations-based future-promise

23 Oct 2014

      Dear Boost,

Given Hartmut's feedback from my previous attempt at a non-allocating 
future-promise implementation, I now have a second, much more generic 
demonstration prototype which is based on heterogenous sequences of 
constexpr functional calls. I think this design may have huge potential.

Consider some heterogenous sequence of callables C { f1, f2, f3, f4, ... } 
such that fN(... f4(f3(f2(f1(Args...))))) -> R i.e. if you call the 
heterogenous sequence with arguments Args..., f1(args...) will be called 
and fed to f2() which then is fed to f3() up to fN() whose return type is 
R. If I understand proposed Boost.Hana correctly, these potentially 
constexpr sequences of heterogeneous callables are the main bridge between 
pure compile-time heterogeneous type sequences and doing something with 
them in real code as they can decay on demand into actual code when needed. 
And for proposed Boost.Expected, its monadic programming sequences bind(), 
map() and then() all functionally operate on some expected<T, E>, consuming 
an instance and returning new instances, so these monadic sequences are 
also potentially constexpr sequences of heterogenous callables.

Now, what new basic_future-basic_promise will do is to allow you to insert 
a suspend-resume of execution into some functional sequence C, so 
essentially:

split<idx>(C) => [ basic_promise<X>, basic_future<Y> ]

... where X is some split front part of C and Y is some back part of C. You 
can now execute the first part of C via the call operator:

basic_promise<X>(Args...)

Execution of the C0...Cn making up X occurs, but instead of continuing into 
Y it stores the result of Cn and makes that available to the corresponding 
basic_future. At some later point you can resume the execution of C:

basic_future<Y>() -> R

... and thus complete the execution of the functional sequence C, returning 
the same value R as invoking the original functional sequence C with the 
same input parameters.

For those who prefer to understand theory via code, here is some example 
C++ 14 code:

  // Let this be an arbitrarily long sequence of functional callables
  auto ct=make_continuations([](auto x){return 5+x;}, [](auto x){ return 
x+6;});
  {
    // Make a future promise pair for if we called c(int)
    auto fp=ct.suspend_resume<1, int>();
    // Execute promise
    fp.first(1);
    // Execute future
    b=std::move(fp.second)();
  }
  {
    // Make a future promise pair for if we called c(double)
    auto fp=ct.suspend_resume<1, double>();
    // Execute promise
    fp.first(1.0);
    // Execute future
    d=std::move(fp.second)();
  }
  // b=12, d=12.0

There are a multitude of uses for this very fundamental pattern. Let us try 
implementing something closely resembling the Concurrency TS's dynamic 
continuations support .then(launch, callable). Let us define this 
functional continuation:

  template<class T> struct then_continuation
  {
    std::vector<std::function<void(T)>> _conts;
    template<class U> auto then(U &&c)
    {
      typedef decltype(c(std::declval<T>())) result_type;
      promise<result_type> p;
      future<result_type> ret(p.get_future());
      // Unfortunately libstdc++'s std::function currently always tries to 
copy lambda types
      //_conts.emplace_back([p=std::move(p), c=std::forward<U>(c)](T 
v){p.set_value(c(v));});
     _conts.push_back([p = 
std::make_shared<promise<result_type>>(std::move(p)), 
c=std::forward<U>(c)](T v){p->set_value(c(v));});
      return ret;
    }
    BOOST_CONSTEXPR T operator()(T v) const
    {
      for(auto &i : _conts)
        i(v);
      return v;
    }
  };

All this continuation does is provide a .then(callable) which stores those 
callables into a std::vector and its operator() executes the previously 
stored callables.

Now let us define a std::promise lookalike, with its main deficiencies 
being a total lack of exception_ptr support (one would use expected<T, E> 
as the transport type for that) and of course any thread safety:

template<class T> class promise : public 
basic_promise<detail::then_continuation<T>, T, 
detail::then_continuation<T>> ...

basic_promise takes three template parameters, the type of the front split 
of the continuations, the transported type between the two halves of 
continuations, and the type of the back split of the continuations. Here, 
we are defining promise as transporting a type T, and two then_continuation 
continuations exist before and after the transport of the type T.

To our future<T> lookalike we now add this:

  template<class U> auto then(launch policy, U &&c)
  {
    if(policy==launch::deferred)
      return Base::_after.then(std::forward<U>(c));
    else if(Base::_other)
      return Base::_other->_before.then(std::forward<U>(c));
  }

So, if the .then() dynamic continuation is deferred, push it into the 
latter then_continuation, else push it into the former.

And voila, you can now do this very similar to Concurrency TS code:

  promise<int> p;
  future<int> f(p.get_future());
  future<double> f2(f.then(launch(), [](int a){return a+1.0;}));
  p.set_value(5);
  return f2.get();

And you get back 6.0, as expected.

However here is where things get neato. How many instructions do you think 
GCC 4.9 generates for this sequence?

  promise<int> p;
  future<int> f(p.get_future());
  p.set_value(5);
  return f.get();

In traditional future-promise, that is an operation perhaps consuming 
thousands of CPU cycles and at least one memory allocation and 
deallocation. Well, this is the assembler produced by the above by GCC 4.9:

_Z19test_futurepromise4v:
	movq	$0, -88(%rsp)
	movq	$0, -80(%rsp)
	movl	$5, %eax
	movq	$0, -72(%rsp)
	movq	$0, -48(%rsp)
	movq	$0, -40(%rsp)
	movq	$0, -32(%rsp)
	movl	$5, -56(%rsp)
	movq	$0, -64(%rsp)
	movq	$0, -24(%rsp)
	ret

Believe it or not, this is actually bad output - there is a bug in GCC's 
optimiser and it should in fact be outputting a single instruction rather 
than writing zeros into the stack for no purpose - it seems to get confused 
by the double declation of a std::vector which is never used, because 
having just one std::vector produces the ideal output but having two 
produces the above. However clang 3.4's optimiser does far worse, and a bug 
there with how non-immediately defined types are treated causes complete 
expansion of code in its entirety, even if you remove all std::vector's 
(supplying any type not immediately declared in the same function breaks 
the optimiser for some unknown reason. I haven't tried clang 3.5 yet, it 
may be fixed).

Other very interesting policy continuations which can be statically mixed 
into custom future promise types include calling some application specific 
callback without needing to indirect via a pointer to a function. This lets 
you combine future promise with invoking callbacks, useful for legacy code. 
The idea behind this suite of future-promise fundamental parts is that they 
can be mixed at will into bespoke future-ish promise-ish application 
specific future promise types. Because they all share the same foundations, 
a when_any() or when_all() can accept heterogenous future types - and I 
don't mean future<int> with future<double>, rather I mean afio::future<> 
with qt::future<int>, that sort of heterogenity.

Another very interesting idea is combining these with signals and slots, so 
via custom continuations you can keep slots and fire signals in sync with 
future promises. This design I think shows a lot of potential.

Anyway, I feel very happy with this design, and something along these lines 
is where I shall proceed next. The eventual intention is for these future 
promises to become the default Boost.Thread future promises when C++11 is 
available, though this will be up to a year from now. I won't be able to 
support future-promise custom allocators, but I certainly should be able to 
completely avoid memory allocation and sometimes the generation of any code 
at all with this design.

The demonstration prototype code is at 
https://github.com/ned14/boost.spinlock/blob/expected_future/test_constexpr_
future_promise2.cpp for anyone interested. It needs a C++ 14 compiler, so a 
minimum of libstdc++ 4.9 and either GCC 4.9 or clang 3.4 or better. 
Comments are welcome.

Niall

Niall Douglas

tags

participants (1)