On 19 Mar 2015 11:09, "Niall Douglas"
On 17 Mar 2015 at 21:42, Giovanni Piero Deretta wrote:
* The wait strategy is not part of the future implementation (although a default is provided). future::get, future::wait, wait_all and wait_any are parametrized by the wait strategy.
* The design aims to make future and promise as streamlined and as fast as possible, at the cost of a slower shared_future (although there is room for improvement).
Your future still allocates memory, and is therefore costing about 1000 CPU cycles.
1000 clock cycles seems excessive with a good malloc implementation. Anyway, the plan is to add support for a custom allocator. I do not think you can realistically have a non-allocating future *in the general case* (you might optimise some cases, of course).
That is not "as fast as possible". The use of std::atomic also prevents the compiler optimiser from eliding the future implementation entirely, because an atomic always forces code to be generated. As much as futures today are big heavy things, tomorrow's C++17 futures, especially resumable-function-integrated ones, must be completely elidable, so the compiler can completely eliminate and/or collapse a resumable function call, or a sequence of such calls, where appropriate. If you have an atomic in there, it has no choice but to issue the atomic operations needlessly.
I understand what you are aiming at, but I think that elidability is orthogonal. Right now I'm focusing on making the actual synchronisation fast and composable in the scenario where the program has committed to making a computation async.
I think memory allocation is unavoidable for shared_future, or at least any realistically close to the STL implementation of one. But a future I think can be both STL compliant and never allocate memory and be optimisable out of existence.
* The wait strategy only deals with 'event' objects, which act as a bridge between the future and the promise.
The event object is really the core of my contribution; it can be thought of as the essence of future<void>::then, or alternatively as a pure userspace synchronisation primitive.
Exactly as my C11 permit object is. Except mine allows C code and C++ code to interoperate and compose waits together.
Not at all. I admit I have not studied permit in detail (the doc size is pretty daunting), but as far as I can tell the waiting thread will block in the kernel. It provides a variety of ways to block, and the user can't add more. With event, the decision of how to block (or even whether to block at all) is made by the consumer and, most importantly, it can be delayed till the last moment. You can think of event as the bridge between the signal side and the wait side of permit, or alternatively as a future<void> which only supports then (no get or wait).

Regarding interfacing with a C event source, it can be done trivially right now as long as the source provides a callback (function pointer + context pointer) API, which is very common already. There is no need to wait for existing libraries to embrace a new synchronisation object (cue the xkcd comic about standards :) ). Allowing C code to wait for events would require replacing the waiter vtable with a C equivalent, but that is not something I care much about right now.
Other features of this implementation:
* Other than in the blocking strategy itself, the future and promise implementation have no sources of blocking (no mutexes, not even spin locks).
* The shared state is not reference counted.
To demonstrate the generality of the waiting concept, I have implemented a few waiter objects.
* cv_waiter: this is simply a waiter on top of an std::mutex + condition variable.
* futex_waiter (linux): an atomic counter + a futex, possibly more efficient than the cv_waiter
* sem_waiter (posix): a waiter implemented on top of a posix semaphore. More portable than the futex waiter and possibly more efficient than the cv_waiter
* fd_waiter (linux, possibly posix): a waiter implemented on top of linux eventfd (for portability it can also be implemented on top of a pipe); you can use select with futures!
* task_waiter: a completely userspace-based coroutine waiter which switches to another coroutine on wait and resumes the original coroutine on signal.
* scheduler_waiter: another coroutine-based waiter, but on top of a userspace task scheduler. On wait it switches to the next ready task; on signal it enqueues the waiting task at the back of the ready queue.
I know there is a plan to implement a new version of boost::threads; hopefully this implementation can contribute a few ideas.
My current best understanding of Vicente's plans is that each thread has a thread-local condvar. The sole cause of a thread sleeping, apart from i/o, is that thread-local condvar. One therefore has a runtime which keeps a registry of all the thread-local condvars, and can deduce the correct ones to wake when implementing a unified wait system that is also compatible with Fibers/resumable functions.
That doesn't work if a program wants, for example, to block in select, spin on a memory location or a hardware register, wait for a signal, or interoperate with a different userspace thread library or some other event queue (asio, qt or whatever), and still also wait for a future. Well, you can use future::then, but it has overhead. -- gpd