On 12 Jan 2015 at 21:50, Giovanni Piero Deretta wrote:
I have been following the thread with interest, and I wanted to know more about your non-allocating future/promise pair. As far as I understand, your future and promise have a pointer to each other and they update the other side every time they are moved, right?
Exactly right.
My question is: since you need to do the remote update with an atomic operation (an exchange in the best case), and you usually perform at least a few moves (when composing futures, for example), wouldn't a fast allocator outperform this solution?
Firstly, I found that a separate CAS lock per future and per promise is considerably faster than trying to be any more clever. When updating, you lock both objects, with back off, before the update.

Secondly, no, this approach is far faster than a fast allocator, at least on Intel. The reason is that promises and futures are very, very rarely contended on the same cache line between threads, so the CAS locking and updating almost never spins or contends. It's pretty much full speed ahead.

The problem with specialised allocators is that, firstly, Boost.Thread's futures don't support allocators, and secondly, even if they did, as soon as you bring global memory effects into the picture you constrain the compiler optimiser considerably. For example, make_ready_future() with the test code I wrote is implemented very naively as:

    promise<T> p;
    future<T> f(p.get_future());
    p.set_value(v);
    return f;

...

    make_ready_future(5).get();

... which the compiler collapses into:

    movl $5, %eax
    ret

With an allocator the compiler can't do that for you, because touching global memory means the compiler has to assume an unknown read. This doesn't mean a custom make_ready_future() couldn't produce an equally optimised outcome, but for me personally the ability of the compiler to collapse opcode output suggests a good design here. I would also assume that when allowed to collapse opcodes, the compiler can also do alias folding etc. which the use of an allocator may inhibit.
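[For illustration only, and not Niall's actual code: a minimal sketch of a non-allocating promise/future pair whose halves point at each other, relink themselves on move, and guard each half with its own CAS spinlock with back off, roughly as described above. The names toy_promise/toy_future are invented, and destructor unlinking and blocking waits are deliberately omitted.]

    #include <atomic>
    #include <thread>
    #include <utility>

    template <class T> class toy_promise;

    // Each half carries its own one-word CAS spinlock and a pointer back to
    // its partner.  A move locks both halves (backing off on contention) and
    // repoints the partner, so no heap-allocated shared state is needed.
    template <class T>
    class toy_future {
        friend class toy_promise<T>;
        std::atomic<bool> lock_{false};
        toy_promise<T>*   partner_ = nullptr;
        T    value_{};
        bool ready_ = false;

        void lock()     { while (lock_.exchange(true, std::memory_order_acquire)) std::this_thread::yield(); }
        bool try_lock() { return !lock_.exchange(true, std::memory_order_acquire); }
        void unlock()   { lock_.store(false, std::memory_order_release); }

    public:
        toy_future() = default;
        toy_future(toy_future&& o) noexcept : lock_{true} {   // born locked: nobody can see us yet
            for (;;) {
                o.lock();
                toy_promise<T>* p = o.partner_;
                if (!p) break;                                 // no partner to repoint
                if (p->try_lock()) { p->partner_ = this; p->unlock(); break; }
                o.unlock();                                    // partner busy: back off and retry
                std::this_thread::yield();
            }
            partner_ = o.partner_; o.partner_ = nullptr;
            value_   = std::move(o.value_);
            ready_   = o.ready_;   o.ready_ = false;
            o.unlock();
            unlock();
        }
        // a real destructor would unlink from partner_; omitted for brevity
        T get() {
            // a real get() would block on a kernel wait object until ready_;
            // this sketch only exercises the already-ready path
            return std::move(value_);
        }
    };

    template <class T>
    class toy_promise {
        friend class toy_future<T>;
        std::atomic<bool> lock_{false};
        toy_future<T>*    partner_ = nullptr;

        void lock()     { while (lock_.exchange(true, std::memory_order_acquire)) std::this_thread::yield(); }
        bool try_lock() { return !lock_.exchange(true, std::memory_order_acquire); }
        void unlock()   { lock_.store(false, std::memory_order_release); }

    public:
        toy_future<T> get_future() {
            toy_future<T> f;
            lock(); partner_ = &f; f.partner_ = this; unlock();
            return f;                                          // the move (or NRVO) relinks the pair
        }
        void set_value(T v) {
            for (;;) {
                lock();
                toy_future<T>* f = partner_;
                if (!f) { unlock(); return; }                  // broken pair
                if (f->try_lock()) {
                    f->value_ = std::move(v); f->ready_ = true;
                    f->unlock(); unlock();
                    return;
                }
                unlock();                                      // future is mid-move: back off and retry
                std::this_thread::yield();
            }
        }
    };

    // Mirrors the make_ready_future() example above: everything lives on the
    // stack, so the optimiser is free to collapse the whole path.
    int make_ready_get() {
        toy_promise<int> p;
        toy_future<int>  f(p.get_future());
        p.set_value(5);
        return f.get();
    }

Because both halves only try-lock their partner and back off on failure, a concurrent move and set_value cannot deadlock; and on the uncontended path the locks reduce to a couple of uncontended exchanges on cache lines no other thread is touching.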
A portable, universal kernel wait object is not really necessary for that.
I think a portable, universal C API kernel wait object is very necessary if C++ is to style itself as a first-tier systems programming language.
For what it's worth, I'm working on a proof-of-concept future/promise pair that is wait strategy agnostic. The only functions that need to know about the wait strategy are the future::wait{,_for,_until,_any,_all} family and of course future::get, in case it needs to call wait. In fact the wait functions are parametrized on the wait strategy (be it a futex, condition variable, POSIX fd, POSIX semaphore, coroutine yield, etc.) and the wait object can be stack allocated.
If I get everything right, all other functions, in particular promise::set_value and future::then, should be lock-free (or wait-free, depending on the underlying hardware).
The shared state should also have a nice minimal API.
The idea is fairly obvious in retrospect; I hope to be able to share some code soon.
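[Again purely illustrative, and a guess at the shape of the design Giovanni describes rather than his code: a minimal sketch in which only get()/wait() knows the wait strategy, set_value() is a single atomic exchange and hence lock-free, and the waiter record plus the wait object live on the waiting thread's stack. toy_state, waiter and condvar_strategy are invented names, and the promise/future pair is compressed into one shared-state object for brevity.]

    #include <atomic>
    #include <condition_variable>
    #include <mutex>
    #include <utility>

    struct waiter {                    // stack-allocated registration record
        void (*wake)(void*);           // how to wake this particular waiter
        void* ctx;
    };

    template <class T>
    class toy_state {                  // stands in for the promise/future pair
        static inline waiter ready_{}; // sentinel address: "value already set"
        std::atomic<waiter*> w_{nullptr};
        T value_{};
    public:
        void set_value(T v) {          // lock-free: one exchange, at most one wake
            value_ = std::move(v);     // assumes set_value() is called exactly once
            if (waiter* prev = w_.exchange(&ready_, std::memory_order_acq_rel))
                prev->wake(prev->ctx); // a consumer is already parked: wake it
        }
        template <class WaitStrategy>
        T get(WaitStrategy& strat) {   // only this function knows how to block
            waiter w{ [](void* c) { static_cast<WaitStrategy*>(c)->wake(); }, &strat };
            waiter* expected = nullptr;
            if (w_.compare_exchange_strong(expected, &w, std::memory_order_acq_rel))
                strat.wait();          // not ready yet: block per the chosen strategy
            return std::move(value_);  // either already ready or just woken
        }
    };

    // One possible strategy: a condition variable.  A futex, pollable fd,
    // semaphore or coroutine yield would simply be different strategy types.
    struct condvar_strategy {
        std::mutex m;
        std::condition_variable cv;
        bool signalled = false;
        void wait() {
            std::unique_lock<std::mutex> l(m);
            cv.wait(l, [this] { return signalled; });
        }
        void wake() {
            { std::lock_guard<std::mutex> l(m); signalled = true; }
            cv.notify_one();
        }
    };

Waiting with a different mechanism then just means passing a different strategy object to get(); the shared state itself never changes, and the producer side stays a single exchange.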
I look forward to seeing some test code!

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/