On 7 Jul 2014 at 1:50, Gottlob Frege wrote:
> https://github.com/boostcon/cppnow_presentations_2013/blob/master/mon/future...
> 2013. The talk is called "Non-Allocating std::future/promise". I think most of it is about a... non-allocating future-promise. Hopefully. That was at least the idea.
Dear dear dear ... given that I was working at BlackBerry with you at the time, and went with you to C++ Now, I really don't know how I missed this. I'm going to assume that I didn't miss it, but instead chose to forget and then pretended that your idea was my idea. So, my apologies, and I'll credit you in the docs when the time comes.
> In brief:
> - each has a pointer to the other
> - the result is stored in the future
> - if one moves or is destroyed, it is... tricky. But basically a CAS and a small spin lock in a rare case when you need to wait for the other side to update the pointer.
The problem with this is that it isn't space optimal. Space optimal means storing the spin lock in the low bits of the pointer to the other side instead of wasting another four or eight bytes on it, which is what I've done. Performance does take a small hit, but not much - after all, it's all in the same cache line anyway. I also didn't see lock backoff in your example code, i.e. for when both the promise and the future are concurrently moved from different threads. Did I miss something?
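To show what I mean by space optimal, the packing is roughly this - an illustrative sketch with made-up names, not my actual code, and it assumes the other side is at least 2-byte aligned so bit 0 of its address is free:

#include <atomic>
#include <cstdint>

struct other_side;  // the promise as seen from the future, or vice versa

class tagged_ptr_lock
{
  std::atomic<std::uintptr_t> v_{0};
  static constexpr std::uintptr_t lock_bit = 1;  // bit 0 of an aligned pointer

public:
  // Spin until we own bit 0, returning the current partner pointer.
  other_side *lock() noexcept
  {
    for (;;)
    {
      std::uintptr_t old = v_.load(std::memory_order_relaxed) & ~lock_bit;
      if (v_.compare_exchange_weak(old, old | lock_bit,
                                   std::memory_order_acquire,
                                   std::memory_order_relaxed))
        return reinterpret_cast<other_side *>(old);
      // lock bit was set, or the pointer changed underneath us: spin again
    }
  }

  // Release the bit, publishing a possibly updated partner pointer
  // (e.g. after this side was moved in memory).
  void unlock(other_side *newptr) noexcept
  {
    v_.store(reinterpret_cast<std::uintptr_t>(newptr), std::memory_order_release);
  }
};

The spin lock then costs no storage at all beyond the pointer you already needed, at the price of a mask on every access.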
> Yes, replacing the spin and pointer updating with TM would be nice.
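For concreteness, a two-pointer update under Intel TSX RTM with a spin lock fallback looks roughly like this - purely an illustrative sketch, not the code I benchmarked, and it needs -mrtm on GCC/clang:

#include <immintrin.h>
#include <atomic>

struct node { node *prev, *next; };

static std::atomic<bool> fallback_lock{false};

// Unlink n from a doubly linked list: two pointers in possibly different
// cache lines must change together.
void unlink(node *n)
{
  unsigned status = _xbegin();
  if (status == _XBEGIN_STARTED)
  {
    // If the fallback lock is held, abort so we don't race the locked path.
    if (fallback_lock.load(std::memory_order_relaxed))
      _xabort(0xff);
    n->prev->next = n->next;  // both stores become visible atomically
    n->next->prev = n->prev;  // when the transaction commits
    _xend();
    return;
  }
  // Transaction aborted (or RTM unavailable): take the spin lock instead.
  while (fallback_lock.exchange(true, std::memory_order_acquire))
    ;  // spin
  n->prev->next = n->next;
  n->next->prev = n->prev;
  fallback_lock.store(false, std::memory_order_release);
}

The fallback path is needed because RTM makes no forward progress guarantee, and the transaction must read the fallback lock so that it aborts if another thread enters the locked path mid-flight.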
And here is where things become very interesting. I started off assuming, as you would, that TM has two big wins:

1. You can update two or more disparate pointers atomically. Very handy for lock-free and wait-free removal of items from linked lists.

2. You can skip locking where concurrent updating isn't a problem. How this works is that you take care to volatile read from all those cache lines which, if written to without locks, would corrupt state, and then you write as little as possible yourself.

Where I made a mistake was in assuming that TM is free. It is not: Intel TSX HLE is significantly slower in the single-threaded case, while setting up a TSX RTM transaction is not free either. For a simple two-pointer update, using RTM is slower than a spin lock, for example. If you compile with transactional GCC, all your code becomes 40% slower, so you don't win on performance there either.

All of which suggests that TSX really needs a bigger chunk of work per transaction to make the overhead worthwhile, which is why I thought I should write a TSX concurrent_unordered_map. That, as I learned last night, runs at about 40% of the single-threaded performance of a spin-locked unordered_map, which is unacceptable. A split-ordered-list implementation using cmpxchg is twice as quick, but at the cost that erase is unsafe.

So back to the drawing board again. I'm now thinking of simplifying by requiring that the mapped type is a shared_ptr<T>, and I'll see what sort of design that might yield. I am finding, far more often than I expected, that using TM is simply not worth it due to the cost to single-threaded performance. Maybe the next Intel chip will improve TSX's overheads substantially.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/