
On 31 Oct 2013 at 15:23, Gavin Lambert wrote:
> surprising though as the times I was seeing were in the order of 300ms from requesting the lock to being granted it, as I said before, which is a bit excessive for even a kernel wait. (And before you ask, the
I've seen CAS locks spike to a quarter second if you get a very unlucky sequence of events where all cores are read-modify-writing more cache lines than the cache coherency bus can cope with. You'll see the mouse pointer, disc i/o etc. all go to ~4Hz. Admittedly, that's a problem older processors experience more than newer ones; Intel have improved things.
> ASIO itself or in the small amount of wrapper code I had to rewrite when moving from ASIO to my custom implementation, because it seems to have gone away since switching over. (The access pattern of the outside code is unchanged.)
ASIO may be doing nothing wrong; it may simply be that the combination of your code with its code produces weird timing resonances which just happen to cause spikes on some particular hardware. I occasionally get bug reports for nedmalloc from hedge funds who upgraded to some new hardware and nedmalloc suddenly starts latency spiking. I tell them to add an empty for loop incrementing an atomic, and they're often quite surprised when the spiking goes away.
>> Mmm, I was just about to suggest that nedmalloc might be doing a free space consolidation run and that might be the cause of the spike, but if it isn't then okay.
> Not unless it can do that without locking anything, at least. I was basically only recording attempts to lock/unlock rather than any access to the allocator.
nedmalloc keeps multiple pools, and while free space consolidating one pool, it will send traffic to one of the other pools.
> I suspect I'm hitting the memory allocator in my implementation more frequently than ASIO was, actually -- I'm not trying to cache and reuse operations or buffers; it just does a "new" whenever it needs it. (Although I might be getting away with fewer intermediate objects, since I've cut the functionality to the bare minimum.) So I doubt allocation was the issue. (Unless maybe it was trying to *avoid* allocation that introduced the issue, as the post that started this discussion implied.)
One of the cunning ideas I had while at BlackBerry was for a new clang optimiser pass plugin which has the compiler coalesce operator new calls into batch mallocs and replace all sequences of stack-unwound new/deletes with alloca(). It would break ABI compatibility with GCC, but I reckoned it would deliver tremendous performance improvements in malloc-contended code. Shame we probably won't see that optimisation any time soon; it would help Boost code in particular.

Niall

--
Currently unemployed and looking for work.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/