
On 26/10/2013 11:09, Quoth Niall Douglas:
lockfree::queue isn't actually hugely performant. Most lock-free code isn't, compared to most lock-based implementations, because you gain in worst-case execution times by sacrificing average-case execution times. The only major exception is lockfree::spsc_queue, which is indeed very fast by any metric.
Yes, I know that. As I said, my implementation only barely uses the queue anyway. (I was thinking about using an MPSC queue for the strand implementation, since it's the heaviest queue user, MPSC is all it should require, and I do have one handy -- but so far that's not my bottleneck, so I haven't worried about it too much.) I still generally find that being able to complete work without context switching is a massive win over running into a lock-wall, even if the individual work takes longer on average to complete.
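
For what it's worth, here's a minimal sketch of the kind of thing I mean by "completing work without context switching": one producer handing work items to one consumer through boost::lockfree::spsc_queue. This is not my actual strand code -- the WorkItem type, the queue capacity, and the yield-based backoff are placeholders for illustration:

#include <boost/lockfree/spsc_queue.hpp>
#include <boost/lockfree/policies.hpp>
#include <atomic>
#include <functional>
#include <thread>

// Hypothetical work-item type; the real queue carries handler objects.
typedef std::function<void()> WorkItem;

boost::lockfree::spsc_queue<WorkItem*, boost::lockfree::capacity<1024> > work_queue;
std::atomic<bool> done(false);

void producer()
{
    for (int i = 0; i < 100; ++i)
    {
        WorkItem* w = new WorkItem([] { /* placeholder work */ });
        while (!work_queue.push(w))      // only spins if the ring buffer is full
            std::this_thread::yield();
    }
    done = true;
}

void consumer()
{
    WorkItem* w;
    while (!done)
    {
        while (work_queue.pop(w)) { (*w)(); delete w; }  // drain without any locks
        std::this_thread::yield();                       // queue empty: back off briefly
    }
    while (work_queue.pop(w)) { (*w)(); delete w; }      // drain any stragglers
}

int main()
{
    std::thread c(consumer);
    std::thread p(producer);
    p.join();
    c.join();
}

The point being that in the common case the consumer drains the queue without ever touching a mutex; the degenerate cases (full or empty ring buffer) fall back to a yield rather than a kernel wait.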
ASIO is, once you compile it with optimisation, really a thin wrapper doing a lot of mallocs and frees around Win IO completion ports. Any latency spikes are surely due to either IOCP or the memory allocator causing a critical section to exceed its spin count and therefore go to a kernel sleep?
No, the latency was very definitely coming from *some* instance of either boost::asio::detail::mutex or boost::asio::detail::static_mutex; I didn't trace it down any further than that. (As far as the memory allocator goes, I'm actually using nedmalloc -- which I know isn't lock-free, but it's pretty decent at avoiding lock contention. I had that instrumented too, and there were no memory-allocation locks held around the time the latency occurred.) My application was using an io_service with 6-8 threads and a large number of serial_ports, each with its own io_strand and deadline_timer. The deadline_timer is cancelled and restarted most of the time rather than being allowed to expire -- but that shouldn't be unusual behaviour for a timeout.
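
To make the usage concrete, this is roughly the per-port timeout pattern I'm describing, heavily simplified and not the real code -- the Port class, the 5-second interval, and the single-threaded main() are invented for illustration; the real application runs the io_service on 6-8 threads and rearms the timer on every successful read:

#include <boost/asio.hpp>
#include <boost/bind.hpp>

class Port
{
public:
    explicit Port(boost::asio::io_service& io)
        : strand_(io), timer_(io) {}

    // Called whenever data arrives; cancelling the outstanding wait and
    // rearming is the normal case, so the timer only fires if the port
    // actually goes quiet for the full interval.
    void restart_timeout()
    {
        timer_.expires_from_now(boost::posix_time::seconds(5));
        timer_.async_wait(strand_.wrap(
            boost::bind(&Port::on_timeout, this,
                        boost::asio::placeholders::error)));
    }

private:
    void on_timeout(const boost::system::error_code& ec)
    {
        if (ec == boost::asio::error::operation_aborted)
            return;                 // cancelled because data arrived in time
        // ... handle the real timeout here ...
    }

    boost::asio::io_service::strand strand_;
    boost::asio::deadline_timer timer_;
};

int main()
{
    boost::asio::io_service io;
    Port port(io);
    port.restart_timeout();     // in the real app this happens on every read
    io.run();
}

Calling expires_from_now() while a wait is outstanding cancels the pending async_wait (it completes with operation_aborted), which is exactly the cancel-and-restart behaviour I described above.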