On 25 Oct 2013 at 19:39, Gavin Lambert wrote:
upside most of the guts are entirely lock-free (though not wait-free, since it's based on Boost.LockFree's queue).
lockfree::queue isn't actually hugely performant. Most lock-free code isn't, compared to most lock-based implementations, because you gain in worst-case execution times by sacrificing average-case execution times. The only major exception is lockfree::spsc_queue, which is indeed very fast by any metric.
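To make that concrete, here is a minimal untested sketch contrasting the two (sizes and counts are arbitrary): spsc_queue is a wait-free ring buffer restricted to exactly one producer and one consumer, while queue is the general multi-producer/multi-consumer structure whose push/pop are CAS loops.

#include <boost/lockfree/spsc_queue.hpp>
#include <boost/lockfree/queue.hpp>
#include <boost/lockfree/policies.hpp>
#include <thread>
#include <cstdio>

// Wait-free SPSC ring buffer: exactly one producer thread and one consumer
// thread, fixed capacity, no allocation after construction.
boost::lockfree::spsc_queue<int, boost::lockfree::capacity<1024> > fast_q;

// Lock-free MPMC queue: any number of threads, but every push/pop is a CAS
// loop over shared nodes, so the average-case cost is much higher.
boost::lockfree::queue<int> general_q(1024);

int main()
{
    std::thread producer([] {
        for (int i = 0; i < 100000; ++i)
            while (!fast_q.push(i)) { /* full: spin until there is space */ }
    });
    std::thread consumer([] {
        int v, received = 0;
        while (received < 100000)
            if (fast_q.pop(v)) ++received;
    });
    producer.join();
    consumer.join();

    // The MPMC queue has the same bool-returning interface:
    general_q.push(42);
    int x = 0;
    general_q.pop(x);

    std::printf("done\n");
    return 0;
}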
Asio on Windows uses IOCP (default settings, using asio::io_service for task scheduling), and that is the (theoretical) reason for the better thread scheduling of the Asio-based thread pool. Sometimes it's really visible.
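For reference, a pool built that way is only a few lines; this is an illustrative sketch (thread and task counts invented), not the code being discussed. On Windows the run() calls below are serviced through an I/O completion port.

#include <boost/asio.hpp>
#include <boost/thread.hpp>
#include <cstdio>

int main()
{
    boost::asio::io_service service;

    // Queue some work before starting the pool; run() returns once the
    // queue has drained and there is no outstanding work left.
    for (int i = 0; i < 8; ++i)
        service.post([i] { std::printf("task %d\n", i); });

    // On Windows this dispatch path is backed by an I/O completion port,
    // so the kernel's IOCP scheduler decides which pool thread wakes up
    // for each posted handler.
    boost::thread_group pool;
    for (int t = 0; t < 4; ++t)
        pool.create_thread([&] { service.run(); });

    pool.join_all();
    return 0;
}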
It's also full of mutexes though, which is why it didn't work out for me. (Note that I was using Boost 1.53 when testing Asio; maybe this has changed in later versions, although I heard that 1.54 picked up a bug in the IOCP reactor.)
I saw lost wakeups during parallel writes with ASIO 1.54, so I disabled parallel writes in AFIO. That appears to be fixed in 1.55, so AFIO now parallelises everything as it was designed to, which suggests ASIO itself is fixed in 1.55.
I'm not sure exactly which lock triggered the slow path (my logging was only sufficient to show that it was one of the ones inside Asio, but not which one). But as the prior email said, given reuse of strand implementations between supposedly independent strands, that seems like a likely candidate. (Though it didn't take long for the latency spikes to manifest -- typically they'd start after a couple of minutes and then recur roughly every 10-30 seconds.)
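If it helps anyone reproduce this, the sharing is easy to provoke; a rough untested sketch (counts arbitrary, and the 193-implementation default is from my reading of strand_service in this era of Asio):

#include <boost/asio.hpp>
#include <boost/thread.hpp>
#include <memory>
#include <vector>
#include <cstdio>

int main()
{
    boost::asio::io_service service;

    // 500 "independent" strands. The strand_service hands out implementations
    // from a fixed-size pool (193 by default, tunable with
    // BOOST_ASIO_STRAND_IMPLEMENTATIONS), so by pigeonhole many of these
    // strands share an implementation, i.e. one mutex and one handler queue.
    // Handlers posted to two such strands serialise against each other even
    // though the program treats the strands as unrelated.
    std::vector<std::shared_ptr<boost::asio::io_service::strand> > strands;
    for (int i = 0; i < 500; ++i)
        strands.push_back(
            std::make_shared<boost::asio::io_service::strand>(service));

    // Queue a little work on every strand before starting the pool; run()
    // returns once it has all been executed.
    for (std::size_t i = 0; i < strands.size(); ++i)
        strands[i]->post([i] { std::printf("strand %u ran\n", (unsigned)i); });

    boost::thread_group pool;
    for (int t = 0; t < 4; ++t)
        pool.create_thread([&] { service.run(); });
    pool.join_all();
    return 0;
}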
ASIO, once you compile it with optimisation, is really a thin wrapper around Windows I/O completion ports that does a lot of mallocs and frees. Any latency spikes are surely due to either IOCP or the memory allocator causing a critical section to exceed its spin count and therefore drop into a kernel sleep?
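That spin-then-sleep behaviour is just how Win32 critical sections work; roughly this (spin count value purely illustrative):

#include <windows.h>

int main()
{
    CRITICAL_SECTION cs;
    // Spin up to 4000 times in user space before falling back to a kernel
    // wait. While the spin succeeds, contention costs nanoseconds; once the
    // spin count is exceeded the thread sleeps in the kernel, which is where
    // multi-millisecond latency spikes come from.
    InitializeCriticalSectionAndSpinCount(&cs, 4000);

    EnterCriticalSection(&cs);   // fast path: interlocked acquire, no syscall
    /* ... protected work ... */
    LeaveCriticalSection(&cs);

    DeleteCriticalSection(&cs);
    return 0;
}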
I haven't done a head-to-head benchmark of the two (and it wouldn't surprise me if Asio were faster than mine for many loads; it's definitely more flexible than mine), but so far mine is doing at least as well as Asio on production loads, without the latency spikes from the locks. Still very early days, though.
If you're on Haswell, you might look into my memory transaction implementation in AFIO. It uses Intel TSX if runtime detection finds it available, otherwise it falls back onto a policy-composed spin lock (yes, I know I did NIH with yet another Boost spinlock implementation, but hey, mine is policy-composed so you can vary spin counts etc.!). It works on Intel's TSX simulator, but I would really love to know whether it works on real TSX hardware.
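Not the actual AFIO code, but the general shape is the usual lock-elision pattern; an untested sketch using the RTM intrinsics (GCC/Clang, compile with -mrtm; names and retry counts are mine):

// build: g++ -std=c++11 -mrtm tsx_sketch.cpp
#include <immintrin.h>   // _xbegin/_xend/_xabort/_xtest (RTM intrinsics)
#include <cpuid.h>       // __cpuid_count (GCC/Clang on x86)
#include <atomic>

static bool cpu_has_rtm()
{
    unsigned a, b, c, d;
    __cpuid_count(7, 0, a, b, c, d);
    return (b >> 11) & 1;            // CPUID.(7,0):EBX bit 11 = RTM
}

class transactional_lock
{
    std::atomic<int> lock_{0};       // fallback spin lock word
    bool use_rtm_;
public:
    transactional_lock() : use_rtm_(cpu_has_rtm()) {}

    void lock()
    {
        if (use_rtm_)
        {
            for (int attempts = 0; attempts < 3; ++attempts)
            {
                unsigned status = _xbegin();
                if (status == _XBEGIN_STARTED)
                {
                    // Abort if someone holds the fallback lock, otherwise we
                    // would race with a non-transactional owner. Reading the
                    // lock word also puts it in our read set, so a later
                    // fallback acquisition aborts us automatically.
                    if (lock_.load(std::memory_order_relaxed) == 0)
                        return;      // speculating inside a transaction
                    _xabort(0xff);
                }
                // Transaction aborted: retry a few times, then fall through.
            }
        }
        // Fallback: plain spin lock (spin counts, backoff and so on are the
        // sort of policy knobs mentioned above).
        int expected = 0;
        while (!lock_.compare_exchange_weak(expected, 1,
                                            std::memory_order_acquire))
        {
            expected = 0;
            while (lock_.load(std::memory_order_relaxed) != 0)
                ; // spin
        }
    }

    void unlock()
    {
        if (use_rtm_ && _xtest())    // still inside a transaction?
            _xend();                 // commit
        else
            lock_.store(0, std::memory_order_release);
    }
};

Niall

--
Currently unemployed and looking for work.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/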