On 01/16/2014 08:04 AM, Gavin Lambert wrote:
On 16/01/2014 07:32, Quoth Alexander Krizhanovsky:
Otherwise I'm wondering whether it makes sense to integrate the queue implementation into boost::lockfree?
The algorithm description is available at http://www.linuxjournal.com/content/lock-free-multi-producer-multi-consumer-...
I've only had a quick glance at it, but it looks less general-purpose -- it appears that for each queue instance, you need to predefine an upper bound on the number of producer and consumer threads and assign each thread a unique id within that bound.
(Also, given the apparent locality of the thread-id, I don't like that it is stored and accessed as a thread-global; this would complicate use of multiple queues from one thread.
The queue requires thread-local head and tail pointers (the same approach is used in the LMAX Disruptor). Since each thread can have access to many queues, the local pointers are implemented as an array of ThrPos structures, and the array is a member of each particular queue instance. However, to access the queue quickly we need small, consecutive thread identifiers. For example, if we have two threads with identifiers 0 and 1 and two queues Q0 and Q1, then the threads access their local pointers through Q0.thr_p_[0] and Q0.thr_p_[1] for the first queue, and Q1.thr_p_[0] and Q1.thr_p_[1] for the second. This extends to an arbitrary number of threads and queues. We only need to know, at queue construction time, how many threads will be using it.
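A minimal sketch of the bookkeeping described above, assuming the ThrPos/thr_p_ names from the post; the actual enqueue/dequeue logic of the queue is elided here:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-thread position record: written only by the owning thread,
// read by other threads when computing the global head/tail bound.
struct ThrPos {
    unsigned long head; // last slot this thread touched as a consumer
    unsigned long tail; // last slot this thread touched as a producer
};

class LockFreeQueue {
public:
    // The number of participating threads must be fixed at construction:
    // each thread gets one ThrPos slot, addressed by its small consecutive id.
    explicit LockFreeQueue(std::size_t n_threads)
        : thr_p_(n_threads, ThrPos{0, 0}) {}

    // Thread `thread_id` updates only its own slot; the same id indexes
    // the ThrPos array of every queue the thread uses.
    ThrPos& thr_pos(std::size_t thread_id) {
        assert(thread_id < thr_p_.size());
        return thr_p_[thread_id];
    }

private:
    std::vector<ThrPos> thr_p_; // one entry per participating thread
};
```

With two threads (ids 0 and 1) and two queues Q0 and Q1, thread 0 would use Q0.thr_pos(0) and Q1.thr_pos(0), exactly as in the Q0.thr_p_[0] / Q1.thr_p_[0] example above.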
And interleaving the per-thread head/tail indexes seems vulnerable to false sharing.)
False sharing is bad because of the costly request-for-ownership messages of the cache coherence protocol. The tail and head pointers are modified only by the owning thread, which should preferably be bound to a particular CPU (in any case, the OS scheduler doesn't migrate a thread to another CPU without need). Other threads/CPUs only read that thread's tail/head pointers. It makes sense to align ThrPos to a cache line, so different CPUs won't update the same cache line. I tried this, but didn't get a measurable performance improvement on my laptop (2 cores) or on 12-core/2-processor and 40-core/4-processor servers. Probably more benchmarks are needed....
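The alignment experiment mentioned above can be sketched like this, assuming a 64-byte cache line (the common size on x86; this is an assumption, not something the post states):

```cpp
#include <cstddef>

// Assumed cache line size; on x86 this is typically 64 bytes.
constexpr std::size_t kCacheLine = 64;

// Padding each ThrPos out to a full cache line means a head/tail update
// by one thread's CPU never invalidates the line holding another
// thread's entry, avoiding false sharing between adjacent array slots.
struct alignas(kCacheLine) ThrPosAligned {
    unsigned long head;
    unsigned long tail;
    // alignas expands sizeof(ThrPosAligned) to kCacheLine, so adjacent
    // elements of a ThrPosAligned array land on distinct cache lines.
};

static_assert(sizeof(ThrPosAligned) == kCacheLine,
              "each ThrPos occupies exactly one cache line");
```

The trade-off is memory: the per-thread array grows from 16 bytes per thread to a full line per thread, which may explain why the win is hard to measure on machines with few cores.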
There are still many cases where manually assigning thread ids is achievable, so something like this might be useful to have.
Yes, I understand that the current thread ids are a bit ugly and it would be good to replace them with a nicer interface. It seems boost::thread and std::thread don't provide small consecutive thread ids... Any ideas how the interface could be adjusted?
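One possible way to hide the manual id assignment, sketched here as a suggestion (the function name is invented; this is not part of the queue's current interface): hand out small consecutive ids from an atomic counter the first time each thread asks.

```cpp
#include <atomic>
#include <cstddef>

// Returns a small, consecutive id unique to the calling thread.
// thread_local initialization runs exactly once per thread, so each
// thread permanently claims the next free id on its first call.
inline std::size_t small_thread_id() {
    static std::atomic<std::size_t> counter{0};
    thread_local std::size_t id =
        counter.fetch_add(1, std::memory_order_relaxed);
    return id;
}
```

The queue constructor would still need an upper bound on the number of participating threads, and ids are never recycled when threads exit, so this only shifts the limitation rather than removing it.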
But it couldn't be a replacement for the more general queue.
Yes, I definitely agree. The ring buffer queue, with all its limitations, was designed as a classical work queue, not as a general-purpose fast queue.
-- Alexander Krizhanovsky NatSys Lab. (http://natsys-lab.com) tel: +7 (916) 717-3899, +7 (499) 747-6304 e-mail: ak@natsys-lab.com