On 01/16/2014 08:04 AM, Gavin Lambert wrote:
On 16/01/2014 07:32, Quoth Alexander Krizhanovsky:
Otherwise I'm wondering whether it makes sense to integrate the queue implementation into boost::lockfree?
The algorithm description is available at http://www.linuxjournal.com/content/lock-free-multi-producer-multi-consumer-...
I've only had a quick glance at it, but it looks less general-purpose -- it appears that for each queue instance, you need to predefine an upper bound on the number of producer and consumer threads and assign each thread a unique id within that bound.
(Also, given the apparent locality of the thread-id, I don't like that it is stored and accessed as a thread-global; this would complicate use of multiple queues from one thread.
The queue requires thread-local head and tail pointers (the same approach is used in the LMAX Disruptor). Since each thread can have access to many queues, the local pointers are implemented as an array of ThrPos structures, and the array is a member of each particular queue instance. However, to access the queue quickly we need small, consecutive thread identifiers. For example, if we have two threads with identifiers 0 and 1 and two queues Q0 and Q1, then the threads access their local pointers through Q0.thr_p_[0] and Q0.thr_p_[1] for the first queue, and Q1.thr_p_[0] and Q1.thr_p_[1] for the second. This extends to an arbitrary number of threads and queues. We only need to know, at queue construction time, how many threads will be using it.
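A minimal sketch of the bookkeeping described above, assuming the ThrPos/thr_p_ names from the post; the actual enqueue/dequeue logic of the queue is elided here:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-thread position record: written only by the owning thread,
// read by other threads when computing the global head/tail bound.
struct ThrPos {
    unsigned long head; // last slot this thread touched as a consumer
    unsigned long tail; // last slot this thread touched as a producer
};

class LockFreeQueue {
public:
    // The number of participating threads must be fixed at construction:
    // each thread gets one ThrPos slot, addressed by its small consecutive id.
    explicit LockFreeQueue(std::size_t n_threads)
        : thr_p_(n_threads, ThrPos{0, 0}) {}

    // Thread `thread_id` updates only its own slot; the same id indexes
    // the ThrPos array of every queue the thread uses.
    ThrPos& thr_pos(std::size_t thread_id) {
        assert(thread_id < thr_p_.size());
        return thr_p_[thread_id];
    }

private:
    std::vector<ThrPos> thr_p_; // one entry per participating thread
};
```

With two threads (ids 0 and 1) and two queues Q0 and Q1, thread 0 would use Q0.thr_pos(0) and Q1.thr_pos(0), exactly as in the Q0.thr_p_[0] / Q1.thr_p_[0] example above.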
And interleaving the per-thread head/tail indexes seems vulnerable to false sharing.)
False sharing is bad because of the costly request-for-ownership messages of the cache coherence protocol. The tail and head pointers are modified only by the owning thread, which should preferably be bound to a particular CPU (in any case, the OS scheduler doesn't migrate a thread to another CPU without need). Other threads/CPUs only read that thread's tail/head pointers. It makes sense to align ThrPos to a cache line, so different CPUs won't update the same cache line. I tried this, but didn't get a measurable performance improvement on my laptop (2 cores) or on 12-core/2-processor and 40-core/4-processor servers. Probably more benchmarks are needed....
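The alignment experiment mentioned above can be sketched like this, assuming a 64-byte cache line (the common size on x86; this is an assumption, not something the post states):

```cpp
#include <cstddef>

// Assumed cache line size; on x86 this is typically 64 bytes.
constexpr std::size_t kCacheLine = 64;

// Padding each ThrPos out to a full cache line means a head/tail update
// by one thread's CPU never invalidates the line holding another
// thread's entry, avoiding false sharing between adjacent array slots.
struct alignas(kCacheLine) ThrPosAligned {
    unsigned long head;
    unsigned long tail;
    // alignas expands sizeof(ThrPosAligned) to kCacheLine, so adjacent
    // elements of a ThrPosAligned array land on distinct cache lines.
};

static_assert(sizeof(ThrPosAligned) == kCacheLine,
              "each ThrPos occupies exactly one cache line");
```

The trade-off is memory: the per-thread array grows from 16 bytes per thread to a full line per thread, which may explain why the win is hard to measure on machines with few cores.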
There are still many cases where manually assigning thread ids is achievable, so something like this might be useful to have.
Yes, I understand that the current thread ids are a bit ugly and it would be good to replace them with a nicer interface. It seems boost::thread and std::thread don't provide small consecutive thread ids... Any ideas how the interface could be adjusted?
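One possible way to hide the manual id assignment, sketched here as a suggestion (the function name is invented; this is not part of the queue's current interface): hand out small consecutive ids from an atomic counter the first time each thread asks.

```cpp
#include <atomic>
#include <cstddef>

// Returns a small, consecutive id unique to the calling thread.
// thread_local initialization runs exactly once per thread, so each
// thread permanently claims the next free id on its first call.
inline std::size_t small_thread_id() {
    static std::atomic<std::size_t> counter{0};
    thread_local std::size_t id =
        counter.fetch_add(1, std::memory_order_relaxed);
    return id;
}
```

The queue constructor would still need an upper bound on the number of participating threads, and ids are never recycled when threads exit, so this only shifts the limitation rather than removing it.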
But it couldn't be a replacement for the more general queue.
Yes, I definitely agree. The ring buffer queue, with all its limitations, was designed as a classical work queue, not as a general-purpose fast queue.
-- Alexander Krizhanovsky NatSys Lab. (http://natsys-lab.com) tel: +7 (916) 717-3899, +7 (499) 747-6304 e-mail: ak@natsys-lab.com