[quick_allocator] false sharing on SMP

Pavel Vozenilek asked me to repost this here, so here it is:
http://groups.google.com/group/comp.lang.c++.moderated/msg/2b85fdcd20239a52

I just skimmed through boost/detail/quick_allocator.hpp and noticed that quick_allocator causes false sharing on SMP. It happens when several counters are allocated within the same cache line and those counters are used by different processors: every write to a counter thrashes the other processors' copies of that cache line, even when each counter is used by a single processor only.

More information about allocators and false sharing can be found in the papers included with the Hoard allocator sources: http://hoard.org/

-- Maxim Yegorushkin
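To make the effect described above concrete, here is a minimal sketch. It is not taken from quick_allocator.hpp; the names PackedCounters, PaddedCounters, hammer and kCacheLine are made up for illustration, and the 64-byte cache line is an assumption that holds on many x86 machines but is not guaranteed. Two threads each increment their own counter. In the packed layout both counters probably share a cache line, so each write invalidates the other processor's copy; in the padded layout each counter gets its own line, at the cost of a larger object.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Assumption: 64-byte cache lines. Adjust for the target hardware.
constexpr std::size_t kCacheLine = 64;

// Packed layout: a and b are adjacent, so they most likely share a cache line.
struct PackedCounters {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// Padded layout: each counter is aligned to the start of its own cache line.
struct PaddedCounters {
    alignas(kCacheLine) std::atomic<long> a{0};
    alignas(kCacheLine) std::atomic<long> b{0};
};

// The padding is not free: the padded struct occupies at least two lines.
static_assert(sizeof(PaddedCounters) >= 2 * kCacheLine,
              "each counter should occupy its own cache line");

// Each thread writes only its own counter, yet with the packed layout the
// two processors may still fight over the same cache line.
template <class Counters>
void hammer(Counters& c, long iterations) {
    assert(iterations >= 0);
    std::thread t1([&] {
        for (long i = 0; i < iterations; ++i)
            c.a.fetch_add(1, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        for (long i = 0; i < iterations; ++i)
            c.b.fetch_add(1, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
}
```

Timing hammer() on a PackedCounters object against a PaddedCounters object on a multiprocessor machine is one way to see whether false sharing matters on a given system; the results depend heavily on the hardware.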

Maxim Yegorushkin wrote:
I just skimmed through boost/detail/quick_allocator.hpp and noticed that quick_allocator causes false sharing on SMP. It happens when several counters are allocated within the same cache line and those counters are used by different processors, thus thrashing processors' cache lines when the counter is written, even when the counter is used by a single processor only.
What do you suggest?

On Thu, 25 Aug 2005 17:34:59 +0400, Peter Dimov <pdimov@mmltd.net> wrote:
What do you suggest?
IMO, patching it to avoid false sharing may require too much effort. Not quite constructive, but I would stick to standard new/delete and replace the libc-provided malloc() with Hoard's one for my project.
-- Maxim Yegorushkin

Maxim Yegorushkin wrote:
IMO, patching it to avoid false sharing may require too much effort.
That, and it would increase the memory footprint.
Not quite constructive, but I would stick to using standard new/delete and replaced libc provided malloc() with hoard's one for my project.
That's always been the recommended course of action. quick_allocator is for (a) benchmarks that use a horribly slow underlying malloc, (b) people who try the #define and find that their particular project benefits a lot, and (c) people who want to customize the allocations of their local copy of shared_ptr, but need a starting point.
It's very much a "toy" compared to industrial-grade malloc replacements, especially in MT mode, where you definitely want thread-specific (or at the very least, lock-free) free lists. False sharing is rarely at the top of your worries on high-contention SMP. :-)

Peter Dimov wrote:
Maxim Yegorushkin wrote:
IMO, patching it to avoid false sharing may require too much effort.
That, and it would increase the memory footprint.
Typically you avoid sharing by using thread-local storage (TLS). Thus you can't pool memory between threads, resulting in an increase in memory usage on average. Depending on the sophistication of the allocator, you could end up with resource leaks if you allocate in one thread and deallocate in another, but that is an edge case. But the good news is that if you just have one thread, you don't have extra overhead.

The performance of locking operations (even interlocked increments/exchanges) will come to dominate the allocator on multi-CPU and even multi-core systems. So even a naïve pool allocator using TLS will scale reasonably well.

I would quite like to see the memory model specified via a template argument to the pool allocator, as opposed to a macro. Just an instinctive aversion to macros...

Regards,
Calum
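A rough sketch of the idea above, a fixed-size pool whose memory model is chosen by a template argument rather than a macro, might look like the following. This is not how quick_allocator is written; Pool, ThreadLocalList, SharedList and Node are hypothetical names invented for this example. The thread-local model needs no lock but keeps a separate free list per thread, so blocks cached by a thread are never handed back (and are lost when the thread exits), which is the memory cost mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <new>

// A free block is reused as a link in a singly linked free list.
struct Node { Node* next; };

// Hypothetical model 1: one free list per thread, no locking needed.
template <std::size_t Size>
struct ThreadLocalList {
    static Node*& head() { thread_local Node* h = nullptr; return h; }
    struct Lock {};  // empty: nothing to lock
};

// Hypothetical model 2: one free list shared by all threads, mutex-guarded.
template <std::size_t Size>
struct SharedList {
    static Node*& head() { static Node* h = nullptr; return h; }
    static std::mutex& mtx() { static std::mutex m; return m; }
    struct Lock {
        Lock() : guard(mtx()) {}
        std::lock_guard<std::mutex> guard;
    };
};

// Fixed-size pool; the memory model is a template argument, not a macro.
template <std::size_t Size,
          template <std::size_t> class Model = ThreadLocalList>
struct Pool {
    static_assert(Size >= sizeof(Node), "block must be able to hold a link");

    static void* allocate() {
        typename Model<Size>::Lock lock;
        (void)lock;
        Node*& head = Model<Size>::head();
        if (head) {                      // reuse a cached block if available
            Node* n = head;
            head = n->next;
            return n;
        }
        return ::operator new(Size);     // otherwise fall back to the heap
    }

    static void deallocate(void* p) {
        assert(p != nullptr);
        typename Model<Size>::Lock lock;
        (void)lock;
        Node*& head = Model<Size>::head();
        head = new (p) Node{head};       // push the block onto the free list
    }
};
```

With ThreadLocalList, a block freed on a different thread than the one that allocated it simply migrates to the freeing thread's list; nothing in this sketch rebalances the lists or releases cached blocks, so it illustrates the trade-off rather than solving it.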

On Thu, 25 Aug 2005 18:29:37 +0400, Peter Dimov <pdimov@mmltd.net> wrote:
quick_allocator is for (a) benchmarks that use a horribly slow underlying malloc, (b) people that try the #define and their particular project tends to benefit a lot, and (c) people that want to customize the allocations of their local copy of shared_ptr, but need a starting point.
It's very much a "toy" compared to industrial grade malloc replacements,
Well, I did not know that.
-- Maxim Yegorushkin
participants (3)
- Calum Grant
- Maxim Yegorushkin
- Peter Dimov