[pool] Thread specific pool allocator

I develop code to run on multiple-processor machines. Moderate use of the Standard Template Library (STL) containers causes programs to slow down when the work is split between threads running on different processors. This is because the STL default memory allocator is a thread-safe singleton and causes contention between the threads.
The memory allocator provided by boost::pool_allocator is also a thread-safe singleton.
I have coded a thread-specific memory allocator based on boost::pool and boost::thread_specific_ptr. My allocator creates a new instance for each thread it is used in, and so avoids contention between threads. Containers created with this allocator must be used carefully and not written to by more than one thread, because the allocator is not thread-safe (that is the point!). However, they provide dramatic performance improvements when my code runs on multiple processors.
I do not expect my coding is to the high standard of the Boost libraries, but I do believe it shows that something very useful can be done with a few lines of code. Any chance of a boost::thread_specific_pool_allocator in the near future?
James Bremner
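The idea described in this message can be sketched roughly as follows. This is not James's code, which was not posted to the list. It is a minimal illustration that assumes C++11, swaps in `thread_local` for boost::thread_specific_ptr and a hand-rolled free list for boost::pool, and uses the hypothetical names `FixedPool` and `ThreadPoolAllocator`.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <new>
#include <vector>

// Hypothetical fixed-size free-list pool standing in for boost::pool.
// Deliberately not thread-safe: every thread gets its own instance.
class FixedPool {
public:
    explicit FixedPool(std::size_t block_size) : block_size_(round_up(block_size)) {}
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;
    ~FixedPool() {
        for (void* chunk : chunks_) ::operator delete(chunk);
    }

    void* allocate() {
        if (free_ == nullptr) grow();
        Node* n = free_;
        free_ = n->next;
        return n;
    }

    void deallocate(void* p) {
        assert(p != nullptr);
        free_ = new (p) Node{free_};  // push the block back on the free list
    }

private:
    struct Node { Node* next; };

    // Blocks must hold a Node and stay aligned for any ordinary type.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + a - 1) / a * a;
    }

    void grow() {
        const std::size_t count = 256;  // blocks per chunk, an arbitrary choice
        char* chunk = static_cast<char*>(::operator new(block_size_ * count));
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < count; ++i) deallocate(chunk + i * block_size_);
    }

    std::size_t block_size_;
    Node* free_ = nullptr;
    std::vector<void*> chunks_;
};

template <class T>
struct ThreadPoolAllocator {
    using value_type = T;

    ThreadPoolAllocator() = default;
    template <class U>
    ThreadPoolAllocator(const ThreadPoolAllocator<U>&) noexcept {}

    // One pool per thread and per element type, created on first use.
    static FixedPool& pool() {
        thread_local FixedPool p(sizeof(T));
        return p;
    }

    T* allocate(std::size_t n) {
        static_assert(alignof(T) <= alignof(std::max_align_t), "over-aligned types unsupported");
        if (n == 1) return static_cast<T*>(pool().allocate());
        return static_cast<T*>(::operator new(n * sizeof(T)));  // arrays bypass the pool
    }

    void deallocate(T* p, std::size_t n) noexcept {
        if (n == 1) pool().deallocate(p);  // must run on the thread that allocated p
        else ::operator delete(p);
    }
};

template <class T, class U>
bool operator==(const ThreadPoolAllocator<T>&, const ThreadPoolAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const ThreadPoolAllocator<T>&, const ThreadPoolAllocator<U>&) { return false; }
```

A node-based container such as `std::list<int, ThreadPoolAllocator<int>>` created inside a worker thread then takes its nodes from that thread's pool without locking. As the message warns, such a container must be filled, emptied, and destroyed by the same thread, since a block returned on another thread would land in the wrong pool.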

On Sat, Sep 12, 2009 at 10:01 AM, James Bremner <ravenspoint@yahoo.com> wrote:
> I develop code to run on multiple-processor machines. Moderate use of the Standard Template Library (STL) containers causes programs to slow down when the work is split between threads running on different processors. This is because the STL default memory allocator is a thread-safe singleton and causes contention between the threads.
> The memory allocator provided by boost::pool_allocator is also a thread-safe singleton.
> I have coded a thread-specific memory allocator based on boost::pool and boost::thread_specific_ptr. My allocator creates a new instance for each thread it is used in, and so avoids contention between threads. Containers created with this allocator must be used carefully and not written to by more than one thread, because the allocator is not thread-safe (that is the point!). However, they provide dramatic performance improvements when my code runs on multiple processors.
> I do not expect my coding is to the high standard of the Boost libraries, but I do believe it shows that something very useful can be done with a few lines of code. Any chance of a boost::thread_specific_pool_allocator in the near future?
As I recall, Boost already has such a thread-specific allocator for things that Boost.Pool could use.
If all your objects are the same size, Boost.Pool is great. If you want to allocate things of many different sizes, then I recommend tcmalloc from Google ( http://code.google.com/p/google-perftools/ ; there are other things in that library too, but the memory allocator by itself is awesome). tcmalloc is basically like a lot of variable-sized Boost.Pools allocated within each other and so forth. It has wonderful speed; I know of nothing that beats it. It is actually a rather simple design, and I had something similar already made that I used a few years back (built on Boost.Pool, but due to some of Boost.Pool's design it does not quite reach the same speed).

> As I recall, Boost already has such a thread-specific allocator for things that Boost.Pool could use.
I could not find this. If it exists, please provide a link or other reference.
> If you want to allocate things of many different sizes ...
All my objects are the same size - that is why I am using a pool allocator.
> I recommend tcmalloc from google ( http://code.google.com/p/google-perftools/
I looked at this. IMHO it appears rather amateurish.
James

I would also look at Intel's TBB and compare performance. You may find that it is a drop-in replacement that is both faster and more general.
I wrote a fast allocator some time ago, using regions and a no-op dealloc, and thread-local storage as required. Although it was indeed faster than TBB, it was less general, and it was only 1.6x faster on MSVC.
In summary, before making any suggestions about allocation schemes, I recommend doing some benchmarking and providing results first. If you are interested, my memory allocation benchmarking code may give you some ideas: http://tinyurl.com/qvyxvu and http://tinyurl.com/l89llq
Regards,
Christian
On Sun, Sep 13, 2009 at 7:28 AM, James Bremner <ravenspoint@yahoo.com> wrote:
>> As I recall, Boost already has such a thread-specific allocator for things that Boost.Pool could use.
> I could not find this. If it exists, please provide a link or other reference.
>> If you want to allocate things of many different sizes ...
> All my objects are the same size - that is why I am using a pool allocator.
>> I recommend tcmalloc from google ( http://code.google.com/p/google-perftools/
> I looked at this. IMHO it appears rather amateurish.
> James
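Christian's allocator itself is not shown in the thread. As a rough illustration of what "regions and a no-op dealloc" can mean, here is a minimal bump-pointer sketch. The names `Region` and `RegionAllocator` are hypothetical, and the fixed-capacity buffer is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical region: allocation only advances an offset into one buffer,
// and all memory is released together when the region is destroyed.
class Region {
public:
    explicit Region(std::size_t capacity) : buffer_(capacity) {}
    Region(const Region&) = delete;
    Region& operator=(const Region&) = delete;

    void* allocate(std::size_t bytes, std::size_t alignment) {
        assert(alignment != 0);
        void* p = buffer_.data() + used_;
        std::size_t space = buffer_.size() - used_;
        // std::align moves p forward to the next suitably aligned address.
        if (std::align(alignment, bytes, p, space) == nullptr) throw std::bad_alloc();
        used_ = static_cast<std::size_t>(static_cast<unsigned char*>(p) - buffer_.data()) + bytes;
        return p;
    }

    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t used_ = 0;
};

template <class T>
struct RegionAllocator {
    using value_type = T;

    explicit RegionAllocator(Region& r) noexcept : region(&r) {}
    template <class U>
    RegionAllocator(const RegionAllocator<U>& other) noexcept : region(other.region) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(region->allocate(n * sizeof(T), alignof(T)));
    }

    // The no-op dealloc: individual blocks are never returned to the region.
    void deallocate(T*, std::size_t) noexcept {}

    Region* region;
};

template <class T, class U>
bool operator==(const RegionAllocator<T>& a, const RegionAllocator<U>& b) { return a.region == b.region; }
template <class T, class U>
bool operator!=(const RegionAllocator<T>& a, const RegionAllocator<U>& b) { return a.region != b.region; }
```

A `std::vector<int, RegionAllocator<int>>` constructed from a `RegionAllocator<int>` then draws all of its storage from one region. Giving each thread its own `Region` is one way to read "thread-local storage as required". The trade-off is the loss of generality Christian mentions: memory is only reclaimed when the whole region goes away.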

"Christian Schladetsch" <christian.schladetsch@gmail.com> wrote
> I would also look at Intel's TBB and compare performance. You may find that it is a drop-in replacement that is both faster and more general.
I have heard good things about Intel's TBB. I may take a detailed look at it. I am resisting doing so because:
1. I already use the BOOST libraries. It is a pain to require my clients to install yet another library.
2. The BOOST licence is a lot more "free" than the GPL used by TBB.
I am a little surprised by the responses to my post. I believe I have identified a significant gap in the BOOST offerings which could be closed with a few lines of code. Since this is a BOOST developers' newsgroup, I did not expect members to suggest using other libraries.
James

James Bremner wrote:
> "Christian Schladetsch" <christian.schladetsch@gmail.com> wrote
> 2. The BOOST licence is a lot more "free" than the GPL used by TBB.
> I am a little surprised by the responses to my post. I believe I have identified a significant gap in the BOOST offerings which could be closed with a few lines of code. Since this is a BOOST developer's newsgroup I did not expect members to suggest using other libraries.
Boost isn't so shortsighted as to think non-Boost is bad. Licensing is often a good reason to put something in Boost, however. The real driver is to advance the state of the art in C++ libraries, always with an eye to inclusion in the Standard.
_____
Rob Stewart robert.stewart@sig.com
Software Engineer, Core Software using std::disclaimer;
Susquehanna International Group, LLP http://www.sig.com

"Stewart, Robert" <Robert.Stewart@sig.com> wrote in message
> Boost isn't so shortsighted as to think non-Boost is bad. Licensing is often a good reason to put something in Boost, however. The real driver is to advance the state of the art in C++ libraries, always with an eye to inclusion in the Standard.
There is a universe of excellent libraries. However, I did not expect to see them recommended on this newsgroup.
IMHO the standard libraries are rather perverse when used for programs to be run on multi-processor machines, which many people own today. Naive use of the STL containers, in particular, causes a multithreaded program to run slower on multi-processor machines.
I would like to see BOOST begin to address this issue, hence my post.
James

On Mon, 14 Sep 2009 12:10:45 -0400, "James Bremner" <ravenspoint@yahoo.com> wrote:
> There is a universe of excellent libraries. However, I did not expect to see them recommended on this newsgroup.
> IMHO the standard libraries are rather perverse when used for programs to be run on multi-processor machines, which many people own today. Naive use of the STL containers, in particular, causes a multithreaded program to run slower on multi-processor machines.
Have you tried plugging the HOARD memory allocator into your program?
> I would like to see BOOST begin to address this issue, hence my post.
Multicore-aware programming is more than a simple issue, and certainly not solved just by an efficient allocator (although this doesn't hurt ;) ). There is a task library under development that should help a bit regarding efficient multi-threaded programming. I don't know what its current status is. When you have an efficient task scheduler, writing parallel versions of STL algorithms is easier.
Last but not least, you have Boost.STM: http://eces.colorado.edu/~gottschl/dracoSTM/index.html
-- EA

"Edouard A." <edouard@fausse.info> wrote in message
> Have you tried plugging the HOARD memory allocator into your program?
Yes. This provided a significant performance improvement; however, it has some serious drawbacks:
a. I experienced some crashes during testing.
b. Commercial licensing is expensive.
c. It 'hooks' system calls to malloc, a technique I consider dodgy.
d. It is a DLL, which I avoid since they result in endless configuration management nightmares.
> Multicore aware programming is more than a simple issue, and certainly not solved just by an efficient allocator (although this doesn't hurt ;) ).
Gosh! Did I suggest anything like this? However, a well-designed multithreaded program will slow down, rather than speed up, on a multi-processor machine when the standard memory allocator is used. This took me, at least, completely by surprise and mystified me for several days.
James

On Mon, 14 Sep 2009 12:47:00 -0400, "James Bremner" <ravenspoint@yahoo.com> wrote:
> Yes. This provided a significant performance improvement, however it has some serious drawbacks
> a. I experienced some crashes during testing
> b. Commercial licensing is expensive
> c. It 'hooks' system calls to malloc, a technique I consider dodgy.
> d. It is a DLL, which I avoid since they result in endless configuration management nightmares.
I completely share your views on Hoard. I don't like the hook placed on the allocator, and it's far too expensive for an allocator that makes programs crash.
Have you had a look at jemalloc? It's the allocator from FreeBSD. That's what they use for Firefox. It needs some adjustments, I guess, to be transformed into a full-fledged STL allocator, but it's a nice start.
http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf
http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/jemalloc.c
> Gosh! Did I suggest anything like this? However, a well designed multithreaded program will slow down, rather than speed up, on a multi-processor machine when the standard memory allocator is used which took me, at least, completely by surprise and mystified me for several days.
Sorry, I misunderstood your mail.
The main reason you get such a speed boost with a better allocator is that it can prevent (or at least reduce) false sharing and is able to allocate in different threads simultaneously.
If you want to go further, go for concurrent/lock-free containers.
-- EA
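False sharing, as mentioned above, happens when two threads write to different variables that occupy the same cache line. A rough sketch of the effect at the level of a single struct follows. The 64-byte line size is an assumption (typical for x86-64 but not universal), and all type and function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Assumed cache-line size; check the target hardware before relying on it.
constexpr std::size_t kCacheLine = 64;

// Two of these placed side by side will normally share one cache line.
struct Counter { std::atomic<long> value{0}; };

// alignas pads each counter out to a full line of its own.
struct alignas(kCacheLine) PaddedCounter { std::atomic<long> value{0}; };

struct PackedPair { Counter a; Counter b; };              // adjacent: prone to false sharing
struct PaddedPair { PaddedCounter a; PaddedCounter b; };  // one line per counter

// Each thread updates only its own counter. With PackedPair the two cores
// still fight over the shared line; with PaddedPair they do not.
// Assumes both counters start at zero.
template <class Pair>
void bump_in_parallel(Pair& pair, long iterations) {
    auto work = [iterations](std::atomic<long>& v) {
        for (long i = 0; i < iterations; ++i) v.fetch_add(1, std::memory_order_relaxed);
    };
    std::thread t1([&] { work(pair.a.value); });
    std::thread t2([&] { work(pair.b.value); });
    t1.join();
    t2.join();
    assert(pair.a.value.load() == iterations && pair.b.value.load() == iterations);
}
```

Timing `bump_in_parallel` on a `PackedPair` against a `PaddedPair` is one way to observe the difference on a given machine; no figures are claimed here. An allocator that hands each thread blocks from its own chunks reduces the chance that small objects owned by different threads end up on the same line.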

On Monday 14 September 2009 18:57:29, Edouard A. wrote:
> On Mon, 14 Sep 2009 12:47:00 -0400, "James Bremner" <ravenspoint@yahoo.com> wrote:
>> Yes. This provided a significant performance improvement, however it has some serious drawbacks
>> a. I experienced some crashes during testing
>> b. Commercial licensing is expensive
>> c. It 'hooks' system calls to malloc, a technique I consider dodgy.
>> d. It is a DLL, which I avoid since they result in endless configuration management nightmares.
> I completely share your views on Hoard. I don't like the hook placed on the allocator and it's just too much expensive for an allocator that makes program crash.
> Have you had a look at jemalloc? It's the allocator from FreeBSD. That's what they use for Firefox. It needs some adjustments, I guess, to be transformed into a full fledged STL allocator but it's a nice start.
> http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf
> http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/jemalloc.c
>> Gosh! Did I suggest anything like this? However, a well designed multithreaded program will slow down, rather than speed up, on a multi-processor machine when the standard memory allocator is used which took me, at least, completely by surprise and mystified me for several days.
> Sorry I misunderstood your mail.
> The main reason you get such a speed boost with a better allocator is because it can prevent (or at least reduce) false sharing and is able to allocate in different threads simultaneously.
This suggestion depends very much on your use case, but boost::intrusive containers can be very helpful for avoiding allocation at inappropriate points completely. Even if you need allocation at container insert, you can pool the allocations and keep the unused nodes in an intrusive::slist in between. That is very much like a pool allocator, but it avoids the problem of STL containers requiring a stateless allocator and thus avoids using a thread_specific_ptr, which is also quite expensive on some platforms. boost::intrusive containers never allocate memory themselves.
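The pooling scheme suggested above might look roughly like the sketch below. To keep it self-contained it uses a hand-rolled intrusive singly linked list rather than boost::intrusive::slist, and the names `SlistHook`, `Message`, `FreeList`, and `MessagePool` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The link lives inside the object, so putting a node on a list never allocates.
struct SlistHook { SlistHook* next = nullptr; };

// Example payload type; it opts in to pooling by deriving from the hook.
struct Message : SlistHook { int payload = 0; };

// Minimal intrusive stack of hooks (stand-in for boost::intrusive::slist).
class FreeList {
public:
    void push(SlistHook& h) {
        h.next = head_;
        head_ = &h;
        ++size_;
    }
    SlistHook* pop() {
        SlistHook* h = head_;
        if (h != nullptr) {
            head_ = h->next;
            h->next = nullptr;
            --size_;
        }
        return h;
    }
    std::size_t size() const { return size_; }

private:
    SlistHook* head_ = nullptr;
    std::size_t size_ = 0;
};

// Allocates all nodes up front and parks unused ones on the intrusive list.
class MessagePool {
public:
    explicit MessagePool(std::size_t count) : storage_(count) {
        for (Message& m : storage_) free_.push(m);
    }
    MessagePool(const MessagePool&) = delete;
    MessagePool& operator=(const MessagePool&) = delete;

    // Returns nullptr when every node is in use.
    Message* acquire() { return static_cast<Message*>(free_.pop()); }
    void release(Message& m) { free_.push(m); }
    std::size_t available() const { return free_.size(); }

private:
    std::vector<Message> storage_;  // never resized, so node addresses stay valid
    FreeList free_;
};
```

Because the pool is an ordinary object, each thread can own one and pass it around explicitly, so no thread_specific_ptr lookup is needed on the hot path.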

> 2. The BOOST licence is a lot more "free" than the GPL used by TBB.
IANAL, but TBB uses GPLv2 with the runtime exception, with licence implications similar to those of GNU's libstdc++.
tim
-- tim@klingt.org http://tim.klingt.org
Every word is like an unnecessary stain on silence and nothingness
Samuel Beckett
participants (7)
- Christian Schladetsch
- Edouard A.
- James Bremner
- OvermindDL1
- Stefan Strasser
- Stewart, Robert
- Tim Blechmann