[pool] Calling all library designers: The future of Boost.Pool???

As one of the last people to try and do any maintenance of Boost.Pool, I feel I should try and start some discussion here on its future.

First off, some random thoughts based on my last experiences with this library:

* It should be simple, but it's confusing. In particular, the difference between ordered and unordered pools is not particularly intuitive, but IMO the distinction between the various components, and what should be used where, is also poor.
* I believe the decision to support both ordered and unordered pools in the same interface was a mistake. At the very least it's potentially error-prone in use, but it's also inefficient. It looks to me that this, along with the allocators provided, was a classic case of feature creep post-review. I'm sure they sounded like good ideas during the review, but in practice a smaller, more tightly focused library might actually be better.
* It's not clear that the claimed performance improvements are actually there in practice (any more). This is partly due to compiler and std lib improvements, partly due to feature creep in Boost.Pool. Some bug fixes have had a negative effect too - however needed they may have been for correctness.
* Many of the existing bug reports were deliberately left "in limbo" because there was no obvious/easy way to fix them without compromising Boost.Pool's core mission (providing fixed-size blocks of memory really fast).
* There have been many changes to the standard, compiler technology, and to Boost's best practices since the library was designed, many of which would result in a different design today, but also different requirements today (per-instance allocators in containers, for example).
* In no way do I want to be lumbered with maintaining Boost.Pool, and in any case I don't believe it's really fit for purpose any more [1].

So..... in an ideal world, I'd like someone to step up to the plate and design its replacement.... yes, a whole new Pool2. Off the top of my head, here's my wish list:

* There should be a very small, very focused (i.e. fast) core that provides fixed-size blocks of memory. Nothing more.
* Thread synchronization, when needed (template param?), should be via lock-free programming wherever possible. Indeed, a simple pool like this is pretty much the poster child for lock-free programming. And of course we have Boost.Lockfree now.
* There should probably be a heap implementation for variable-sized allocation requests. I'm assuming that this can't be done lock-free(?).
* Allocators: both singleton and stateful allocators should be provided (templated on pool type?). But there's a trick here: many containers only allocate fixed-size blocks, but we have no way of knowing what that size is upfront at compile time. Connecting the allocation request to the correct pool interface in an efficient manner is pretty tricky here, especially if 99% of allocations are of a fixed size but there are a few odd-sized control blocks also allocated. In any case, all allocations should come from the same underlying block of memory, even if they're for different types/sizes.
* Both pools and allocators should be able to accept a single fixed-size block of memory (from wherever) and divide it up to clients. A good example for this use case is an arena allocator: a stateful allocator whose memory all comes from the stack, with no thread synchronization required (a sketch of this idea follows below). Containers using such types can be blisteringly fast compared to regular allocators, particularly when using a scoped container for just a few manipulations. Ideal for use with C++11 containers or Boost.Container.

Hopefully others will chime in here with their requirements as well.

Cheers all, John.

1) This doesn't preclude applying *trivial* fixes to Boost.Pool, but I genuinely believe that big changes are more likely to be counterproductive at this point.
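To make the arena-allocator item above concrete, here is a minimal, untested sketch of the idea: a stateful C++11 allocator that carves memory out of a caller-supplied stack buffer, with no locking and no growth. The names stack_arena and arena_allocator are invented for this example and are not part of Boost.Pool or any other library.

    #include <cstddef>
    #include <cstdint>
    #include <new>
    #include <vector>

    // Fixed-size buffer living wherever the caller puts it (typically the stack).
    class stack_arena {
    public:
        stack_arena(char* buf, std::size_t size) : ptr_(buf), end_(buf + size) {}

        void* allocate(std::size_t n, std::size_t align) {
            std::uintptr_t p = (reinterpret_cast<std::uintptr_t>(ptr_) + (align - 1))
                               & ~(std::uintptr_t(align) - 1);
            char* aligned = reinterpret_cast<char*>(p);
            if (aligned + n > end_) throw std::bad_alloc();   // fixed budget, no growth
            ptr_ = aligned + n;
            return aligned;
        }
        void deallocate(void*, std::size_t) {}                // everything is released when the arena dies

    private:
        char* ptr_;
        char* end_;
    };

    // Minimal C++11 stateful allocator drawing from a stack_arena.
    template <class T>
    struct arena_allocator {
        typedef T value_type;
        explicit arena_allocator(stack_arena& a) : arena(&a) {}
        template <class U> arena_allocator(const arena_allocator<U>& o) : arena(o.arena) {}

        T* allocate(std::size_t n) { return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T))); }
        void deallocate(T* p, std::size_t n) { arena->deallocate(p, n * sizeof(T)); }

        stack_arena* arena;
    };
    template <class T, class U>
    bool operator==(const arena_allocator<T>& a, const arena_allocator<U>& b) { return a.arena == b.arena; }
    template <class T, class U>
    bool operator!=(const arena_allocator<T>& a, const arena_allocator<U>& b) { return !(a == b); }

    int main() {
        char buffer[1024];                                    // all memory comes from the stack
        stack_arena arena(buffer, sizeof(buffer));
        arena_allocator<int> alloc(arena);
        std::vector<int, arena_allocator<int> > v(alloc);
        for (int i = 0; i < 100; ++i) v.push_back(i);         // no heap allocation involved
    }

Growth inside the arena is deliberately impossible: when the buffer is exhausted the allocator simply throws, which fits the "small, focused core" goal above.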

From: John Maddock ...
In no way do I want to be lumbered with maintaining Boost.Pool, and in any case I don't believe it's really fit for purpose any more [1]. So..... in an ideal world, I'd like someone to step up to the plate and design its replacement.... yes, a whole new Pool2. ... This doesn't preclude applying *trivial* fixes to Boost.Pool, but I genuinely believe that big changes are more likely to be counterproductive at this point.
Pool has no maintainer. I've never heard anything good about the library. I would never use it, and I know of several better (free) alternatives. How about we retire Pool from Boost entirely and encourage anyone who wishes to submit new libraries in this space, under whatever names they see fit? I think Boost.Allocator sounds good. If people are still using Pool they can maintain it themselves. In practice, they already are; they just haven't been told that this is the expectation. That way they won't be frustrated by their patches not getting applied. Regards, Luke

On 17/07/2012 20:35, John Maddock wrote:
As one of the last people to try and do any maintenance of Boost.Pool, I feel I should try and start some discussion here on its future.
I think there are some unexplored and useful pool techniques that could fit in Boost.Pool. Some experiments I did on this topic: http://www.drivehq.com/web/igaztanaga/allocplus/#Chapter3

On Tuesday 17 July 2012 19:35:40 John Maddock wrote:
Hopefully others will chime in here with their requirements as well.
I think it's important to understand the aim and scope of the library first. With current system allocators (such as tcmalloc, which uses per-thread pools internally, AFAIK), is there a real use for explicit memory pools? Why are STL components such as vector and string (which typically don't release memory until destroyed) and an efficient system allocator not enough? I can hardly imagine a use case for a pool allocator to be used with a container.

I evaluated Boost.Pool some (long) time ago as a way to optimize performance for a few of my applications, but it appeared even slower than the system allocator, let alone specialized solutions. Frankly, I don't see much point in pooling raw memory nowadays, except to achieve more control over when the memory is released. A per-container memory pool with no locking at all might also squeeze out some performance, but considering the per-thread memory pools in the system allocator, this gain remains to be proven. A portable aligned memory allocator perhaps? Useful, but that's an item for Boost.Allocator and not Boost.Pool.

My point is that the scope of the library is very small and the benefits it may provide are not evident. The only real application of this library I'm interested in is a container-local memory pool with no locking at all, to achieve better performance. Maybe others can provide real use cases for memory pooling, and after that we can compile requirements.

There is also an option to repurpose the library. Pooling raw memory may not be that needed today, but pooling constructed objects is another story. More than once I have had cases where constructing an object, including collateral resource allocation and initialization, is a costly operation, and it is only natural to arrange a pool of such objects. While in the pool, the objects are not destroyed but merely "cleaned". An object can be retrieved and returned to the pool multiple times, which saves the costly initialization. I could use a framework for building such pools. I think this is the most productive direction for the library.
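As a rough sketch of what such an object-pooling framework might look like (recycling_pool and the clean() hook are invented names for illustration, not an existing interface):

    #include <memory>
    #include <string>
    #include <vector>

    // Hands out ready-constructed objects; returned objects are "cleaned", not destroyed,
    // so the costly construction/initialization is paid at most once per object.
    template <class T>
    class recycling_pool {
    public:
        std::unique_ptr<T> acquire() {
            if (free_.empty())
                return std::unique_ptr<T>(new T());       // the expensive part happens here only
            std::unique_ptr<T> obj(std::move(free_.back()));
            free_.pop_back();
            return obj;
        }
        void release(std::unique_ptr<T> obj) {
            obj->clean();                                 // reset state, keep acquired resources
            free_.push_back(std::move(obj));
        }
    private:
        std::vector<std::unique_ptr<T> > free_;
    };

    // Example pooled object: the string's capacity survives recycling.
    struct message {
        std::string payload;
        void clean() { payload.clear(); }
    };

    int main() {
        recycling_pool<message> pool;
        std::unique_ptr<message> m = pool.acquire();
        m->payload = "hello";
        pool.release(std::move(m));                       // returned "cleaned", ready for reuse
    }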

On 18/07/2012 00:36, Andrey Semashev wrote:
I evaluated Boost.Pool some (long) time ago as a way to optimize performance for a few of my applications, but it appeared even slower than the system allocator, let alone specialized solutions. Frankly, I don't see much point in pooling raw memory nowadays, except to achieve more control over when the memory is released. A per-container memory pool with no locking at all might also squeeze out some performance, but considering the per-thread memory pools in the system allocator, this gain remains to be proven. A portable aligned memory allocator perhaps? Useful, but that's an item for Boost.Allocator and not Boost.Pool.
A Boost.Allocator library with some useful yet non-trivial-to-write allocators (we had to design an aligned one and an aligned adaptor for Boost.SIMD) could be pretty nice and fill a nice gap between std::allocator and the actual allocator needs of modern code. As for pooling, there are still some use cases on embedded systems where you are required to pool some MB at the beginning of the application and then want to go through the normal allocator/container design to access it. GB of RAM on a COTS computer is not the only use case around ;)
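For what it's worth, a bare-bones sketch of the portable aligned-allocation trick (over-allocate and stash the raw pointer just before the aligned block); this only illustrates the idea and is not the Boost.SIMD implementation:

    #include <cassert>
    #include <cstdint>
    #include <cstdlib>

    // Aligned allocation without platform-specific calls: over-allocate, align up,
    // and remember malloc's original pointer just before the block we hand out.
    // 'alignment' must be a power of two.
    void* aligned_malloc(std::size_t size, std::size_t alignment) {
        void* raw = std::malloc(size + alignment + sizeof(void*));
        if (!raw) return 0;
        std::uintptr_t base = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        std::uintptr_t aligned = (base + alignment - 1) & ~(std::uintptr_t(alignment) - 1);
        reinterpret_cast<void**>(aligned)[-1] = raw;    // stash the original pointer
        return reinterpret_cast<void*>(aligned);
    }

    void aligned_free(void* p) {
        if (p) std::free(reinterpret_cast<void**>(p)[-1]);
    }

    int main() {
        void* p = aligned_malloc(100, 64);              // e.g. cache-line or SIMD alignment
        assert(reinterpret_cast<std::uintptr_t>(p) % 64 == 0);
        aligned_free(p);
    }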

On Wed, Jul 18, 2012 at 7:29 AM, Joel Falcou <joel.falcou@gmail.com> wrote:
As for pooling, there are still some use cases on embedded systems where you are required to pool some MB at the beginning of the application and then want to go through the normal allocator/container design to access it. GB of RAM on a COTS computer is not the only use case around ;)
Exactly. For example, for high-performance video games on any console, there is no way you will avoid having to pool memory in one way or another. It is possible not to do this, but in most action-game contexts it is not acceptable to have slowdowns because of allocations (though there are diverse ways to fix this, pooling memory is a general one). Also, and this is very important: budgeting memory is a very important practice in some very high-performance games. Andrey Semashev said:
There is also an option to repurpose the library. Pooling raw memory may not be that needed today, but pooling constructed objects is another story. More than once I have had cases where constructing an object, including collateral resource allocation and initialization, is a costly operation, and it is only natural to arrange a pool of such objects. While in the pool, the objects are not destroyed but merely "cleaned". An object can be retrieved and returned to the pool multiple times, which saves the costly initialization. I could use a framework for building such pools. I think this is the most productive direction for the library.
I was thinking that the Object Pool of Boost.Pool is basically what you describe? At least in purpose, maybe not very efficient in the implementation. Joel Lamotte

On Wednesday 18 July 2012 10:30:46 Klaim - Joël Lamotte wrote:
On Wed, Jul 18, 2012 at 7:29 AM, Joel Falcou <joel.falcou@gmail.com> wrote:
As for pooling, there are still some use cases on embedded systems where you are required to pool some MB at the beginning of the application and then want to go through the normal allocator/container design to access it. GB of RAM on a COTS computer is not the only use case around ;)
Exactly. For example, for high-performance video games on any console, there is no way you will avoid having to pool memory in one way or another. It is possible not to do this, but in most action-game contexts it is not acceptable to have slowdowns because of allocations (though there are diverse ways to fix this, pooling memory is a general one).
Ok, but that implies that the pool has to be at least as fast as the system allocator. Which Boost.Pool isn't. I admit, I am no game developer, but wouldn't a fast allocator over a non-swappable memory region be a better solution?
Also, and this is very important: budgeting memory is a very important practice in some very high-performance games.
Good point, that might be a useful feature.
Andrey Semashev said:
There is also an option to repurpose the library. Pooling raw memory may not be that needed today, but pooling constructed objects is another story. More than once I have had cases where constructing an object, including collateral resource allocation and initialization, is a costly operation, and it is only natural to arrange a pool of such objects. While in the pool, the objects are not destroyed but merely "cleaned". An object can be retrieved and returned to the pool multiple times, which saves the costly initialization. I could use a framework for building such pools. I think this is the most productive direction for the library.
I was thinking that the Object Pool of Boost.Pool is basically what you describe? At least in purpose, maybe not very efficient in the implementation.
It creates and destroys objects within the pooled memory. Not much different from a raw memory pool.

On Wed, Jul 18, 2012 at 10:42 AM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
Ok, but that implies that the pool has to be at least as fast as the system allocator. Which Boost.Pool isn't. I admit, I am no game developer, but wouldn't a fast allocator over a non-swappable memory region be a better solution?
Sorry, I forgot to say that indeed the system allocator is not fast in these embedded contexts (at least in my experience). I think on PC, if you don't want very high performance, you can, as you suggest, still get good enough performance for a game. But on consoles most systems will allocate memory very, very slowly compared to a PC under Windows.

By the way, what I mean here is that the pool asks the system to allocate memory once, then the objects are created/destructed in it. So if you use a pool, you should pay only for object creation/destruction, not memory allocation. That makes me think that maybe memory pools make no sense when they are able to grow. To me a memory pool is useful only at a fixed size, whatever the behaviour when this size is reached.

Also, games are notoriously famous memory consumers: they allocate a lot of data to be ready to be used while the game is playing. This is why there are "loading screens", which are slow if each object is allocated separately. Having a memory pool (or several, for different purposes) for a game session allows both quick loading of data (because there is only copy/construction of this data, not memory allocation) and allows destroying the memory pool when getting out of the game session. This makes the game very responsive when the player just wants to end the session and start another.

Just for info, to give you the kind of context some games are in: another technique, when even memory pools are not good enough, is to have the game session memory mapped in a file that is loaded directly into memory and ready to go once done. That way you avoid object construction too, but it requires some specific tools.

Then you have the case where you can't know what kind of data will be present, because it depends either on the path taken by the player, or on the data sent by another system, as in the case of massively multiplayer games where all the data depends on a remote system sending messages saying what to create/destroy and when. In this kind of case it is preferable to have one or several pools, because you will have to pay for creation/destruction, but you don't want to pay for the associated allocation/deallocation. In particular when it happens a lot.

I hope it helps understand this context. Joel Lamotte
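A minimal sketch of the kind of fixed-capacity, non-growing pool described above (fixed_pool is an invented name): one upfront allocation, a free list threaded through the unused blocks, and allocation simply fails once the budget is spent.

    #include <cassert>
    #include <cstdlib>

    // Fixed-capacity pool of equally sized blocks: memory is obtained once and never grows.
    class fixed_pool {
    public:
        fixed_pool(std::size_t block_size, std::size_t block_count)
            : block_size_(block_size < sizeof(void*) ? sizeof(void*) : block_size),
              storage_(static_cast<char*>(std::malloc(block_size_ * block_count))),
              free_list_(0)
        {
            if (storage_)                                      // thread the free list once, up front
                for (std::size_t i = block_count; i > 0; --i)
                    push(storage_ + (i - 1) * block_size_);
        }
        ~fixed_pool() { std::free(storage_); }

        void* allocate() {                                     // O(1); returns 0 when the budget is spent
            if (!free_list_) return 0;
            void* p = free_list_;
            free_list_ = *static_cast<void**>(free_list_);
            return p;
        }
        void deallocate(void* p) { push(p); }                  // O(1)

    private:
        void push(void* p) { *static_cast<void**>(p) = free_list_; free_list_ = p; }

        std::size_t block_size_;
        char* storage_;
        void* free_list_;
    };

    int main() {
        fixed_pool pool(64, 1024);        // 64-byte blocks, 64 KiB budget fixed up front
        void* a = pool.allocate();
        void* b = pool.allocate();
        assert(a && b && a != b);
        pool.deallocate(a);
        pool.deallocate(b);
    }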

In non-game environments, you may have systems where you don't even have a heap, or where design rules prevent the use of calls to malloc etc. for WCET or stability concerns. The only solution is to have a pool of memory on the stack and fake ctor/dtor calls inside it.

In non-game environments, you may have systems where you don't even have a heap, or where design rules prevent the use of calls to malloc etc. for WCET or stability concerns.
The only solution is to have a pool of memory on the stack and fake ctor/dtor calls inside it.
Yeah! That's what I'm talking about. I actually embed very much C++ in microcontrollers down to 1kB of RAM and no heap because I exclude it in my self-written linker files. Over the years I wrote all kinds of high-performance allocation stuff: constructor w/ reinterpret_cast memory mapped to pools, circular allocators, static allocators with pools. It's fast if you want it to be. But Boost.Pool, I have not figured out yet. I would like to say: There really are some of us fitting junk into 64 bytes, or 1024, or whatever. Best regards, Chris.
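A sketch of the static-pool-plus-placement-new technique Chris describes, for the no-heap case (static_object_pool is an invented name; real firmware would add whatever interrupt safety it needs):

    #include <cstddef>
    #include <new>

    // Statically sized pool of N objects of type T: no heap, no malloc; the storage
    // lives inside the pool object itself (typically placed in static storage).
    template <class T, std::size_t N>
    class static_object_pool {
        static_assert(sizeof(T) >= sizeof(void*), "a slot must be able to hold a free-list pointer");
    public:
        static_object_pool() : free_list_(0) {
            for (std::size_t i = N; i > 0; --i)
                push(storage_ + (i - 1) * sizeof(T));
        }

        T* create() {                                  // placement-new into a free slot
            void* slot = pop();
            return slot ? new (slot) T() : 0;
        }
        void destroy(T* p) {                           // explicit dtor call, slot is recycled
            if (p) { p->~T(); push(p); }
        }

    private:
        void push(void* p) { *static_cast<void**>(p) = free_list_; free_list_ = p; }
        void* pop() {
            if (!free_list_) return 0;
            void* p = free_list_;
            free_list_ = *static_cast<void**>(free_list_);
            return p;
        }

        alignas(T) char storage_[N * sizeof(T)];
        void* free_list_;
    };

    struct sensor_sample { int channel; int value; };

    static static_object_pool<sensor_sample, 16> g_samples;   // lives in .bss, no heap at all

    int main() {
        sensor_sample* s = g_samples.create();
        if (s) { s->channel = 1; s->value = 42; g_samples.destroy(s); }
    }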

On 07/18/2012 11:32 PM, Christopher Kormanyos wrote:
Yeah! That's what I'm talking about. I actually embed very much C++ in microcontrollers down to 1kB of RAM and no heap because I exclude it in my self-written linker files. Over the years I wrote all kinds of high-performance allocation stuff: constructor w/ reinterpret_cast memory mapped to pools, circular allocators, static allocators with pools. It's fast if you want it to be. But Boost.Pool, I have not figured out yet. I would like to say: There really are some of us fitting junk into 64 bytes, or 1024, or whatever. Best regards, Chris.
I think such a collection of memory-related utilities could have a far better impact than the current Pool.

Quoting Andrey Semashev <andrey.semashev@gmail.com>:
On Wednesday 18 July 2012 10:30:46 Klaim - Joël Lamotte wrote:
On Wed, Jul 18, 2012 at 7:29 AM, Joel Falcou <joel.falcou@gmail.com> wrote:
As for pooling, there are still some use cases on embedded systems where you are required to pool some MB at the beginning of the application and then want to go through the normal allocator/container design to access it. GB of RAM on a COTS computer is not the only use case around ;)
Exactly. For example, for high-performance video games on any console, there is no way you will avoid having to pool memory in one way or another. It is possible not to do this, but in most action-game contexts it is not acceptable to have slowdowns because of allocations (though there are diverse ways to fix this, pooling memory is a general one).
Ok, but that implies that the pool has to be at least as fast as the system allocator. Which Boost.Pool isn't. I admit, I am no game developer, but wouldn't a fast allocator over a non-swappable memory region be a better solution?
I would be surprised if boost.pool is slower than a typical system allocator. I wonder if you're referring to ordered_malloc() and ordered_free() (which clearly are very slow and probably over-used). A very simple test of boost.pool [1] in 1.47 suggests that it is in fact faster than the system allocator (on Mac OS X 10.7.3).

$ g++ -O2 -DUSE_BOOST_POOL=1 -IDocuments/dev/boost_1_47_0/ test.cpp
$ time ./a.out
real 0m0.422s user 0m0.406s sys 0m0.003s

$ g++ -O2 test.cpp
$ time ./a.out
real 0m7.065s user 0m6.505s sys 0m0.021s
Also, and this is very important: budgeting memory is a very important practice in some very high-performance games.
Good point, that might be a useful feature.
Another use case is when you're planning on allocating a very large number of a specific object. Relying on the system allocator, you may end up in a slab that holds allocations some tens of bytes larger than your actual object. If you allocate a few million of them, the extra memory use may become significant, as well as the L1 and L2 cache potentially wasted by not having your objects packed back-to-back in RAM.
Andrey Semashev said:
There is also an option to repurpose the library. Pooling raw memory may not be that needed today, but pooling constructed objects is another story. More than once I have had cases where constructing an object, including collateral resource allocation and initialization, is a costly operation, and it is only natural to arrange a pool of such objects. While in the pool, the objects are not destroyed but merely "cleaned". An object can be retrieved and returned to the pool multiple times, which saves the costly initialization. I could use a framework for building such pools. I think this is the most productive direction for the library.
I was thinking that the Object Pool of Boost.Pool is basically what you describe? At least in purpose, maybe not very efficient in the implementation.
It creates and destroys objects within the pooled memory. Not much different from a raw memory pool.
My understanding is that the main complaint with the object_pool in boost.pool is that when it constructs a new object, it uses ordered_malloc() instead of just malloc() (not the system function, the member of boost::pool). Personally, this has forced me to just use the raw boost::pool and manually call constructors and destructors. This seems like a trivial fix to make to the object_pool type, but I'm not entirely sure why this decision was made in the first place, perhaps there is a good reason. -- Arvid Norberg [1] http://codepad.org/vGuWLi0i
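For reference, the workaround described above (raw boost::pool plus manual construction/destruction) looks roughly like this; a sketch against the boost::pool<> malloc()/free() members, not a tested snippet:

    #include <boost/pool/pool.hpp>
    #include <new>
    #include <string>

    struct widget {
        explicit widget(int id) : id(id) {}
        int id;
        std::string name;
    };

    int main() {
        boost::pool<> p(sizeof(widget));   // fixed-size blocks, one per widget

        void* mem = p.malloc();            // plain malloc(), not ordered_malloc()
        if (!mem) return 1;
        widget* w = new (mem) widget(42);  // construct manually in the pooled block

        w->~widget();                      // destroy manually...
        p.free(w);                         // ...and give the block back to the pool
    }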

$ g++ -O2 -DUSE_BOOST_POOL=1 -IDocuments/dev/boost_1_47_0/ test.cpp $ time ./a.out
real 0m0.422s user 0m0.406s sys 0m0.003s
$ g++ -O2 test.cpp $ time ./a.out
real 0m7.065s user 0m6.505s sys 0m0.021s
Hmm.. This is strange... On linux:

$ g++ -O2 -DUSE_BOOST_POOL=1 test.cpp -opool
$ time ./pool
real 0m0.381s user 0m0.376s sys 0m0.000s

$ g++ -O2 test.cpp -opool
$ time ./pool
real 0m0.003s user 0m0.000s sys 0m0.000s

OS: ubuntu-12.04
GCC: gcc-4.7.1
boost: 1.50.0

-- Regards, niXman
___________________________________________________
Dual-target(32 & 64 bit) MinGW compilers for 32 and 64 bit Windows: http://sourceforge.net/projects/mingwbuilds/

On Thu, Jul 19, 2012 at 11:28 AM, niXman <i.nixman@gmail.com> wrote:
Hmm.. This is strange...
Make sure that the entire benchmark didn't get optimized away as dead code in the system allocator case. Regards, Luke

On Thursday 19 July 2012 17:38:50 Simonson, Lucanus J wrote:
On Thu, Jul 19, 2012 at 11:28 AM, niXman <i.nixman@gmail.com> wrote:
Hmm.. This is strange...
Make sure that the entire benchmark didn't get optimized away as dead code in the system allocator case.
You can try adding -fno-builtin-malloc -fno-builtin-free to the compiler flags.

On Thu, Jul 19, 2012 at 10:28 AM, niXman <i.nixman@gmail.com> wrote:
$ g++ -O2 -DUSE_BOOST_POOL=1 -IDocuments/dev/boost_1_47_0/ test.cpp $ time ./a.out
real 0m0.422s user 0m0.406s sys 0m0.003s
$ g++ -O2 test.cpp $ time ./a.out
real 0m7.065s user 0m6.505s sys 0m0.021s
Hmm.. This is strange...
On linux:
$ g++ -O2 -DUSE_BOOST_POOL=1 test.cpp -opool $ time ./pool real 0m0.381s user 0m0.376s sys 0m0.000s
$ g++ -O2 test.cpp -opool $ time ./pool real 0m0.003s user 0m0.000s sys 0m0.000s
OS: ubuntu-12.04 GCC: gcc-4.7.1 boost: 1.50.0
It seems that the compiler is pretty obviously optimizing the malloc/free pairs away. The (dead) store is not enough to prevent the optimization. HTH, -- gpd
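One way to double-check (besides the -fno-builtin flags suggested above) is to make the loop actually use each allocation and keep a value the compiler cannot discard; a rough sketch of the idea, not the codepad test itself:

    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>

    // Touch every allocation and accumulate something derived from the pointer,
    // then print it, so the malloc/free pair (or the pool calls) cannot be
    // treated as dead code and removed.
    int main() {
        std::size_t sink = 0;
        for (int i = 0; i < 10000000; ++i) {
            char* p = static_cast<char*>(std::malloc(32));     // or pool.malloc() in the pool build
            if (!p) return 1;
            p[0] = static_cast<char>(i);                       // the memory is actually written
            sink += reinterpret_cast<std::uintptr_t>(p) & 0xffu;  // result depends on the pointer value
            std::free(p);                                      // or pool.free(p)
        }
        std::printf("%lu\n", static_cast<unsigned long>(sink)); // observable side effect
        return 0;
    }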

I evaluated Boost.Pool some (long) time ago as a way to optimize performance for a few of my applications, but it appeared even slower than the system allocator, let alone specialized solutions. Frankly, I don't see much point in pooling raw memory nowadays, except to achieve more control over when the memory is released. A per-container memory pool with no locking at all might also squeeze out some performance, but considering the per-thread memory pools in the system allocator, this gain remains to be proven. A portable aligned memory allocator perhaps? Useful, but that's an item for Boost.Allocator and not Boost.Pool.
<snip>
As for pooling, there are still some use cases on embedded systems where you are required to pool some MB at the beginning of the application and then want to go through the normal allocator/container design to access it. GB of RAM on a COTS computer is not the only use case around ;)
I am one of those embedded systems developers who pools small chunks for one-shot allocation, for example. I just need to find the time to figure out how to use pool better. Best regards, Chris.

Hopefully others will chime in here with their requirements as well. Cheers all, John.
Even though I can't use it due to lack of experience, I know what I would like.

* Small pools going all the way down to, say, tens of bytes, like 32 bytes.
* Near-zero overhead for creating static and stack-based pools.
* The ability to lodge my own near-zero overhead custom allocator in a pool without having to write the whole allocator.

But maybe I don't know much about Boost.Pool yet. Best regards, Chris.
participants (11)
- Andrey Semashev
- arvid@cs.umu.se
- Christopher Kormanyos
- Giovanni Piero Deretta
- Ion Gaztañaga
- Joel Falcou
- John Maddock
- Klaim - Joël Lamotte
- niXman
- Olaf van der Spek
- Simonson, Lucanus J