Proposal for a precise STL-based garbage collection library

Dear Boost developers, I have posted in the Vault a small library (named "CPPGC") that implements a precise garbage collector for C++. You can find it here: http://www.boost-consulting.com/vault/index.php?PHPSESSID=hqle9ubanl575jgl87ajobeed4&direction=0&order=&directory=Memory The zipped file contains the following readme file (along with the sources and an example): --BEGIN README.TXT-- Introduction ------------ CPPGC is a garbage collection library for C++. It has the following features: 1) precise collector. 2) uses the mark & sweep algorithm. 3) supports pointers to middle of objects. 4) supports arrays of objects. 5) uses only standard C++; no platform-specific code. 6) it can be used to retrofit non-GC'd programs with GC capabilities. 7) very small (under 500 lines of code). Usage ----- Include the following files in your project: cppgc.hpp cppgc.cpp Use the following code to include GC functionality: #include "cppgc.hpp" using namespace cppgc; Wrap your classes and pointers inside the class gc<T>. For example: gc<std::list<int> *> gc_list = new gc<std::list<int> >; gc_list->push_back(10); class Foo { public: gc<Foo *> m_other; }; gc<Foo *> foo1 = new gc<Foo>[10]; foo1[5].m_other = new gc<Foo>; gc<Foo> foo2[3]; foo2[1].m_other = foo1; Portability ----------- As we speak (March 2007), the classes hash_multimap, hash_set are not yet part of the std namespace. But it will be, and most compilers offer these classes in their STL implementation anyway. The code is currently tested under Visual C++ 2005, so the classes hash_multimap and hash_set are found in the stdext namespace. If you use a different compiler, feel free to replace the following code: using namespace stdext; with the namespace your compiler has for the hash classes. Threads ------- Threads are not supported. Future versions may support multithreading by providing a set of functions the library user can fill with the appropriate environment-specific code which allow the operation of the collector in a multithreaded environment. Internals --------- The idea behind the library is very simple: In order to know where the collector's pointers are, the collector maintains a memory map. The first entry of the memory map is all the memory (from 0x00000000 to 0xFFFFFFFF); this entry contains the root set. When an object is allocated on the heap, a relevant memory region entry is added to the memory map. When a pointer is created, it adds itself to the relevant memory map entry. Upon destruction, a pointer removes itself from its memory map entry. When a heap-allocated object is destroyed, the memory region entry of the object is removed from the memory map. If a pointer is not a member of a heap-allocated block, then it is added to the first entry of the memory map, which is the root set. The class used for the memory map is a hash_multimap, which allows efficient lookup of a memory region based on an memory address. This allows for pointers to the middle of blocks. A multimap is used instead of a map because keys (i.e. memory regions) can overlap one another. The collector is a classic mark & sweep implementation with 2 phases: phase 1: The graph of objects is traversed in order to mark the objects as reachable. The scanning starts from the root set, i.e. the first entry of the memory map. phase 2: the heap blocks are traversed; blocks that are not touched are deleted. A collection happens automatically when the collector's memory limit is exceeded or manually when the user invokes the 'collect_garbage' function. Performance ----------- It is not the fastest thing around, but it is better than nothing :-). Since it uses a hash map for its memory region, lookup is an O(1) operation on average (or so says the theory). It can be a nice altenative for projects that do not have great expectations on performance or where a commercial collector is not available. Notes ----- The file 'main.cpp' contains an example. --END README.TXT-- Please feel free to suggest improvements, post critisisms, discuss anything you want about it. Thank you in advance. Achilleas Margaritis

On 3/12/07, Achilleas Margaritis <axilmar@otenet.gr> wrote:
Dear Boost developers,
I have posted in the Vault a small library (named "CPPGC") that implements a precise garbage collector for C++. [...]
Interesting, I could use a small & simple GC...
[...] Internals ---------
The idea behind the library is very simple:
In order to know where the collector's pointers are, the collector maintains a memory map.
The first entry of the memory map is all the memory (from 0x00000000 to 0xFFFFFFFF); this entry contains the root set.
When an object is allocated on the heap, a relevant memory region entry is added to the memory map.
When a pointer is created, it adds itself to the relevant memory map entry. Upon destruction, a pointer removes itself from its memory map entry.
When a heap-allocated object is destroyed, the memory region entry of the object is removed from the memory map.
If a pointer is not a member of a heap-allocated block, then it is added to the first entry of the memory map, which is the root set.
The class used for the memory map is a hash_multimap, which allows efficient lookup of a memory region based on an memory address. This allows for pointers to the middle of blocks. A multimap is used instead of a map because keys (i.e. memory regions) can overlap one another.
The collector is a classic mark & sweep implementation with 2 phases:
phase 1:
The graph of objects is traversed in order to mark the objects as reachable. The scanning starts from the root set, i.e. the first entry of the memory map.
phase 2:
the heap blocks are traversed; blocks that are not touched are deleted.
A collection happens automatically when the collector's memory limit is exceeded or manually when the user invokes the 'collect_garbage' function.
Performance -----------
It is not the fastest thing around, but it is better than nothing :-). Since it uses a hash map for its memory region, lookup is an O(1) operation on average (or so says the theory).
It seems awfully slow (at least without testing it :)). Do you have any benchmark? Wouldn't a reference counted pointer with cycle detection be much simpler AND faster? If you can control the allocator as you do (and thus allocate the refcount together with the object itself) and you do not need to be thread safe, this be much faster than your approach. IMHO, instead of explicitly marking every gc pointer, every potentially gc'ed class should provide a visitor function that provides an iteration interface over all pointers contained in an object. You store the address of such a function right before the allocated object (like a type tag). The function would also provide a destruction hook and store the size of the allocated object. If you store a run of same type objects, you only need to store one pointer to the visitor at the beginning of the tag. With this approach, std containers, tuples and fusion sequences would come for free. You need to provide the visitor of other user defined datatypes, but it is arguable if this is more bothersome than explicitly marking every pointer. The user would need to explicitly mark the root set before calling the garbage collector. This means that the gc can only be called explicitly (manipulating the root set is expensive, so you want to do it only when absolutely necessary). If the gc'ed heap runs out of memory would simply throw. Somewhere down in the stack, the exception would be caught, the root pointers saved and and the gc invoked. This way you also get deterministic gc. Or an user callback could be called that would call the gc. When gc() returns all unreachable object will be guaranteed to have been reclaimed (and their destructor invoked). Optionally, the user could just register a 'register root' callback that could be invoked automatically when needed. This approach might not be generally applicable, but I've found that I could really use such a GC to quickly allocate temporary objects during a processing phase (In my case, somewhat complex functional expressions a la FC++. These objects are often tuples or containers, so you get introspection for free!). At the end of the phase, only some of these objects are needed, all the others can be explicitly reclaimed by calling the GC explicitly.
Please feel free to suggest improvements, post critisisms, discuss anything you want about it.
I think that a specialized GC would fit well in boost, so keep up with the good work! gpd

On 03/12/2007 06:49 AM, Giovanni Piero Deretta wrote:
On 3/12/07, Achilleas Margaritis <axilmar@otenet.gr> wrote: [snip] It seems awfully slow (at least without testing it :)). Do you have any benchmark?
Achilleas, many years ago I had a benchmark for comparing precise refcounted pointer gc with cycle detection against the Boehm gc. It used a specialized smart ptr for both Boehm and another smart ptr for the refoucnted gc. If you think you could use it, I could post to the vault. It also did poorly vs. Boehm w.r.t. time. However, the code was streamlined to speed it up and merged with David Held's policy_ptr library: http://boost.cvs.sourceforge.net/boost-sandbox/boost-sandbox/boost/policy_pt... However, I've not run any benchmarks. The version in boost sandbox under policy_ptr has been adapted on my hard-drive to use a successor to the fields_visitor library to precisely enumerate the smart ptrs contained in a "registered" class. A class is registered by creating an instance of singleton template with the class as the template argument (see SELECTED_FIELDS_DESCRIPTION_OF_RECORD in: <boost-sandbox>/boost/fields_visitor/fields_visitor.hpp )
Wouldn't a reference counted pointer with cycle detection be much simpler AND faster? If you can control the allocator as you do (and thus allocate the refcount together with the object itself) and you do not need to be thread safe, this be much faster than your approach.
IMHO, instead of explicitly marking every gc pointer, every potentially gc'ed class should provide a visitor function that provides an iteration interface over all pointers contained in an object. You store the address of such a function right before the allocated object (like a type tag). The function would also provide a destruction hook and store the size of the allocated object. If you store a run of same type objects, you only need to store one pointer to the visitor at the beginning of the tag.
Giovanni, this is almost what the aforementioned policy_ptr library does when the precise smart pointers are used. However, instead of storing the visitor function pointer inside the allocated memory, the precise smart pointer, because it knows the type of pointee and, presumably, the pointee class is registered (see above), it knows the location of the precise smart pointers within that pointee.
With this approach, std containers, tuples and fusion sequences would come for free. You need to provide the visitor of other user defined datatypes, but it is arguable if this is more bothersome than explicitly marking every pointer.
With the precise policy_ptr, fusion sequences and tuples would have to be registered, I think... Maybe not. If the selected_fields_description_of template (used by the SELECTED_FIELDS_DESCRIPTION_OF_RECORD macro mentioned above) were specialized for tuples and fusion sequences, maybe it wouldn't have to be explicitly registered. Thinking a little further, I pretty sure this would be similar to the way stl containers are handled: boost/fields_visitor/container_extern/single.hpp Will investiage to make sure.
The user would need to explicitly mark the root set before calling the garbage collector. This means that the gc can only be called explicitly (manipulating the root set is expensive, so you want to do it only when absolutely necessary). If the gc'ed heap runs out of memory would simply throw. Somewhere down in the stack, the exception would be caught, the root pointers saved and and the gc invoked. This way you also get deterministic gc. Or an user callback could be called that would call the gc.
If refcounting with cycle detection were used, there would be no need for a root set; however, this could lead to quadratic time penalty in the case of densely connected pointer graph (p. 3 of: http://www.research.ibm.com/people/d/dfb/papers/Bacon01Concurrent.pdf ). This quadratic behavious was actually shown by the benchmark code mentioned above.
participants (3)
-
Achilleas Margaritis
-
Giovanni Piero Deretta
-
Larry Evans