
On 3/12/07, Achilleas Margaritis <axilmar@otenet.gr> wrote:
Dear Boost developers,
I have posted in the Vault a small library (named "CPPGC") that implements a precise garbage collector for C++. [...]
Interesting, I could use a small & simple GC...
[...] Internals ---------
The idea behind the library is very simple:
In order to know where the collector's pointers are, the collector maintains a memory map.
The first entry of the memory map is all the memory (from 0x00000000 to 0xFFFFFFFF); this entry contains the root set.
When an object is allocated on the heap, a relevant memory region entry is added to the memory map.
When a pointer is created, it adds itself to the relevant memory map entry. Upon destruction, a pointer removes itself from its memory map entry.
When a heap-allocated object is destroyed, the memory region entry of the object is removed from the memory map.
If a pointer is not a member of a heap-allocated block, then it is added to the first entry of the memory map, which is the root set.
The class used for the memory map is a hash_multimap, which allows efficient lookup of a memory region based on an memory address. This allows for pointers to the middle of blocks. A multimap is used instead of a map because keys (i.e. memory regions) can overlap one another.
The collector is a classic mark & sweep implementation with 2 phases:
phase 1:
The graph of objects is traversed in order to mark the objects as reachable. The scanning starts from the root set, i.e. the first entry of the memory map.
phase 2:
the heap blocks are traversed; blocks that are not touched are deleted.
A collection happens automatically when the collector's memory limit is exceeded or manually when the user invokes the 'collect_garbage' function.
Performance -----------
It is not the fastest thing around, but it is better than nothing :-). Since it uses a hash map for its memory region, lookup is an O(1) operation on average (or so says the theory).
It seems awfully slow (at least without testing it :)). Do you have any benchmark? Wouldn't a reference counted pointer with cycle detection be much simpler AND faster? If you can control the allocator as you do (and thus allocate the refcount together with the object itself) and you do not need to be thread safe, this be much faster than your approach. IMHO, instead of explicitly marking every gc pointer, every potentially gc'ed class should provide a visitor function that provides an iteration interface over all pointers contained in an object. You store the address of such a function right before the allocated object (like a type tag). The function would also provide a destruction hook and store the size of the allocated object. If you store a run of same type objects, you only need to store one pointer to the visitor at the beginning of the tag. With this approach, std containers, tuples and fusion sequences would come for free. You need to provide the visitor of other user defined datatypes, but it is arguable if this is more bothersome than explicitly marking every pointer. The user would need to explicitly mark the root set before calling the garbage collector. This means that the gc can only be called explicitly (manipulating the root set is expensive, so you want to do it only when absolutely necessary). If the gc'ed heap runs out of memory would simply throw. Somewhere down in the stack, the exception would be caught, the root pointers saved and and the gc invoked. This way you also get deterministic gc. Or an user callback could be called that would call the gc. When gc() returns all unreachable object will be guaranteed to have been reclaimed (and their destructor invoked). Optionally, the user could just register a 'register root' callback that could be invoked automatically when needed. This approach might not be generally applicable, but I've found that I could really use such a GC to quickly allocate temporary objects during a processing phase (In my case, somewhat complex functional expressions a la FC++. These objects are often tuples or containers, so you get introspection for free!). At the end of the phase, only some of these objects are needed, all the others can be explicitly reclaimed by calling the GC explicitly.
Please feel free to suggest improvements, post critisisms, discuss anything you want about it.
I think that a specialized GC would fit well in boost, so keep up with the good work! gpd