An expanded and corrected version of this paper will be presented at POPL 2002. A version of this revised paper is currently available as HP Labs technical report 2001-251, which is temporarily available in PostScript here and should eventually appear here. The revised paper is a substantial improvement over what follows below.
Conservative garbage collectors are often maligned with the claim that it is impossible to bound space usage of an application using any form of conservative garbage collection. Empirically, this doesn't seem to be a problem for most desktop applications. The large majority of applications behave perfectly well with a conservative garbage collector. A few applications don't, especially if the collector is too naive. But in those cases, the failure is generally easily detectable during testing, and can be avoided by communicating small amounts of type information to the garbage collector.
We are only aware of one explicitly deallocating program in which replacement of explicit deallocation with conservative garbage collection failed unavoidably. And in that case conservative pointer tracing was not the issue; the program relied on deallocating reachable, but unaccessed memory. Thus no garbage collector could have solved the problem.
But our concern here is not the empirical performance of conservative garbage collection. Instead we would like to prove mathematical bounds on space consumption. The primary goal is to understand why conservative collectors behave well in practice. This also gets us closer to making conservative garbage collectors safe for embedded environments that require hard space bounds.
A more common example, mentioned in [Boehm93] is a simple queue data structure, implemented as a singly linked list, with a head and tail pointer. Elements are inserted into the tail position and removed at the head by advancing a head pointer. Removed elements should be reclaimable by the garbage collector. But a single false pointer to one of the queue elements will prevent any further such reclamation. Thus if the active queue length is intended to be bounded, a conservative collector may arbitrarily increase space consumption as the result of a single false reference.
We make several observations about this example:
An object x is backward-forward reachable from a root r through y iff y is backward-reachable from r and x is reachable from y. x is backward-forward reachable if there is an object y such that x is backward-forward reachable through y. An object is backward-forward reachable if it is backward-forward reachable from any root.
Call a data structure, i.e. a set of objects, circular if, for any two objects x and y in the set, y is reachable from x if and only if x is reachable from y, i.e. an object is forward-reachable from another if and only if it is backward-reachable. Data structures, such as doubly linked lists or trees with parent pointers, that maintain back-pointers corresponding to all forward pointers are a particular instance of this. Theorem If a program only builds circular data structures then the set of backward-forward reachable objects is the set of reachable objects.
Proof Trivial.
Theorem If a program is purely functional, then the set of all objects S backward-forward reachable from a root r through an object y, was at some point reachable from a single root.
Proof The object y must have been accessible through a root at some point. Since objects are not changed, all of S was reachable through y at that point.
We define a data structure to be GC-robust if it satisfies the conclusion of the preceding theorem, that is if all objects backward-forward from a root were at some point reachable from a root. Both data structures created by functional programs and circular data structures are GC-robust.
Conjecture All common data structures, except the straight-forward implementation of singly-linked queues, are GC robust.
Theorem Consider a conservative garbage collector which misinterprets a single arbitrary integer a as pointer. The set of all objects S that are "reachable" from a, i.e. that are reachable from the object containing address a, were at some point backward-forward reachable from a single root.
Proof Consider the latest modification or creation time t of any object y in S. Clearly at that point y was pointed to by a root r. Let x be the object "pointed to" by a. At time t, x was backward-reachable from y, since y is in S and all elements of S were reachable from x. (Recall that no elements of S where modified after time t.) Thus S was backward-forward reachable from r at time t.
Corollary If a program uses only GC-robust data structures, and the number of pointers misidentified by a garbage collector is bounded (e.g. because only the stack is scanned conservatively, and its depth is bounded, or because only the top stack frame is scanned conservatively), then the extra space returned by conservative pointer scanning is bounded by a constant times the maximal amount of live memory.
Proof Follows trivially from the preceding theorem and the definition of GC-robustness. Each misidentified pointer retains a data structure that was backward-forward reachable, and hence reachable at some point during program execution. Thus the memory retained by a single misidentified pointer is bounded by the maximal amount of live memory used during program execution.
The above Corollary gives a rather extreme bound on excess memory retention, but a bound nonetheless. The bound itself may be sufficient to give hard guarantees in a few cases (e.g. if only a single stack frame is conservatively scanned, and/or the program maintains many disconnected data structures, only one of which can be retained by a false reference. In any case, it helps to explain why real conservatively collected programs don't grow without bounds, and rarely retain significant amounts of extra memory.
[Boehm93] Boehm, H., and M. Weiser, "Garbage Collection in an Uncooperative Environment", Software Practice & Experience, September 1988, pp. 807-820.
[Wentworth90] E. P. Wentworth, "Pitfalls of Conservative Garbage Collection", Software Practice and Experience 20(7), July 1990, pp. 719-727.