Advantages and Disadvantages of Conservative Garbage Collection
Advantages and Disadvantages of Conservative Garbage Collection
Here are some issues to be considered in deciding whether or not to use conservative
GC with C or C++. Some of these no doubt reflect personal biases of the author.
Some of the observations are based on Detlefs, Dosser and Zorn's
Memory Allocation Costs in Large C and C++ Programs.
We refer specifically to
our garbage collector,
which can be retrieved from gc.tar.gz
in the gc_source directory.
A number of other interesting issues arise when comparing this style of conservative
garbage collection, or malloc/free memory management with more traditional garbage
collection techniques. They are not discussed here.
Garbage collection is not an all-or-nothing proposition. It is possible to explicitly
deallocate some common kinds of objects, and thus correspondingly decrease the frequency
of garbage collections and garbage collection cost. It is possible to accommodate
libraries that expect explicit deallocation calls. It is also possible to use the
collector only as a debugging tool to locate leaks and remove it completely from the
end product. But for the sake of clarity, we'll look at the tradeoffs between exclusive
use of explicit memory management (malloc/free or new/delete) and exclusive use of
garbage collecting allocation.
Development effort
- Garbage collected applications often require significantly less development time than
their counterparts with explicit memory management. It is hard to measure this, but
commonly proposed estimates are that 30% or 40% of development time devoted to storage
management for programs that manipulate complex linked data structures. (See, for
example, Rovner, ``On Adding Garbage Collection and Runtime Types to a Strongly-Typed,
Statically Checked, Concurrent Language'', PARC CSL-84-7.) C++ constructors and destructors
can help with the simple bookkeeping parts of the problem, but don't help much with
shared pieces of data structures.
- Well-written interface descriptions for C and C++ often invest some effort in assigning
deallocation responsibility and providing the needed extra functionality; less well-designed
interfaces introduce ambiguities that lead to bugs. Does a container object deallocate
its elements when it is destroyed? May a function retain pointers to its arguments
for caching purposes? In the best case, the result is significant added complexity.
- Multiple threads or exception handling further add to the complexity of manual deallocation.
For example, various C++ newsgroups and mailing lists have carried extensive discussions
about the proper way to deallocate partially constructed C++ objects in response to
an exception. Such issues disappear in the presence of a garbage collector.
- Note that the additional development effort required for manual memory management
could be used for a substantial amount of performance tuning of a different kind
Thus one way to phrase the issue is whether explicit memory management is the best
place, or even a reasonable place, to invest that effort.
Reusability
- To cope with manual storage management, programmers end up restricting and specializing
their designs for the task at hand. For example, an interface might require that
the only pointer to an object be passed, so that the called function can deallocate
the object. Or it might insist that the called function not reference an argument
on a subsequent call once it returns, so that the caller can deallocate the object.
The first one would be expensive or inappropriate for some potential users. The
second one would prohibit caching in a reimplementation of the interface. In either
case, we end up with less reusable code, and fewer reusable interfaces.
Functionality
- In general, it is considerably easier to implement and use sophisticated data structures
in the presence of a garbage collector. As a result, it is easier to avoid imposing
arbitrary limits on program parameters (e.g. string lengths), and to ensure that performance
scales reasonably for large inputs. The cord string package included with our garbage
collector is an example of a useful abstraction that could not be provided without
some form of garbage collection, and would be more expensive with user-provided reference
counting, or the like. A program based on something like the cord package has some
chance of continuing to run reasonably if it receives a 10 megabyte input string where
the programmer expected a line of text. This is much less likely with a string package
that uses a conventional representation.
Safety
- Conservative garbage collectors require mild restrictions on both the source code
and the compiler. The source program must not ``hide'' pointers in such a way that
they are invisible to the collector. There are very few ways in which a strictly
conforming ANSI C program could hide pointers. Writing and retrieving them from a
file is one way. For most collector configurations, using memcpy to copy pointers
to unaligned locations is another. Casts from pointers to integers and back to pointers
can cause problems. The most common problem is probably to refer to an object by
pointing to a known offset before the beginning of the object. The last one is clearly
not legal C code. All are uncommon even in existing code. It would be easy to build
a tool that identifies possibly dangerous code, except that it is somewhat expensive
to detect violations of the ANSI requirements on pointer arithmetic.
- Similarly, C compilers may not hide pointers in the generated object code. In our
experience, standard commercial compilers obey this restriction in unoptimized code.
Most aggressive optimizing compilers do not obey this restriction for all optimized
code. For details and examples see
papers/pldi96.ps.gz. However, it is difficult
to construct examples for which they violate it, especially for single-threaded code.
In our experience, the only examples we have found of
a failure with the current collector, even in multi-threaded code,
were contrived. (The most serious one was the garbage collector test
program compiled on an ARM with the vendor compiler.)
In contrast, we
have found many genuine optimizer bugs in various compilers over the same time period.
The structure of some compilers (e.g. gcc) should make it easy to guarantee GC-safety
in the future. The most significant obstacle appears to be the lack of real failures
to motivate such changes.
Debugging
- If a module A explicitly and prematurely deallocates an object P and
then continues
to write to it after the object is reallocated by module B, module B is likely to
fail, in spite of the fact that the fault is in module A. By the time the fault becomes
noticeable, there may be no trace of A's actions. The programmer responsible for
module B is likely to be held responsible for the problem, but has no reasonable way
of attacking it.
- Memory access checkers, if available, will often detect such faults, but only if module
A also writes to object P before it is reallocated.
- This often makes it difficult to debug large multi-person projects using explicit
deallocation. A garbage collector can eliminate premature deallocation errors. (It
cannot eliminate pointer arithmetic and subscripting errors, which may induce similar
problems. But it does facilitate higher level abstraction which confine pointer arithmetic
primarily to a few low level modules.)
- Similarly, memory leaks induced by failing to deallocate unreferenced memory may be
hard to trace and eliminate. Garbage collectors automatically eliminate such leaks.
- In either case, memory overwrite errors (e.g. indexing errors) may invalidate the
allocators data structures, causing mysterious faults in the allocator. This is arguably
somewhat worse in the garbage collected case, since the data structures tend to be
more complex. Some limited debugging tools are provided to address this problem;
more would be desirable. Availability of allocator source code is sometimes an advantage.
Space Performance
- The presence of garbage collection typically allows the user to allocate fewer, and
smaller, data structures. Putting this aside temporarily, a leak-free program written
for malloc/free or new/delete is likely to use more space when these are replaced
by GC_malloc/noop, especially if space is measured in total allocated address space.
See the above cited paper by Detlefs, Dosser and Zorn for details. The primary reason
for this is that the collector needs to maintain a certain amount of empty space in
the heap, so that it does not have to garbage collect after every allocation.
- On the other hand, programs written for explicit memory management often incur space
overheads in order to preserve properties needed for deallocation. Such overhead
might include reference count fields, extra copying of structures to guarantee that
an object has a single owner, fragmentation introduced by maintaining separate memory
pools for different modules, or cyclic garbage leaked by reference counting. Such
overhead is avoided by programs that directly use garbage collected allocation. How
this compares to GC-induced space overhead is very application-dependent.
- There are some other factors that sometimes increase space used by our collector.
The one that appear to have been most significant in Detlefs, Dosser and Zorn's
Memory
Allocation Costs in Large C and C++ Programs is that it is not well tuned for small
heap sizes, and has a relatively large, nearly fixed, space overhead. Others are:
- (1) The collector expands the heap in relatively large chunks, parts of which might
never be used, and thus may never need to be in physical memory.
- (2) The collector sometimes avoids using certain pages it allocates because it detects
integers that appear to point to the page. This also results in allocated virtual
memory that is never physically resident.
- (3) Memory might appear to be referenced by integers and the like.
- (4) Memory might still be referenced through an uncleared pointer even though it was
deallocated in the original program.
- In early measurements by the author and Zhong Shao, the last two appeared insignificant
in practice for most programs, though (4) may occasionally be a factor. More
recent measurements by Martin Hirzel and Amer Diwan
(
On the Type Accuracy of Garbage Collection, ISMM 2000) are
less positive, but also do not show growing leaks through either (3) or (4).
Our recent experience still suggests that (3) is typically not a problem, but
can become an issue if either the program regularly allocates very large objects, or
if the the total size of the collected heap approaches the size of the overall address space.
(On 32-bit machines, heap sizes of hundreds of megabytes or gigabytes
may be problematic. We have not heard of any issues on 64-bit machines.)
- Neither our garbage collector nor malloc/free implementations
provide generally useful
guaranteed space usage bounds, since worst-case fragmentation overhead is generally
unacceptable. For a more detailed analysis see our paper
Bounding Space Usage of Conservative Garbage Collectors, POPL 2002.
CPU-time performance
- The overall processor time used by the collector for small objects allocation and
collection is comparable to a good malloc/free implementations. In a multi-threaded
environment, it may be substantially less, since there are no calls that acquire lock
acquisitions; thus the number of lock acquisitions may be effectively halved. (The
same may apply on platforms on which the malloc/free implementation
is not particularly
time-efficient. Note that such malloc/free implementation
may be either poorly designed or tuned for space.) However, the time required for
larger objects is effectively proportional to the object size, both since the collector
typically initializes it, and because larger allocations increase GC frequency correspondingly.
This is unlikely to affect asymptotic program running time; after all, the client
will presumably also take time at least proportional to object size to fill in the
object. But it does put the collector at a disadvantage for programs that allocate
primarily larger objects.
- For multi-threaded programs, there are cases in which a garbage collected program requires
drastically less locking than the equivalent program with explicit memory management.
Here is an example.
- Unlike reference counting, conservative garbage collection does not affect the performance
of pointer assignments and the like.
Latency
- Neither the collector nor most malloc/free
implementations provide hard guarantees
on the amount of time it takes an allocation to complete. In most cases, maximum
observed latencies will be appreciably shorter
with a malloc/free implementation than
with our collector. (This is not necessarily true for GC implementations that use
more compiler support.) Our collector provides the option of incremental collection
on many platforms with sufficient virtual memory support. This is normally sufficient
to provide good interactive response even with very large heaps. Latencies are normally
comparable to those introduced by the VM system.
- Somewhat better latencies can be achieved by triggering collections during idle time,
and aborting them in response to user input. This requires enough idle time for a
collection to complete before excessive heap expansion. This facility does not rely
on VM support.
Paging locality
- A common concern about garbage collection, or any form of dynamic memory allocation,
is its interaction with a virtual memory system. Accesses to virtual memory should
be such that the traffic between disk and memory is small, i.e. most access should
be to pages that were already recently accessed.
- On modern computers, where disks are so much slower than CPUs, many programs page
very little, and most of their heaps reside in the working set. But even for those
programs in which significant parts of the heap do not reside in the working set,
there are a number of techniques which dramatically increase the locality of reference
of a mark-and-sweep collector.
- The fundamental problem is that all memory that may possibly contain pointers has
to be examined during every full collection. We use several techniques to lessen
the impact of this. Often a large fraction of a programs heap can be allocated as
pointer-free memory using GC_malloc_atomic. Such memory is not examined during collections.
(Mark bits are separated from data pages to ensure this.) The collector does not
have a separate sweep phase during which the heap is touched. Generational garbage
collectors, including ours in incremental mode, examine primarily recently modified
memory during most collections. Such memory is nearly certain to be resident in physical
memory. It is possible to mark the heap in sections, so that the collector mark phase
itself exhibits better locality. (This was implemented with some success in older
versions of Xerox PCR by Carl Hauser.
It was not included in more modern versions
of the collector since the other measures appeared to suffice.) Nearly all of the
page faults that do occur during a full collection will occur during a collection
phase in which the client can still make some progress, thus making paging somewhat
less visible to the user. For some details and a few measurements see the paper
"Mostly Parallel Garbage Collection". (The paper "The measured Cost of Conservative
Garbage Collection" by B. Zorn, Software - Practice and Experience, July 1993, includes
some measurements of paging behavior. Unfortunately they are of necessity based
on C programs written for malloc/free, and a very old and naive version (1.6) of our
collector. Unlike later versions, this collector was also found to yield unacceptable
paging performance at PARC.)
- Unlike many malloc/free implementations, our collector will normally allocate available
objects from the same page consecutively. This is likely to ensure that successively
allocated similar objects reside on the same page. Such objects are likely to be
linked to each other, and examined at similar times, both by the collector and by
its client. As a result, the client may exhibit better reference locality than a
malloc/free client.
Portability
- Our collector is not, and cannot be, implemented as completely portable C code. However,
it has been ported to all common workstation and PC platforms, excluding PCs with
segmented addressing. It has been ported to several flavors of Microsoft's win32
environment.
Hans-J. Boehm
(boehm@acm.org)
These are the author's personal opinions.
I would like to thank John Ellis for his comments on earlier drafts of this page.
This is an updated version of a page written while the author was at Xerox PARC.
The original is here.