The benchmark should run in roughly 32 MB or less on most systems, and some systems run it in about half that. A Pentium III/500 runs the garbage-collected C version in under 8 seconds. Some Java implementations have similar or better performance; others are an order of magnitude slower.
In addition to an overall execution time, the benchmark reports the times required to allocate and drop complete binary trees of various sizes. Each reported time covers a similar amount of allocation, and the benchmark keeps some longer-lived data structures live throughout. Generational collectors generally exhibit much better performance for the smaller tree sizes, whereas nongenerational collectors tend to have a flatter performance profile.
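The allocate-and-drop pattern described above can be sketched in C roughly as follows. This is a minimal illustration, not the benchmark's actual source: the names `Node`, `make_tree`, `free_tree`, `count_nodes`, `tree_size`, `num_iters`, and `time_depth` are hypothetical, and the sketch uses `malloc`/`free` so that it runs without a collector. In a garbage-collected version the tree would simply be dropped rather than freed explicitly. The iteration count is scaled so that every depth allocates a similar number of nodes.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical node type: a plain binary tree node with two children. */
typedef struct Node {
    struct Node *left;
    struct Node *right;
} Node;

/* Free a whole tree. With a garbage collector this step would not exist;
   the caller would just drop its reference to the root. */
void free_tree(Node *n) {
    if (n == NULL) return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}

/* Build a complete binary tree of the given depth (depth 0 is one node).
   Returns NULL if any allocation fails. */
Node *make_tree(int depth) {
    assert(depth >= 0);
    Node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->left = n->right = NULL;
    if (depth > 0) {
        n->left = make_tree(depth - 1);
        n->right = make_tree(depth - 1);
        if (n->left == NULL || n->right == NULL) {
            free_tree(n);
            return NULL;
        }
    }
    return n;
}

/* Count the nodes actually present in a tree. */
long count_nodes(const Node *n) {
    return n ? 1 + count_nodes(n->left) + count_nodes(n->right) : 0;
}

/* Nodes in a complete binary tree of this depth: 2^(depth+1) - 1. */
long tree_size(int depth) {
    assert(depth >= 0 && depth < 30);
    return (1L << (depth + 1)) - 1;
}

/* How many trees of this depth to build so that each depth allocates
   about as many nodes as one tree of max_depth. */
long num_iters(int depth, int max_depth) {
    return tree_size(max_depth) / tree_size(depth);
}

/* Allocate and drop num_iters trees of the given depth.
   Returns elapsed CPU seconds, or -1.0 if an allocation failed. */
double time_depth(int depth, int max_depth) {
    clock_t start = clock();
    long iters = num_iters(depth, max_depth);
    for (long i = 0; i < iters; i++) {
        Node *t = make_tree(depth);
        if (t == NULL) return -1.0;
        free_tree(t);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A driver would call `time_depth` for a range of depths and print each result. Because small depths mean many short-lived trees and large depths mean a few big ones, comparing the per-depth times is what exposes the difference between generational and nongenerational collectors.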
This benchmark appears to have been used by a number of vendors to aid in Java VM development. That probably makes it less desirable as a means to compare VMs. (It also has some known deficiencies, e.g., the allocation pattern is too regular, and it leaves too few "holes" between live objects.) It now appears to be most useful as a sanity test for garbage collector developers.
(The code has been used on X86 and IA64 Linux systems. It requires minimal porting for other systems.)
Code to be profiled should call init_profiling() at startup and dump_profile() before termination. This writes a list of addresses and counts to stderr. If stderr is redirected into prof.out, something like the following will generate a readable version of the profile:
    nm {option for BSD-style output} {executable} > nm.out
    cat nm.out prof.out | sort > profile
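The calling convention described above, with init_profiling() at startup and dump_profile() before termination, might look like the following in C. This is only a sketch of the call ordering. The signatures of init_profiling() and dump_profile() (no arguments, no return value) are assumed, and the definitions given here are stand-in stubs so that the example compiles on its own; the real functions come from the profiling code. `workload` and `run_profiled` are hypothetical names.

```c
/* Stand-in stubs so this sketch compiles by itself. The real
   init_profiling() and dump_profile() are supplied by the profiling
   code; their signatures are assumed here, not taken from it. */
int profiling_active = 0;
int dumps_written = 0;

void init_profiling(void) {
    profiling_active = 1;
}

void dump_profile(void) {
    /* The real function writes addresses and counts to stderr;
       this stub only records that it was called. */
    if (profiling_active) dumps_written++;
}

/* Hypothetical workload standing in for the code being profiled. */
long workload(long n) {
    long sum = 0;
    for (long i = 0; i < n; i++) sum += i % 7;
    return sum;
}

/* Hypothetical wrapper showing the required ordering:
   start profiling first, run the work, dump before returning. */
long run_profiled(long n) {
    init_profiling();
    long result = workload(n);
    dump_profile();
    return result;
}
```

In a real program the two calls would typically sit at the top and bottom of main(), and the program would be run with stderr redirected (for example `2> prof.out` in a Bourne-style shell) so that the dump lands in prof.out.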