This is a set of profiling utilities, currently targeting only Linux.
It includes a simple command line profiling tool, with the following
characteristics:
- It is intended to be easy to install and use. No kernel modules or changes
are required for basic use. It can be installed and used without root access.
- It supports profiling of dynamically linked code and includes information
on time spent in dynamic libraries.
- It supports profiling of multithreaded applications.
- It generates profiles for all subprocesses started from a shell.
Thus it can easily be used to profile applications with multiple processes.
- It tries to generate symbolic output. This is usually successful for
the main program if it has debug information, i.e. was compiled with -g.
If not, you may need a debugger to fully interpret the results. However,
the raw output will often give you a rough idea of where processor time
is spent.
- It currently generates "flat" profiles. The output tells you roughly
how much time was spent in a given instruction, line, or function f.
By default this does not include time spent in functions called by f.
On platforms supported by libunwind, callees can optionally be included
in the profile counts, thus recovering some gprof-like functionality.
- Linux kernel functions are not profiled separately. By default, time spent
in the kernel is credited to the library function which made the kernel call.
- On Itanium, it can be used to generate hardware-event-based profiles.
For example, it can tell you where most of the cache misses occur.
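
The difference between a "flat" count and a count that includes callees
can be sketched in a few lines of C. This is only an illustration of the
idea, not the profiler's own code: the struct and function names
(profile, record_sample, flat_count, inclusive_count) and the fixed table
size are assumptions made for this sketch, and each sample is assumed to
arrive as a list of function names with the innermost frame first.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FUNCS 32   /* arbitrary table size for this sketch */

/* Hypothetical per-function counters; not the profiler's real data layout. */
struct profile {
    const char *name[MAX_FUNCS];
    unsigned flat[MAX_FUNCS];       /* samples taken while the function was executing */
    unsigned inclusive[MAX_FUNCS];  /* samples with the function anywhere on the stack */
    size_t used;
};

/* Return the table slot for fn, adding it if needed; MAX_FUNCS if the table is full. */
static size_t slot_for(struct profile *p, const char *fn)
{
    size_t i;
    for (i = 0; i < p->used; i++) {
        if (strcmp(p->name[i], fn) == 0) {
            return i;
        }
    }
    if (p->used == MAX_FUNCS) {
        return MAX_FUNCS;
    }
    p->name[p->used] = fn;
    p->flat[p->used] = 0;
    p->inclusive[p->used] = 0;
    return p->used++;
}

/* Record one sample. stack[0] is the function that was executing;
   stack[depth - 1] is the outermost caller. */
void record_sample(struct profile *p, const char *const *stack, size_t depth)
{
    size_t i, j;
    for (i = 0; i < depth; i++) {
        size_t s = slot_for(p, stack[i]);
        if (s == MAX_FUNCS) {
            continue;               /* table full: drop this frame */
        }
        if (i == 0) {
            p->flat[s]++;           /* flat: only the innermost frame is charged */
        }
        /* inclusive: charge each function at most once per sample,
           even if it appears several times (recursion) */
        for (j = 0; j < i; j++) {
            if (strcmp(stack[j], stack[i]) == 0) {
                break;
            }
        }
        if (j == i) {
            p->inclusive[s]++;
        }
    }
}

unsigned flat_count(const struct profile *p, const char *fn)
{
    size_t i;
    for (i = 0; i < p->used; i++) {
        if (strcmp(p->name[i], fn) == 0) {
            return p->flat[i];
        }
    }
    return 0;
}

unsigned inclusive_count(const struct profile *p, const char *fn)
{
    size_t i;
    for (i = 0; i < p->used; i++) {
        if (strcmp(p->name[i], fn) == 0) {
            return p->inclusive[i];
        }
    }
    return 0;
}
```

With a zero-initialized struct profile, a function such as main that only
calls other functions ends up with a flat count of zero but an inclusive
count equal to the number of samples.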
Instructions for installing and using the profiler are given
here.
A sample session is presented here.
The package can be downloaded
from here.
The profiler is licensed under several different GPL-compatible licenses.
In many cases, reuse of the library components in proprietary applications
is allowed. See the LICENSING.txt file
in the distribution for more details.
Other included packages
The distribution includes three other facilities which may be useful
outside of a profiling context:
- Atomic_ops
- Provides implementations for atomic memory update operations on a number of
architectures. This allows direct use of these in reasonably portable code.
Unlike earlier similar packages, this one explicitly considers memory
barrier semantics, and allows the construction of code that involves
minimum overhead across a variety of architectures. The plan is to
generalize this to non-Linux platforms soon. It is also available
as a separate distribution from
here.
It should be useful both for high performance multi-threaded code which
can't afford to use the standard locking primitives, and for code that
has to access shared data structures from signal handlers. For details, see
README_atomic_ops.txt in the distribution.
- Some lock-free data structures
- Handler_safe_data.h describes some interfaces that, for example, support
simple memory allocation from signal handlers. These are based on
the atomic_ops package.
- Wrap.h
- Provides a reasonably general purpose facility for wrapping library
functions (i.e. forcing user-specified code to be executed before
and after a call to a standard library function) by redefining them and
then using dlopen and dlsym. This is probably viable
only on Linux/Unix platforms. The profiler uses it to intercept
thread creation. See README_wrap.txt
for details.
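
To illustrate the kind of code the Atomic_ops description above has in
mind (shared data updated from a signal handler, with the memory ordering
spelled out explicitly), here is a minimal sketch. It deliberately uses
the standard C11 <stdatomic.h> interface as a stand-in and does not show
the atomic_ops package's own API; see README_atomic_ops.txt for that.
The names events_seen, count_event, install_counter and events_so_far
are made up for this example, and it assumes that atomic_ulong is
lock-free on the target, which is what makes it safe in a handler.

```c
#define _POSIX_C_SOURCE 200809L  /* for sigaction when compiling as strict ISO C */
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

/* Counter shared between the signal handler and the main program.
   Assumed to be lock-free on the target platform. */
static atomic_ulong events_seen;

/* Handler: a lock-free atomic increment is safe here, whereas taking a
   mutex would not be. Relaxed ordering suffices for a pure counter. */
static void count_event(int signo)
{
    (void)signo;
    atomic_fetch_add_explicit(&events_seen, 1, memory_order_relaxed);
}

/* Install the counting handler for signo; returns 0 on success, -1 on error. */
int install_counter(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_event;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Read the counter from ordinary code, with acquire ordering. */
unsigned long events_so_far(void)
{
    return atomic_load_explicit(&events_seen, memory_order_acquire);
}
```

The explicit memory_order arguments are the point of the sketch: the
caller states exactly which ordering guarantee each operation needs
instead of paying for a full barrier everywhere.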
Some more details can be found in the README.txt file.
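
The wrapping technique described for Wrap.h above (redefine a library
function, then locate the original with dlopen and dlsym) can be
sketched as follows. This is not the Wrap.h interface, which is
documented in README_wrap.txt; it is a hand-written illustration that
wraps getpid. The names real_getpid and calls_seen are hypothetical, and
the library name "libc.so.6" is an assumption that holds for glibc-based
Linux systems only. On older glibc versions the program may need to be
linked with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

typedef pid_t (*getpid_fn)(void);

static int calls_seen = 0;   /* how many times the wrapper has run */

/* Look up the C library's own getpid via dlopen and dlsym.
   "libc.so.6" is an assumed library name (glibc on Linux). */
static getpid_fn real_getpid(void)
{
    static getpid_fn real = NULL;
    if (real == NULL) {
        void *libc = dlopen("libc.so.6", RTLD_LAZY);
        if (libc != NULL) {
            real = (getpid_fn)dlsym(libc, "getpid");
        }
    }
    return real;
}

/* Our redefinition. Because it lives in the main program, calls to
   getpid resolve here rather than to the C library's version. */
pid_t getpid(void)
{
    getpid_fn real = real_getpid();
    pid_t result;

    calls_seen++;                           /* code run before the real call */
    result = real ? real() : (pid_t)-1;     /* the real library function */
    /* code to run after the real call would go here */
    return result;
}
```

Looking the symbol up through a handle for the C library itself, rather
than through the global symbol table, is what keeps the wrapper from
finding and calling itself.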
Related packages
We are aware of the following open source packages that are either related to,
or perform, sampled profiling on Linux.
- Gprof
- This is the standard Linux profiler. It can generate approximate call-graph
profiles. It doesn't appear to interact well with threads or dynamic libraries.
It requires relinking for a flat profile and recompilation for a call-graph
profile.
- Sprof
- An analogous but separate facility for displaying shared library profiles.
- Cprof
- A thread-aware profiler for Linux based on gcc-based code
instrumentation.
A while ago we found it nontrivial
to get running on many Linux platforms, but its maintenance status
has recently improved.
- Oprofile
- A system-wide profiling tool. It requires a kernel module.
- Prospect
- Another system-wide profiler. It is based on the Oprofile kernel module.
- Perfmon and pfmon tool
- A library and command to access hardware profile counters on Itanium.
We rely on this for hardware event support. By itself, it can be used to
count hardware events in a program region, etc.