Installing and using qprof


Installing qprof

To install qprof, unpack the distribution and change to the resulting qprof- directory. Then:

  1. (Optional) Make sure that the PREFIX variable in Makefile is set to the appropriate installation directory. Files will be installed in $(PREFIX)/lib/qprof-version, $(PREFIX)/include/qprof-version, and $(PREFIX)/doc/qprof-version. The above directories will be linked to $(PREFIX)/lib/qprof, $(PREFIX)/include/qprof, and $(PREFIX)/doc/qprof.
  2. (Optional) Unpack a copy of libunwind in the source directory, or create a symbolic link from libunwind- to the identically named source directory elsewhere. (You will need at least version 0.93. For version 0.93 on x86, apply the included libunwind-0.93.patch.) If this step is performed, a very basic version of call-stack profiling will become available.
  3. (Optional) Type "make" (or "make check" to also run tests).
  4. (Needs permission to write to PREFIX directory, if set.) Type "make install".
To start profiling all programs run from a particular shell:
  1. Run "source <PREFIX from above>/lib/qprof/alias.csh" or ". <PREFIX from above>/lib/qprof/alias.sh", depending on your shell. If you skipped step one above, PREFIX is <build directory from above>/installed. Alternatively, you can run the identical scripts from the build directory. For regular use, put the appropriate command into your .cshrc or .bashrc file.
  2. (Optional) In an ANSI color-capable terminal window (e.g. most xterm variants), set the environment variable QPROF_COLOR to, for example, "green" to distinguish profiling output from normal command output.
  3. Run qprof_start to start profiling.
  4. Run commands to be profiled.
  5. Run qprof_stop to stop profiling.
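
The steps above could be strung together roughly as follows. This is only a sketch: qprof_setup_line is a hypothetical helper (not part of qprof), /opt/qprof and ./my_program are placeholders, and treating tcsh like csh is an assumption.

```sh
# Hypothetical helper, not part of qprof: print the command that loads the
# qprof aliases for a given shell family and installation PREFIX.
qprof_setup_line() {
    shell_name=$1
    prefix=$2
    case "$shell_name" in
        csh|tcsh)   # assumption: tcsh accepts the csh script
            printf 'source %s/lib/qprof/alias.csh\n' "$prefix" ;;
        sh|bash)
            printf '. %s/lib/qprof/alias.sh\n' "$prefix" ;;
        *)
            printf 'unsupported shell: %s\n' "$shell_name" >&2
            return 1 ;;
    esac
}

# Example: the line to append to ~/.bashrc for a PREFIX of /opt/qprof
# (placeholder path).
qprof_setup_line bash /opt/qprof

# An interactive session would then look like this (shown as comments because
# it needs a real qprof installation):
#   . /opt/qprof/lib/qprof/alias.sh
#   export QPROF_COLOR=green     # optional
#   qprof_start
#   ./my_program                 # hypothetical program to profile
#   qprof_stop
```

The session itself is left in comments because the qprof_start and qprof_stop aliases only exist once the alias script has been sourced in an interactive shell.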
Assumptions made by the above:
  1. The LD_PRELOAD environment variable is not already set for other reasons. (If you don't know what it's used for, you're probably OK. If you do know what this means, you can probably fix up the qprof_start and qprof_stop aliases to make things work with another preload library.)
  2. You are running only dynamically linked executables. If you don't know what this means, you can ignore it. (Statically linked programs can be profiled by calling the prof_utils.h routines directly from the application to be profiled.)
  3. There are no doubt some library version dependencies. RedHat 7, 8, and 9 should work, as should other Linux distributions from the same era.
  4. Nothing done by the process interferes with profiling. Empirically, this works fine for nearly all applications. But since the profiler runs as part of the application process, obscure kinds of interference are probably possible.
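
The first assumption can be checked before starting a session. The following is a minimal sketch; preload_status is a hypothetical helper, not something qprof provides.

```sh
# Hypothetical pre-flight check, not part of qprof: report whether a given
# LD_PRELOAD value would conflict with the qprof_start alias.
preload_status() {
    if [ -z "$1" ]; then
        echo "ok: LD_PRELOAD is empty"
    else
        echo "conflict: LD_PRELOAD already set to $1"
        return 1
    fi
}

# Check the current shell. The ${LD_PRELOAD:-} form also works when the
# variable is unset.
preload_status "${LD_PRELOAD:-}" || echo "adjust the qprof_start and qprof_stop aliases first"
```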
Interpreting the results:
  • Each line in the output reflects a range of program addresses (by default a line), and lists the number of times a program counter in that range was sampled. Lines containing large count values consumed more processor time.
  • The output is usually far less informative if the program does not contain debug information. You can still get fully precise output from such programs by setting the environment variable QPROF_GRANULARITY to instruction (see below), but it may be difficult to make sense of the hexadecimal addresses that are printed as a result.
  • If you prefer to see "hot" regions of the program first, save the output and pipe it through sort -n -k2 -r. By default, profiling output is written to stderr, but see the QPROF_FILE environment variable.
  • For multithreaded processes, each thread is sampled separately, and the reported results are the sum of all samples. Thus a process with four always-runnable threads on a 4-processor machine, with the default sampling frequency of 100 samples per second, would report 400 samples per second.
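
As an illustration of the points above, the sketch below sorts a few made-up sample lines and converts a count into approximate CPU time. The "location count" layout is a simplification assumed for this example (real qprof output may contain additional columns), and sample_profile and samples_to_seconds are hypothetical helpers.

```sh
# Made-up sample lines in a simplified "location count" layout; real qprof
# output may contain additional columns.
sample_profile() {
    cat <<'EOF'
main.c:10 12
compute.c:42 310
compute.c:57 78
EOF
}

# Hottest ranges first: numeric sort on the second field, descending.
sample_profile | sort -n -k2 -r

# Convert a sample count to approximate CPU seconds. The second argument is
# the sampling rate and defaults to the documented 100 samples per second.
# For multithreaded processes the result is summed over all threads.
samples_to_seconds() {
    awk -v n="$1" -v hz="${2:-100}" 'BEGIN { printf "%.2f\n", n / hz }'
}
samples_to_seconds 310
```

With the default rate, 400 samples reported for the four-thread process above correspond to about four CPU-seconds, or one second of elapsed time.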
Adjusting profiling output:

The output produced by qprof depends on several environment variables. In particular, QPROF_GRANULARITY can be set to one of function, line, or instruction to control whether samples are summed for each function, line, or instruction. Setting QPROF_REAL causes the profiler to sample based on wall-clock time, and should thus point out where processes are waiting. Setting QPROF_STACK causes time spent in called functions to be included in the caller's (parent's) counts. Other relevant environment variables are described in the qprof documentation.
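
A small wrapper can guard against misspelled granularity values. This is a sketch only: set_qprof_granularity is a hypothetical helper, and the value 1 used for QPROF_REAL and QPROF_STACK in the comments is an assumption, since the text says only that these variables must be set.

```sh
# Hypothetical wrapper, not part of qprof: accept only the three documented
# granularities before exporting QPROF_GRANULARITY.
set_qprof_granularity() {
    case "$1" in
        function|line|instruction)
            QPROF_GRANULARITY=$1
            export QPROF_GRANULARITY ;;
        *)
            echo "QPROF_GRANULARITY must be function, line, or instruction" >&2
            return 1 ;;
    esac
}

set_qprof_granularity function   # sum samples per function
echo "QPROF_GRANULARITY=$QPROF_GRANULARITY"

# The other variables named above would be switched on in the same way; the
# value 1 is an assumption.
#   export QPROF_REAL=1
#   export QPROF_STACK=1
```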

To profile using hardware event counters:
(This currently works only on Itanium.)
  1. Install a supported underlying event counter library. (Currently this is Itanium perfmon.)
  2. Add -DHW_EVENT_SUPPORT to CFLAGS in Makefile, then build as above. (Perfmon must be installed for the profiler to build. If it is missing at runtime, qprof will still run, but without hardware event support. If you are using libpfm3 on a 2.6 kernel, replace prof_utils.c in the distribution with prof_utils.c.libpfm3.)
  3. Run pfmon -l to find the appropriate event name.
  4. Set the environment variable QPROF_HW_EVENT to the event name. Profile as above. (QPROF_INTERVAL can be set to a number n to indicate that the program counter should be sampled every nth event. By default n is 10,000.)
  5. Note that the program counter is sampled when the process is notified of the event. This may be a few cycles after the event occurred. For example, cache miss events are likely to be attributed to an instruction that uses the resulting value, or even to a slightly later instruction. You should be able to determine which loop is causing cache misses, but it will take a little guesswork to identify the actual load or store instruction.
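
The environment setup from step 4, and the relationship between sample counts and event counts, might look like the sketch below. EVENT_NAME_FROM_PFMON is a placeholder rather than a real event name, and estimated_events is a hypothetical helper.

```sh
# Placeholder event name: pick a real one from the output of `pfmon -l`.
QPROF_HW_EVENT=EVENT_NAME_FROM_PFMON
QPROF_INTERVAL=10000      # the documented default, written out explicitly
export QPROF_HW_EVENT QPROF_INTERVAL

# Hypothetical helper: one sample is taken every QPROF_INTERVAL events, so a
# count of N samples stands for roughly N * QPROF_INTERVAL events.
estimated_events() {
    echo $(( $1 * ${2:-$QPROF_INTERVAL} ))
}

estimated_events 37          # 37 samples at the default interval
estimated_events 37 500      # the same count with QPROF_INTERVAL=500
```

A smaller interval gives finer resolution per sample at the cost of more frequent interruptions of the profiled process.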


© 2009 Hewlett-Packard Development Company, L.P.