The system permits low-overhead continuous profiling of all executables,
including the kernel. It is based on periodic sampling using
the Alpha performance counter hardware. Profiles containing
samples for each executable image (including shared libraries)
are stored in a user-specified directory.
Tools are provided to analyze profiles and produce a breakdown of
all CPU time by image, and by procedure within images. In addition,
detailed information can be produced showing the total time
spent executing each individual instruction in a procedure.
Additional analysis tools also determine the average number
of cycles taken by each instruction, the approximate number
of times the instruction was executed, and the possible reasons
for any cycles spent stalled not executing instructions (e.g.,
waiting for data to be fetched from memory).
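The relationship between sample counts, cycles, and execution counts can be illustrated with a little arithmetic. The sketch below is a simplified model, not the analysis the tools actually perform: it assumes each sample stands for one full sampling period of cycles, and that an execution count for the instruction is already known. The function names and all numbers are hypothetical.

```python
def estimate_cycles(samples: int, sampling_period: int) -> int:
    """Approximate cycles attributed to an instruction.

    Assumes each sample represents one sampling period's worth of cycles.
    """
    return samples * sampling_period


def average_cycles_per_execution(samples: int, sampling_period: int,
                                 executions: float) -> float:
    """Average cycles per execution, given an (estimated) execution count."""
    if executions <= 0:
        raise ValueError("executions must be positive")
    return estimate_cycles(samples, sampling_period) / executions


# Hypothetical numbers, for illustration only.
PERIOD = 64_000          # assumed cycles between samples
samples_at_load = 50     # samples observed at one instruction
executions = 400_000     # assumed execution count for that instruction

avg = average_cycles_per_execution(samples_at_load, PERIOD, executions)
print(f"{avg:.1f} cycles per execution")
```

In this model, an instruction that collects many samples relative to its execution count is one that spends many cycles per execution, which is what points the analysis toward possible stalls.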
The material below provides an overview of the system, including
examples of its use. Detailed man pages for Tru64 UNIX and Windows
NT are also available; they give more information about
each program in the system, including all command-line options,
limitations, and known bugs. The overview below will help you
get started using the system, but we recommend that you read
all the man pages as well.
The pcount device driver must be installed prior to data
collection. The device driver acts as an interface between the
Alpha performance counter hardware and the daemon process (dcpid). It services
interrupts from the performance counters, and on each interrupt
records the process id and program counter value for the interrupted
program. These program-counter samples are buffered in the kernel
until they are extracted by dcpid.
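The buffering described above can be pictured with a small user-space model. This is only a sketch: the real driver runs in the kernel, and the `SampleBuffer` class, its capacity, and the choice to drop samples when the buffer is full are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List, Tuple

Sample = Tuple[int, int]  # (process id, program counter)


class SampleBuffer:
    """Bounded FIFO standing in for the driver's in-kernel sample buffer."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.samples: Deque[Sample] = deque()
        self.dropped = 0  # samples lost because the buffer was full (assumed policy)

    def on_interrupt(self, pid: int, pc: int) -> None:
        """Record one sample, as the driver does on each counter interrupt."""
        if len(self.samples) >= self.capacity:
            self.dropped += 1
            return
        self.samples.append((pid, pc))

    def extract(self) -> List[Sample]:
        """Hand all buffered samples to the daemon and empty the buffer."""
        drained = list(self.samples)
        self.samples.clear()
        return drained


buf = SampleBuffer(capacity=3)
for pid, pc in [(101, 0x120001000), (101, 0x120001004),
                (202, 0x3FF80012340), (202, 0x3FF80012344)]:
    buf.on_interrupt(pid, pc)

batch = buf.extract()  # what the daemon would receive on this extraction
```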
Determining Loadmap Information
The profiling system uses different sources of information to
determine which executable images are loaded in each process and
where they are loaded. On Tru64 UNIX, it uses a modified system
dynamic loader, a hook in the kernel exec path, system
process tables, and dcpiscan.
The loader and the exec hook are used continuously; the
system process tables are examined each time the daemon starts
running; and dcpiscan is
typically run once at setup and then infrequently when the images
on disk change. All four sources of information provide pathnames
for images; these are stored with profiles so that the analysis
tools can quickly find the images associated with each profile.
- A modified version of the system dynamic loader informs the
daemon whenever a dynamically linked program starts or loads
a shared library. The information provided by the loader to the
daemon includes the pathname of the image being loaded, the process
it is loaded into, and the address at which it is loaded.
- The pcount driver installs a hook in the kernel exec path
that captures information about all statically linked images.
- When the daemon process starts up, it scans all active processes
and their mapped regions to identify the images loaded in processes
that started before the daemon was started.
- The program dcpiscan is
used to scan filesystem directories for executables and build
a mapping from pathnames to executables. A default mapping for
common Tru64 UNIX executables is compiled into the system; dcpiscan is
usually run when the profiling system is installed to provide
more accurate identification of site-specific binaries. Since
the modified dynamic loader includes pathnames in the information
it provides to the daemon, dcpiscan is
mostly useful for obtaining pathnames of statically linked images.
The modified loader and the exec hook together ensure
that dcpid knows
about virtually all images loaded into each process.
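To see why the daemon needs loadmap information, consider how a raw (process id, program counter) sample is turned into an image and an offset within that image. The sketch below assumes a per-process table of load addresses; the `LoadMap` class, the addresses, and the pathnames are hypothetical and do not reflect dcpid's actual data structures.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Mapping:
    start: int   # address at which the image is loaded
    end: int     # one past the last mapped address
    path: str    # pathname of the image


class LoadMap:
    """Per-process record of which images are loaded where."""

    def __init__(self) -> None:
        self._by_pid: Dict[int, List[Mapping]] = {}

    def add(self, pid: int, start: int, size: int, path: str) -> None:
        """Record that `path` was loaded into process `pid` at `start`."""
        maps = self._by_pid.setdefault(pid, [])
        maps.append(Mapping(start, start + size, path))
        maps.sort(key=lambda m: m.start)

    def resolve(self, pid: int, pc: int) -> Optional[Tuple[str, int]]:
        """Return (image path, offset in image) for a sample, or None."""
        maps = self._by_pid.get(pid, [])
        starts = [m.start for m in maps]
        i = bisect.bisect_right(starts, pc) - 1  # last mapping starting at or before pc
        if i >= 0 and pc < maps[i].end:
            return maps[i].path, pc - maps[i].start
        return None


loadmap = LoadMap()
loadmap.add(42, 0x120000000, 0x10000, "/usr/bin/app")           # hypothetical image
loadmap.add(42, 0x3FF80000000, 0x20000, "/usr/shlib/libc.so")   # hypothetical library

hit = loadmap.resolve(42, 0x3FF80000100)
miss = loadmap.resolve(42, 0x1)
```

A sample that resolves to `None` corresponds to an image the daemon was never told about, which is the gap the four information sources together are meant to close.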
Building a Profile Database
The dcpid daemon
extracts program-counter samples from the device driver and stores
them in an on-disk database. The database resides in a user-specified
directory, and may be shared by multiple machines. All samples
are organized into non-overlapping epochs, each of which
contains samples for some time interval. A new epoch is started
(and the previous epoch terminated) using the dcpiepoch command.
Each epoch occupies a separate directory; each epoch directory
contains subdirectories for each platform sharing the database.
(A platform typically corresponds to an individual host, but can
be configured using the file hosts in the top-level database
directory to correspond to a user-specified collection of machines.)
Each platform directory contains files with profile information,
typically one file per image. See the dcpiformat(4) man
page for details of the file format.
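The nesting of database, epoch, platform, and per-image profile file can be sketched as a path computation. The directory and file names below are purely hypothetical placeholders; the actual naming scheme is not described here and should be taken from the man pages.

```python
from pathlib import PurePosixPath


def profile_path(database: str, epoch: str, platform: str,
                 profile_file: str) -> PurePosixPath:
    """Compose database/epoch/platform/profile-file, mirroring the layout above."""
    return PurePosixPath(database) / epoch / platform / profile_file


# All four components are made-up names used only to show the nesting.
p = profile_path("/dcpidb", "epoch-0001", "hostA", "vmunix.prof")
print(p)
```

Because each epoch and each platform has its own directory, profiles from different time intervals or different machines never share a file.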
Samples are buffered in the device driver and in dcpid.
Buffered samples are flushed out periodically and also when an
epoch is terminated. To ensure consistent results, the analysis
tools should be run only on a completed epoch.
The dcpictl utility
can be used to control dcpid.
It provides commands to terminate an epoch and begin a new one;
to shut down monitoring; to flush all buffered samples to the on-disk
database; and to inform dcpid manually
about an image loaded into a process. (The latter is useful only
in unusual circumstances; see the man page for dcpictl.)
During an epoch, samples are collected for all running images,
including all applications, shared libraries, and the kernel. There
are several ways to analyze the profile data for an epoch, from
a coarse-grained accounting for each image to a fine-grained analysis
of each instruction. Output from the tools ranges from a simple
prof-style listing of time spent in each image to basic-block flowgraphs
of each procedure annotated with information such as sample counts,
the average number of cycles taken by each instruction, and the
possible causes of stall cycles.
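The coarse-grained, prof-style accounting mentioned above amounts to summing samples per image and reporting each image's share of the total. The following sketch shows that aggregation only; the input dictionary, image names, and procedure names are hypothetical, and the output is not meant to match the format of any tool in the system.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def breakdown_by_image(
    samples: Dict[Tuple[str, str], int]
) -> List[Tuple[str, int, float]]:
    """Return (image, samples, percent of total) rows, largest first.

    `samples` maps (image, procedure) to a sample count.
    """
    total = sum(samples.values())
    by_image: Dict[str, int] = defaultdict(int)
    for (image, _procedure), count in samples.items():
        by_image[image] += count
    rows = sorted(by_image.items(), key=lambda kv: kv[1], reverse=True)
    return [(image, count, 100.0 * count / total if total else 0.0)
            for image, count in rows]


# Hypothetical sample counts for one epoch.
epoch_samples = {
    ("/vmunix", "bcopy"): 300,
    ("/vmunix", "idle_thread"): 100,
    ("/usr/bin/app", "main"): 600,
}

rows = breakdown_by_image(epoch_samples)
for image, count, percent in rows:
    print(f"{count:8d} {percent:6.2f}%  {image}")
```

Grouping on the full (image, procedure) key instead of the image alone gives the per-procedure breakdown.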
- At a coarse level of detail, dcpiprof shows
the time spent in any set of images active during an epoch. This
time can be broken down either by image or by procedure within images.
- At a fine level of detail, a basic-block flowgraph can be produced
for one or more procedures, showing a control-flow graph of the
machine instructions in each procedure annotated with sample
counts for each instruction, the source code associated with
each basic block, and an analysis of the number of stall cycles
for each instruction and the reasons for each stall.
- dcpicalc produces
a control-flow graph with the execution frequency of each
basic block, the average number of cycles taken by each
instruction, the possible reasons for each stalled cycle,
and a summary of how time was spent in the procedure.
- dcpisource annotates
a control-flow graph (produced by dcpicalc) with
source code for each basic block.
- dcpi2ps takes
a control-flow graph from any of the tools above and produces
a PostScript file for viewing or printing.
These tools are typically run in a pipeline, e.g.:
dcpicalc | dcpisource | dcpi2ps
(with appropriate flags and arguments -- see the man pages
and examples below for details).
- dcpilist lists
the contents of a procedure annotated with samples collected
during profiling and with the average number of cycles required
to execute each line of code. The listing can contain
machine instructions, source lines, or both.
- dcpiwhatcg produces
a program-level (i.e., entire image, not just a single procedure)
summary breakdown of where time has been spent (percent of cycles
spent in, e.g., memory delays, static stalls, branch mispredicts,
and useful execution).
- dcpidiff and dcpistats compare
sets of profile data. dcpidiff compares
two sets of profiles for a single procedure, highlighting basic
blocks or source lines with the largest differences. dcpistats (currently
available only for Tru64 UNIX) compares multiple sets of raw
sample counts and prints various statistics about them; it is
useful for comparing variations across multiple runs of the same
program or for comparing differences between slightly different
versions of a program.
- dcpicat prints
the contents of one or more profile files in an ASCII format.
This is useful mostly to people debugging the profiling system itself.
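The cross-run comparison described for dcpistats above can be illustrated with basic summary statistics over raw sample counts. This is a sketch of the idea only: the statistics chosen, the `summarize_runs` helper, and the procedure names and counts are assumptions, not a description of what dcpistats actually reports.

```python
import statistics
from typing import Dict, List


def summarize_runs(counts_by_procedure: Dict[str, List[int]]) -> Dict[str, dict]:
    """Per-procedure mean, sample standard deviation, and range across runs."""
    summary = {}
    for procedure, counts in counts_by_procedure.items():
        summary[procedure] = {
            "mean": statistics.mean(counts),
            "stdev": statistics.stdev(counts) if len(counts) > 1 else 0.0,
            "range": max(counts) - min(counts),
        }
    return summary


# Hypothetical raw sample counts from three runs of the same program.
runs = {
    "smooth_": [100, 110, 90],
    "parse_input": [40, 40, 40],
}

summary = summarize_runs(runs)
for procedure, stats in summary.items():
    print(procedure, stats)
```

A procedure with a large standard deviation relative to its mean is one whose behavior varies noticeably between runs and may deserve a closer look.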
Several other utilities are provided. They are currently available
only for Tru64 UNIX.
- dcpi2pix produces pixie-format
output from the profile database, thus enabling existing tools
that take pixie-format input to be driven from the profile database.
- dcpix instruments
an executable to measure execution frequencies for basic blocks
and control flow edges directly. The output can be used by dcpicalc instead
of estimating frequencies from sample counts. (Note: the typical
mode of operation for this profiling system does not require
instrumenting executables; using dcpix to
instrument executables can be useful in rare circumstances when dcpicalc produces
poor estimates, or when evaluating the quality of the estimates
produced by dcpicalc.)
- dcpisumxct aggregates
execution counts measured using dcpix from multiple runs of an executable.
- dcpicc compiles
C programs to produce object code that helps dcpisource in
identifying which source token each instruction corresponds to.
- dcpikdiff creates
a new image based on both vmunix and kmem(7) that
captures the true running kernel image after Tru64 UNIX dynamically
patches itself for the particular system it is running on.
- dcpiversion prints
the version number of the installed release. This is useful when
reporting bugs or other problems so that the developers of the
system know what version you are using.
- dcpiuninstall uninstalls
the profiling system, removing all binaries and man pages, and
replacing the system dynamic loader with the original version
(which was saved during the installation process). Note: dcpiuninstall does
not remove profile databases, nor does it remove the device driver
from the kernel.