The DCPI system consists of a set of tools that provides
low-overhead continuous profiling of all executables, including
the kernel. It is based on periodic sampling using the Alpha
performance counter hardware. Profiles containing samples for
each executable image (including shared libraries) are stored in
a user-specified directory.
Tools are provided to display profiles and produce a breakdown
of all CPU time by image, and by procedure within images. In
addition, detailed information can be produced showing the time
spent executing each source line and each instruction in a
procedure.
Support is provided for some automated analysis for Alpha
21064/EV4 and Alpha 21164/EV5 based systems, including the
presentation of possible reasons for static and dynamic stalls.
For more information, see the DCPI home page
- Scans filesystem directories to find executables and
associate executables with filesystem pathnames. If you have
significant executables in unusual directories, you should
create a map of those image pathnames.
- Continuous profiling daemon. Extracts raw samples from
kernel device driver, associates them with executable images,
and stores them in profiles on disk.
- Displays profile data collected by dcpid. Produces
a breakdown of cpu time by image, or by procedures within
images.
- Lists the contents of a procedure and annotates the listing
with samples collected during profiling via dcpid. The
listing can contain source lines, machine instructions, or
both. When possible, the average number of cycles required to
execute each instruction or source line is also shown.
- Produces a sorted list of the instructions (and their
source line numbers) accounting for the greatest number of
samples of a specified event type.
- Compares multiple sets of raw sample counts and prints
various statistics about them. Dcpistats is useful for comparing
variations across multiple runs of the same program, or for
comparing differences between slightly different versions of a
program.
- Converts DCPI profile data to a profile feedback file which
is stored in a given executable. This can be used for
compilation with feedback or by post-link optimizers like
- Controls the operation of dcpid. This subsumes
dcpiepoch, dcpiflush, and dcpiquit (which
are still provided for backward compatibility). Includes the
ability to notify the daemon about specific images loaded into
processes when necessary (e.g., when an image is loaded via
- Starts a new profiling epoch. All samples are associated
with a time interval called an epoch. The analysis tools
typically operate on a set of profiles from a single epoch.
- Flushes all unsaved samples from dcpid to profiles on disk.
- Terminates the dcpid daemon, flushing all unsaved
samples to disk.
- Bundles up a profile database directory and associated hot
images for examination on a different system.
- Prints the version string and creation date of the
installed DCPI release.
- Uninstalls DCPI binaries, libraries and man pages.
- Produces a sorted list of the instructions accounting for
the most stall cycles. This analysis tool works only on Alpha
21064/EV4 and 21164/EV5 systems.
- Annotates each instruction in a procedure's basic-block
graph with the average number of cycles for that instruction,
and computes the overall average cycle-per-instruction for that
procedure. This analysis tool works only on Alpha 21064/EV4 and
21164/EV5 systems.
- Produces, for one or more images, a summary breakdown of
where time has been spent (percent of cycles spent in, e.g.,
memory delays, static stalls, branch mispredicts, and useful
execution). This analysis tool works only on Alpha 21064/EV4 and
21164/EV5 systems.
- Measures execution counts for basic blocks and control-flow
edges directly; produces output which can be used by
stall-analysis tools (dcpicalc, dcpiwhatcg,
dcpitopstalls) to produce more accurate information.
Without output from dcpix, these tools estimate
execution counts.
- Aggregates execution counts measured using dcpix
from multiple runs of an instrumented program. This makes it
possible for stall analysis tools to analyze counts from
multiple runs of a program.
- Compares two sets of profiles for a procedure, highlighting
basic blocks or source lines with the largest differences. This
analysis tool works only on Alpha 21064/EV4 and 21164/EV5
systems.
- Augments a basic-block graph generated by
dcpicalc(1) with source code.
- Compiles C programs to produce object code that helps
dcpisource in identifying which source token each
instruction corresponds to.
- Formats a basic-block graph into PostScript.
- Converts DCPI profile data to pixie format.
- Creates a new image based on both vmunix and kmem(7) that
captures the true running kernel image after HP Tru64 Unix
dynamically patches itself using self-modifying code.
- Prints the contents of one or more profile files in an
- Generates a basic-block graph for a procedure annotated
with samples collected during profiling via dcpid. The
functionality of this program has been subsumed by dcpicalc
for Alpha 21064/EV4 and 21164/EV5.
Installation and Setup
See the README file from the kit, or the DCPI installation
page, http://h30097.www3.hp.com/dcpi/installation.html, for
details of how to install the device driver, binaries, and man
pages for the profiling system. Once the system is installed,
dcpiscan(1) should be run and a profile-database directory
should be created.
- dcpiscan directories > map.local
- Create an image map for site-specific executables and shared
libraries in the specified directories and their descendants.
Although this step is technically optional, creating a map of
local executables will allow dcpid to more accurately
identify binaries stored in site-specific directories.
Dcpiscan(1) should be executed once during system installation,
and need only be re-executed to scan other directories or
- mkdir db
- Make a directory to store profiles.
The profiles written in the directory are owned by the
user who invokes dcpid, so the directory must be
writable by that user. If the directory is shared across
hosts, its permission should be set appropriately to allow
write-sharing by the users running dcpid. The
directory must also be in a partition with a reasonable
amount of available space (20 MB or so should be more than
enough).
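The directory requirements above can be checked before starting
dcpid. The following is a minimal POSIX sh sketch and is not part
of DCPI: the function name check_db_dir and the MIN_KB variable are
assumptions made for illustration, with the default threshold taken
from the "20 MB or so" guidance.

```sh
#!/bin/sh
# Hypothetical helper, not a DCPI tool: check that a profile-database
# directory exists, is writable, and has enough free space.
# MIN_KB defaults to 20 MB, following the guidance above.
MIN_KB=${MIN_KB:-20480}

check_db_dir() {
    dir=$1
    [ -d "$dir" ] || { echo "missing: $dir" >&2; return 1; }
    [ -w "$dir" ] || { echo "not writable: $dir" >&2; return 1; }
    # df -P prints one line per filesystem in POSIX format;
    # with -k, field 4 of the second line is the available space in KB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    if [ "$avail_kb" -lt "$MIN_KB" ]; then
        echo "low space: $avail_kb KB available in $dir" >&2
        return 1
    fi
    echo "ok: $dir"
}
```

A possible use is "mkdir db && check_db_dir db". The check only
covers the user who runs it, so for a directory shared across hosts
it would need to be repeated as each user that runs dcpid.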
After installation and setup are complete, data is collected
by running dcpid and the tools that control it:
- dcpid -m map.local db
- Start the dcpid process. If dcpid is not installed
setuid-root, this command must be run as root. The optional
argument -m map.local names a file containing the mapping
from executables to pathnames previously produced by
dcpiscan(1). db is the database directory created
during setup above.
- Terminate the current epoch and start a new one, ensuring
that profiles for the terminated epoch are flushed to disk and
- Flush buffered samples in the current epoch to the on-disk
database. Note that this is typically not necessary (unless you
want to see profiles immediately after running a program).
Buffered samples are flushed to disk whenever an epoch is
terminated and when dcpid is terminated. In addition,
buffered samples are flushed periodically (at intervals that can
be controlled with command-line arguments to dcpid).
- Terminate the current epoch, flushing all buffered samples
to disk, and exit dcpid. This turns off all
performance-counter interrupts and frees all memory used by the
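The collection steps above can be strung together in a small
wrapper. This is a sketch in POSIX sh under stated assumptions: the
names run and collect and the DRY_RUN variable are hypothetical, and
the sketch assumes that dcpid returns control to the shell once the
daemon has started. Only commands named in this page are invoked.

```sh
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN is set.
# DRY_RUN is a hypothetical switch for trying the script on a
# machine where DCPI is not installed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# collect DB COMMAND [ARGS...]: profile one workload into database DB.
collect() {
    db=$1
    shift
    run dcpid -m map.local "$db"   # start the daemon (root, unless setuid-root)
    run "$@"                       # the workload to be profiled
    run dcpiquit                   # end the epoch, flush samples, stop dcpid
}
```

With DRY_RUN=1, "collect db ./myprog" prints the three commands
instead of running them. To keep the daemon running across several
workloads, dcpiepoch could be run between them and dcpiquit only at
the end.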
As dcpid runs, it creates subdirectories of db,
one for each epoch. Each epoch directory further contains
subdirectories, one for each platform sharing the same db.
The platform names default to the local hostname on each machine
running dcpid, so by default profiles collected on
different machines are stored separately (though their epochs
are synchronized). However, the file hosts in directory
db may also be edited to contain a mapping from hostnames
to arbitrary platform names, allowing samples from several hosts
to be aggregated in the same profile database.
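The layout described above (one subdirectory of db per epoch, each
holding one subdirectory per platform) can be walked with a few
lines of sh. This is a minimal sketch: list_profiles is a
hypothetical helper, and it assumes that every subdirectory of db is
an epoch directory. It makes no assumption about how epoch
directories are named.

```sh
#!/bin/sh
# list_profiles DB: print one line per epoch/platform pair found in DB.
# Hypothetical helper; plain files in DB (such as hosts) are skipped
# because the trailing slash in the glob matches directories only.
list_profiles() {
    db=$1
    for epoch in "$db"/*/; do
        [ -d "$epoch" ] || continue        # glob did not match anything
        for platform in "$epoch"*/; do
            [ -d "$platform" ] || continue # epoch without platform dirs
            echo "epoch=$(basename "$epoch") platform=$(basename "$platform")"
        done
    done
}
```

Running "list_profiles db" after a few epochs gives a quick view of
which hosts contributed samples to each epoch.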
After an epoch is terminated, the profile data for the epoch
can be analyzed using a number of tools. By default, the
analysis tools find the relevant profile files automatically.
There are also a number of options that can be used to guide the
search for profile files when the default rules are not
appropriate; see the man pages for the individual tools for
details.
- setenv DCPIDB db
- Set an environment variable that tells the downstream tools
where the profile database is located.
- Use dcpiprof to analyze the breakdown of cpu time
across all executables that ran during the epoch, broken down by
image.
- dcpiprof image
- Use dcpiprof to analyze the breakdown of cpu time across all
procedures in the image file image.
- dcpitopcounts image
- Identify the hot spots in image, listing the most
expensive instructions in order, along with their source line
numbers if available.
- dcpilist -asm proc image
- Disassemble procedure proc in the image file image,
and annotate the disassembly with samples extracted from the
profile database and the average cycle time required for
executing each instruction.
- dcpilist -source proc image
- Generate a source code listing of procedure proc in
the image file image, and annotate the listing with
samples extracted from the profile database, and the average
cycle time required for executing each source line.
- dcpicalc proc image | dcpisource -f image.c | dcpi2ps -o proc.ps
- Produce a basic-block graph for procedure proc in
image file image, annotated with the average number of cycles
for each instruction; then augment the graph with source
lines from image.c, and store the resulting PostScript in
proc.ps. The dcpicalc analysis tool works only on
Alpha 21064/EV4 and 21164/EV5 systems.
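The analysis commands above can be collected into one function
that saves each listing to a file. This is a sketch in POSIX sh, not
a DCPI tool: the function name report and the output file names are
assumptions for illustration. In sh, "export DCPIDB=db" plays the
role of the csh command "setenv DCPIDB db" shown above.

```sh
#!/bin/sh
# report IMAGE PROC [OUTDIR]: save the analyses shown above as text files.
# Hypothetical wrapper; it uses only the invocations listed in this page.
report() {
    image=$1
    proc=$2
    out=${3:-report}
    # The tools find the profile database through DCPIDB; stop early if unset.
    : "${DCPIDB:?set DCPIDB to the profile database directory}"
    export DCPIDB
    mkdir -p "$out" || return 1
    dcpiprof                          > "$out/by-image.txt"
    dcpiprof "$image"                 > "$out/by-procedure.txt"
    dcpitopcounts "$image"            > "$out/hot-instructions.txt"
    dcpilist -asm "$proc" "$image"    > "$out/$proc.asm.txt"
    dcpilist -source "$proc" "$image" > "$out/$proc.source.txt"
}
```

For example, "export DCPIDB=db; report image proc" writes five
listings under ./report. The dcpicalc pipeline is left out because
it works only on Alpha 21064/EV4 and 21164/EV5 systems.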
Installation instructions are in the kit README file, or at
the DCPI installation page,
http://h30097.www3.hp.com/dcpi/installation.html.
For processes that use the exec() system call (or its
variants), PC samples are sometimes charged to the wrong image.
Thus, it is possible to get samples for unexecuted instructions.
Specifically, the problem is that samples gathered prior to an
exec() call may be charged to an image that is running after
exec() returns. This problem is not serious in practice for the
common case of processes that call exec() once soon after being
created: since there are only a few samples gathered prior to
the exec(), only a few samples can be charged to the wrong
image.
For more information, see the DCPI project home page
Copyright 1996-2004, Hewlett-Packard Company. All rights reserved.