dcpildlatency - A DCPI value profiler for measuring load latencies.
DCPI's value-profiling infrastructure contains experimental support for
measuring the actual latencies experienced by loads in running programs. A
dcpivprofiler(1) value-profiling module named vp-ldlatency.so
is included with the DCPI release.
The DCPI value profiler includes an Alpha interpreter that fetches and
interprets a number of instructions starting with the interrupted PC. As
each instruction is interpreted, values of interest are captured and
recorded. When the interpreter encounters a load instruction, it executes
additional timing code to measure the elapsed time required to complete the
load. This timing code uses Alpha rpcc instructions and is
carefully structured to prevent unwanted out-of-order execution.
The raw latencies captured by the interpreter must be adjusted slightly to
account for extra cycles taken by the instructions used to perform the
timing and enforce ordering constraints. Also, although the latency value is
measured directly, there are still some sources of potential error, such as
cache interference from the interrupt handler and performance counter
To collect load latency data, start
dcpid(1) with the -vtrace option, specifying the
vp-ldlatency.so vprofiler module. The module's absolute pathname must
be specified. For example:
% dcpid -vtrace /usr/lib/dcpi/vp-ldlatency.so db
This command starts dcpid with load latency value profiling enabled. The
underlying value-profiling infrastructure will store a value hotlist
associated with the PC of each profiled load instruction. Each value hotlist
has a fixed size (currently 16 entries), and is updated using statistical
techniques that maintain the most frequently occurring values and their
Since individual load latencies will vary, it is sometimes desirable to
cluster raw latency values into histogram bins associated with levels
of the memory hierarchy. This can be accomplished by specifying an optional
latency-bins file name argument with the vp-ldlatency.so
module. The library pathname and its argument must be quoted together as a
single option string:
% dcpid -vtrace '/usr/lib/dcpi/vp-ldlatency.so latency-bins' db
The latency-bins argument names a text file containing mappings from
raw latency values into representative values and associated names. The file
format is very simple: blank lines and lines starting with the comment
character # are ignored. Each remaining line must contain four values
separated by white space: MIN, MAX, REP, and NAME.
Each such line specifies that raw latency values (measured in processor
cycles) in the closed interval [MIN, MAX] are mapped to the representative
value REP during data collection. The string NAME is used by tools that
report data values for analysis. Raw latency values not covered by any of
the specified intervals are recorded unmodified.
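The mapping rule described above can be sketched in Python. This is only an illustration of the file format as documented here, not the parser used by vp-ldlatency.so. The names Bin, parse_latency_bins, and map_latency are hypothetical, REP is assumed to be an integer cycle count, and the behavior for overlapping intervals (first match wins) is an assumption, since this page does not specify it.

```python
from typing import List, NamedTuple


class Bin(NamedTuple):
    lo: int    # MIN: smallest raw latency in the bin (cycles)
    hi: int    # MAX: largest raw latency in the bin (cycles)
    rep: int   # REP: representative value recorded instead of the raw one
    name: str  # NAME: label shown by reporting tools


def parse_latency_bins(text: str) -> List[Bin]:
    """Parse latency-bins text, skipping blank lines and '#' comments."""
    bins = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()  # assumption: leading whitespace is tolerated
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected MIN MAX REP NAME")
        lo, hi, rep = (int(f) for f in fields[:3])
        bins.append(Bin(lo, hi, rep, fields[3]))
    return bins


def map_latency(raw: int, bins: List[Bin]) -> int:
    """Return REP of the first bin containing raw, else raw unchanged."""
    for b in bins:
        if b.lo <= raw <= b.hi:
            return b.rep
    return raw
```

With the two bins "0 10 2 D" and "11 20 7 S", map_latency(15, bins) yields 7, while an uncovered value such as 500 is returned unchanged.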
Note that the latency-bins file must be manually constructed with
the proper values (measured in processor cycles) for a particular machine
and memory system. As mentioned above, raw latency values need some
adjustments; raw values for the on-chip caches are typically too large due
to the cost of timing code, while raw values for slower memory levels are
typically too small, perhaps due to prefetching. Separate tools can be used
to automatically probe the cycle latencies associated with various levels of
the memory hierarchy (e.g., by repeatedly striding through carefully-sized
arrays), but no such tools are included with the current DCPI release.
The dcpilist(1) command can be
used to produce procedure listings annotated with load latency value profile
information collected using dcpid(1).
The same -vtrace option used with dcpid should be
specified to dcpilist. For example, the following command
displays the load latency values along with each sampled instruction, where
procedure and image stand for the procedure name and the image file that contains it:
% dcpilist -vtrace /usr/lib/dcpi/vp-ldlatency.so procedure image
Note that the values reported will be raw latency values if no
latency-bins file was used during data collection, or the
representative values if such a file was used. If the same latency-bins
file argument is used with dcpilist, the string names associated
with each bin will also be reported:
% dcpilist -vtrace '/usr/lib/dcpi/vp-ldlatency.so latency-bins' procedure image
Here is a sample latency bins file used with an Alpha 21164 workstation.
Note that the bins for main memory are somewhat arbitrary. To ensure that
all collected values are reported, no more than 16 bins (the current hotlist
size) should be used:
# Example Load Latency Bins
# Entry format:
# MIN MAX REP NAME
# Maps [MIN, MAX] => REP in profiles.
# NAME is used for reports (dcpilist).
# Miata EV56 @ 600MHz
# L1 (dcache)
0 10 2 D
# L2 (scache)
11 20 7 S
# L3 (bcache)
21 50 35 B
# Main memory (bin boundaries are somewhat arbitrary)
51 70 60 M1
71 90 80 M1
91 110 100 M1
111 130 120 M2
131 150 140 M2
151 170 160 M2
171 190 180 M2
191 210 200 M2
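Before using a hand-built bins file, it can help to check it against the constraints mentioned on this page. The following Python sketch is not part of DCPI; check_bins and HOTLIST_SIZE are hypothetical names. It warns when the number of bins exceeds the hotlist size, when intervals overlap (whose handling this page does not specify), and when a gap is left between bins, since uncovered raw values are recorded unmodified and may compete for hotlist entries.

```python
HOTLIST_SIZE = 16  # current hotlist size as stated on this page


def check_bins(text: str, limit: int = HOTLIST_SIZE) -> list:
    """Return a list of warning strings for the text of a latency-bins file."""
    intervals = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment
        intervals.append((int(fields[0]), int(fields[1])))

    warnings = []
    if len(intervals) > limit:
        warnings.append(f"{len(intervals)} bins exceed the hotlist size {limit}")

    prev_hi = None  # largest MAX seen so far, in order of increasing MIN
    for lo, hi in sorted(intervals):
        if lo > hi:
            warnings.append(f"[{lo}, {hi}] has MIN greater than MAX")
        if prev_hi is not None:
            if lo <= prev_hi:
                warnings.append(f"[{lo}, {hi}] overlaps an earlier bin")
            elif lo > prev_hi + 1:
                warnings.append(f"values {prev_hi + 1}..{lo - 1} are uncovered")
        prev_hi = hi if prev_hi is None else max(prev_hi, hi)
    return warnings
```

Applied to the sample file above, which has 11 contiguous bins covering 0 through 210 cycles, this check produces no warnings.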
If dcpid(1) and dcpilist(1) are not given the same latency-bins file,
the string names reported with values may be incorrect. However, it is
acceptable to use a latency-bins file during data collection with dcpid
while not using any file with dcpilist; in this case, no string
names will be reported.
For more information, see the DCPI project home page
Hewlett-Packard Company. All rights reserved.