dcpiprofileme(1)

NAME

dcpiprofileme - Using DCPI to collect and view ProfileMe data

COLLECTING PROFILEME SAMPLES

On an Alpha 21264a/EV67 or later processor, tell dcpi to gather ProfileMe data via the command:

  dcpid -slot pm <profile db dir>
This causes dcpi to collect ProfileMe samples. The data for each sample is decomposed into named "bit" and "counter" values. Note that some alternate "counter" statistics can be gathered by specifying pm0, pm2, or pm3 instead of pm. See "COUNTER NAMES AND THEIR MEANINGS" below.

If no -slot option is present, the default on Alpha 21264a/EV67 and later processors is to multiplex between collecting traditional aggregate cycle samples and collecting ProfileMe (type pm) statistics.

BIT NAMES AND THEIR MEANINGS

retired
The instruction retired, i.e., it was not in the shadow of any trap. It may have caused a mispredict trap, though.

taken
The conditional branch was taken. This bit is UNDEFINED for samples of instructions other than conditional branches and for conditional branches that mispredict.

cbrmispredict
The conditional branch was mispredicted. This bit is clear for instructions other than conditional branches.

valid
The instruction retired and didn't cause a trap.

nyp
Stands for "Not Yet Prefetched." Indicates that when the fetcher asked for the fetch block containing the instruction, the instruction was not in the icache and the prefetcher had not yet initiated an off-chip request for the instruction.

If nyp is set, the instruction's fetch block definitely caused an icache miss stall.

If nyp is clear, the instruction's fetch block may have still caused an icache miss stall: the prefetcher may have made an off-chip request for the instruction, but the instruction may not have arrived at the time the fetcher needed it.

ldstorder
Supposed to indicate that a replay trap was caused by one of the following:
  • load store order

    a younger load issuing before an older store to the same physical address

  • troll order

    a younger load issuing before an older store where the dcache indexes for the physical addresses match but the higher order address bits are different

  • simultaneous load and store

    a load and a store to the same physical address issuing simultaneously

In all three cases, the younger instruction causes a replay trap.

map_stall
The instruction stalled after it was fetched and before it was mapped. Such stalls are caused by a shortage of physical registers, integer issue queue space, floating-point issue queue space, or inums. There are 80 inums used to track instructions that are in flight.

early_kill
The instruction was killed early in the pipeline -- before it entered an issue queue.

late_kill
The instruction was killed late in the pipeline.

COUNTER NAMES AND THEIR MEANINGS

retdelay
A lower bound on the number of cycles that the instruction's inum delayed the advance of the retire pointer. Large values indicate a probable performance problem. For example, the first instruction that uses the result of a load that misses all the way out to memory might have a retdelay of 100. This statistic is gathered by default or when the -slot pm option is specified.

inflight
For instructions that retired without trapping (retired^notrap), this is approximately the number of cycles that the instruction was inflight. More precisely, it is -3 plus the number of cycles elapsed from when the instruction exited the fetch stage until the instruction retired. This statistic is gathered by default or when one of the -slot pm0, -slot pm, or -slot pm3 options is specified.

retires
For instructions that retired without trapping (retired^notrap), this is approximately the number of instructions that retired while the profiled instruction was inflight. This statistic is gathered when either the -slot pm0 or the -slot pm2 option is present.

bcmisses
For instructions that retired without trapping (retired^notrap), this is approximately the number of bcache misses that occurred while the profiled instruction was inflight. This statistic is gathered when the -slot pm2 option is specified.

replays
For instructions that retired without trapping (retired^notrap), this is approximately the number of replay traps that occurred while the profiled instruction was inflight. This statistic is gathered when the -slot pm3 option is used.
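The counter descriptions above imply a small table of which -slot option gathers which counters. The sketch below only transcribes those descriptions into a Python dictionary; the names COUNTERS_BY_SLOT and slots_gathering are illustrative and are not part of DCPI. When no -slot option is given, the multiplexed default collects type pm samples, so the pm row applies.

```python
# Counters gathered under each -slot option, transcribed from the
# counter descriptions in this section (not read from DCPI itself).
COUNTERS_BY_SLOT = {
    "pm":  {"retdelay", "inflight"},
    "pm0": {"inflight", "retires"},
    "pm2": {"retires", "bcmisses"},
    "pm3": {"inflight", "replays"},
}


def slots_gathering(counter):
    """Return the sorted -slot options under which `counter` is collected."""
    return sorted(slot for slot, counters in COUNTERS_BY_SLOT.items()
                  if counter in counters)


print(slots_gathering("inflight"))
print(slots_gathering("bcmisses"))
```

Read this way, each slot option yields two counters, so choosing a slot is a choice of which pair to collect.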

TRAP BIT NAMES AND THEIR MEANINGS

Exactly one trap bit is set in any given ProfileMe sample.

notrap
None of the below

mispredict
The instruction caused a JSR/RET/JMP/JMP_COROUTINE or conditional branch mispredict.

replays
The instruction caused a replay trap.

unaligntrap
The instruction caused an unaligned load or store.

dtbmiss
The instruction caused a DTB single miss.

dtb2miss3
The instruction caused a DTB double miss. (3-level page tables)

dtb2miss4
The instruction caused a DTB double miss. (4-level page tables)

itbmiss
The instruction caused an Instruction TLB miss. Most other bit and counter values will be those for the first instruction in the ITB miss handler.

arithtrap
The instruction caused an arithmetic trap.

fpdisabledtrap
The instruction caused a floating point disabled trap.

MT_FPCRtrap

dfaulttrap
The instruction caused a Dstream fault because the virtual page is inaccessible or because the virtual address is malformed, i.e., not properly sign-extended.

iacvtrap
The instruction caused an istream access violation. Most other bit and counter values will be those for the first instruction in the IACV fault handler.

OPCDECtrap
The instruction caused an opcdec trap.

interrupt
The instruction was pre-empted by an interrupt. Most other bit and counter values will be those for the first instruction in the PAL code that handles interrupts.

mchktrap

Note: trap can be used as a synonym for \!notrap.
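Because exactly one trap bit is set in each sample, the trap bits behave like a one-hot encoding of the trap type. The following sketch checks that property for a sample represented as a Python dictionary of bit values. The dictionary representation and the function names are assumptions made for illustration; DCPI does not expose samples in this form.

```python
# Trap bit names as listed in this section.
TRAP_BITS = (
    "notrap", "mispredict", "replays", "unaligntrap", "dtbmiss",
    "dtb2miss3", "dtb2miss4", "itbmiss", "arithtrap", "fpdisabledtrap",
    "MT_FPCRtrap", "dfaulttrap", "iacvtrap", "OPCDECtrap", "interrupt",
    "mchktrap",
)


def trap_of(sample):
    """Return the single trap bit set in `sample`, or raise if not one-hot."""
    set_bits = [name for name in TRAP_BITS if sample.get(name)]
    if len(set_bits) != 1:
        raise ValueError(f"expected exactly one trap bit, got {set_bits}")
    return set_bits[0]


def trapped(sample):
    """Model of the `trap` synonym: true when the set trap bit is not notrap."""
    return trap_of(sample) != "notrap"


print(trap_of({"dtbmiss": 1}), trapped({"notrap": 1}))
```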

VIEWING PROFILEME DATA

Use dcpiprof(1) to find out how many samples with particular bit values landed in each image or procedure of a program. Use dcpilist(1) to find out how many landed on a particular source line or instruction.

The dcpi tools use the following syntax to name sets of samples:

 sample_set ::= bit_value
             | sample_set ^ bit_value
             | any

 bit_value ::= <Bit Name>
             | ! <Bit Name>
             | <Trap Bit Name>
             | ! <Trap Bit Name>
A / may be used instead of ! to indicate negation, since ! must usually be escaped on the command line.

Example sample sets:

retired^notrap
names all samples where the retired bit and the notrap bit are both set, i.e., samples where the instruction retired and didn't cause a trap.

taken^!mispredict
names all samples where the taken bit is set and the mispredict bit is clear.

Each bit_value is a constraint on the set of samples included in the set: if the bit_value contains `!', the set includes only samples whose value for the bit is 0. If the bit_value has no `!', the set includes only samples whose value for the bit is 1. The sample set contains all samples that satisfy the constraints. The special sample set any includes all samples.
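The constraint semantics described above can be modelled in a few lines. This is a minimal sketch, assuming each sample is a dictionary mapping bit names to 0 or 1; the functions parse_sample_set and matches are hypothetical helpers, not the dcpi tools' own parser.

```python
def parse_sample_set(expr):
    """Turn 'a^!b^/c' into [(name, required_value), ...]; 'any' has no constraints."""
    if expr == "any":
        return []
    constraints = []
    for term in expr.split("^"):
        term = term.strip()
        negated = term.startswith(("!", "/"))   # '/' is accepted in place of '!'
        name = term[1:].strip() if negated else term
        constraints.append((name, 0 if negated else 1))
    return constraints


def matches(sample, expr):
    """True when the sample satisfies every constraint in the sample set."""
    return all(sample.get(name, 0) == want
               for name, want in parse_sample_set(expr))


samples = [
    {"retired": 1, "notrap": 1, "taken": 1, "mispredict": 0},
    {"retired": 1, "notrap": 0, "taken": 0, "mispredict": 1},
    {"retired": 0, "notrap": 1, "taken": 0, "mispredict": 0},
]
for expr in ("retired^notrap", "taken^!mispredict", "/notrap", "any"):
    print(expr, sum(matches(s, expr) for s in samples))
```

Counting the matching samples, as the loop does, corresponds to using the sample set as an event type.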

A sample set may be used as an event-type to determine how many samples in the set come from a particular image, procedure, or instruction.

To view the counter data, one appends ":CounterName" to the end of a sample_set. This denotes the total of the counter's values over each sample in the set.
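As an illustration of the ":CounterName" suffix, the sketch below sums a counter over the samples in a set and falls back to a plain sample count when no counter is named. It assumes samples are dictionaries holding both bit values and counter values; in_set and evaluate are hypothetical names used only for this example.

```python
def in_set(sample, sample_set):
    """True when `sample` satisfies every ^-separated constraint."""
    if sample_set == "any":
        return True
    for term in sample_set.split("^"):
        term = term.strip()
        negated = term[0] in "!/"
        name = term.lstrip("!/")
        if sample.get(name, 0) != (0 if negated else 1):
            return False
    return True


def evaluate(samples, event):
    """'set' gives the number of samples; 'set:counter' gives the counter total."""
    sample_set, _, counter = event.partition(":")
    chosen = [s for s in samples if in_set(s, sample_set)]
    if not counter:
        return len(chosen)
    return sum(s[counter] for s in chosen)


samples = [
    {"retired": 1, "retdelay": 100},
    {"retired": 1, "retdelay": 4},
    {"retired": 0, "retdelay": 7},
]
print(evaluate(samples, "retired"), evaluate(samples, "retired:retdelay"))
```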

EXAMPLE USAGE

dcpiprof -sp retired:retdelay -pm retired+trap^\!dtbmiss
Lists, in descending order for all images, the total of the retire delay count for samples of instructions that retired, along with the number of samples for retired instructions and the number of samples in which the instruction trapped and the trap was not a dtbmiss.

dcpiprof -pm retired:retdelay a.out
Lists, by procedure, the total of the retire delay count for each sample of an instruction that retired.

dcpiprof -pm default+retired:retdelay::retired a.out
Lists, by procedure, the default information plus a column showing the average retire-delay per retired instruction in the procedure.

dcpiprof -event cycles -pm mispredict::retired a.out
Lists, by procedure, the total of the cycles count, along with the number of mispredicts per retired instruction.

dcpiprof -sp \!notrap -pm \!notrap a.out
Lists, in descending order for each procedure in a.out, the number of samples where an instruction in the procedure caused some kind of trap. (Note the use of `\' to prevent the shell from munging `!'. Note also that `/' can be used on the command line instead of `\!' to simplify typing.)

dcpilist -pm retired main a.out
Lists, for each instruction in procedure main of a.out, the number of samples where the instruction retired.

dcpilist -pm \!notrap main a.out
Lists the number of trap samples for each instruction in procedure main.

dcpilist -pm \!notrap+retired main a.out
Gets the data for the previous two examples with a single command. As shown above, dcpiprof also supports the use of + to display two or more sample sets with one command.

dcpilist -event cycles -pm trap+replays+ldstorder+mispredict main a.out
Lists the source lines (if source is available) or the instructions of procedure main showing the number of cycles samples and some trap detail for instructions which trapped.
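Two of the examples above use a "::" form (retired:retdelay::retired and mispredict::retired), which their descriptions present as a per-retired-instruction ratio. The sketch below works through that arithmetic on made-up sample data. Treating "A::B" as the value of A divided by the value of B is an inference from those descriptions rather than a documented definition, and the ratio helper is hypothetical.

```python
def ratio(numerator, denominator):
    """Value of an 'A::B' style column: A divided by B, or None when B is 0."""
    return numerator / denominator if denominator else None


# Made-up samples; each dictionary holds bit values and one counter.
samples = [
    {"retired": 1, "mispredict": 0, "retdelay": 100},
    {"retired": 1, "mispredict": 1, "retdelay": 2},
    {"retired": 0, "mispredict": 0, "retdelay": 0},
    {"retired": 1, "mispredict": 0, "retdelay": 3},
]

retired_count = sum(1 for s in samples if s["retired"])
retdelay_total = sum(s["retdelay"] for s in samples if s["retired"])
mispredict_count = sum(1 for s in samples if s["mispredict"])

avg_retdelay = ratio(retdelay_total, retired_count)      # retired:retdelay::retired
mispredict_rate = ratio(mispredict_count, retired_count)  # mispredict::retired
print(avg_retdelay, mispredict_rate)
```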

LIMITATIONS

Because retdelay is merely a lower bound, there is no way to account for all cycles using only ProfileMe data. The retire delay always excludes stall cycles prior to when the profiled instruction was fetched. This makes it impossible to measure the length of icache miss stalls.

When a profiled instruction is killed early in the pipeline (early_kill is set), the PC reported by the hardware may be wrong, and all counter values and all bits other than valid, early_kill, notrap, and map_stall may be wrong.

Note that the unreliable data is restricted to instructions that were killed, and this data can be excluded by requiring \!early_kill.
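When post-processing samples outside the dcpi tools, the same caution applies. The sketch below keeps only the fields this section describes as trustworthy when early_kill is set. The dictionary sample format and the reliable_fields helper are assumptions made for illustration.

```python
# Bits that remain meaningful for an early-killed sample, per this section.
RELIABLE_WHEN_EARLY_KILLED = {"valid", "early_kill", "notrap", "map_stall"}


def reliable_fields(sample):
    """Drop fields that may be wrong when the instruction was killed early."""
    if sample.get("early_kill"):
        return {name: value for name, value in sample.items()
                if name in RELIABLE_WHEN_EARLY_KILLED}
    return dict(sample)


killed = {"early_kill": 1, "valid": 0, "notrap": 1, "map_stall": 0,
          "retired": 0, "retdelay": 12}
print(reliable_fields(killed))
```

Requiring \!early_kill in the sample set, as noted above, avoids the problem entirely by excluding such samples.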

The taken bit is UNDEFINED for instructions other than conditional branches or for conditional branches that mispredict.

SEE ALSO

dcpi(1), dcpi2bb(1), dcpi2pix(1), dcpi2ps(1), dcpicalc(1), dcpicat(1), dcpicc(1), dcpicoverage(1), dcpictl(1), dcpid(1), dcpidiff(1), dcpidis(1), dcpiepoch(1), dcpiflow(1), dcpiflush(1), dcpikdiff(1), dcpilabel(1), dcpildlatency(1), dcpilist(1), dcpiprof(1), dcpiquit(1), dcpiscan(1), dcpisource(1), dcpistats(1), dcpisumxct(1), dcpitar(1), dcpitopcounts(1), dcpitopstalls(1), dcpiuninstall(1), dcpiupcalls(1), dcpivarg(1), dcpivcat(1), dcpiversion(1), dcpivlst(1), dcpivprofiler(1), dcpiwhatcg(1), dcpix(1), dcpiformat(4), dcpiexclusions(4)

For more information, see the DCPI project home page http://h30097.www3.hp.com/dcpi.

COPYRIGHT

Copyright 1996-2004, Hewlett-Packard Company. All rights reserved.