HP DCPI tool

dcpivprofiler(1)

NAME

dcpivprofiler - Value profilers

OVERVIEW

DCPI's value-profiling support allows you to specify what values to capture, how to process those values before merging them into the profile files, and how to format the values for printing. A value profiler is a dynamically loadable shared library written by the user. It encapsulates all the code needed to perform user-specified processing.

You can specify a value profiler to dcpid(1) and dcpilist(1) with the -vtrace command-line argument. dcpid calls the appropriate routine in the value profiler to determine what values should be captured and passes the necessary information to the driver. When dcpid receives the captured values from the driver, it calls another routine in the value profiler to process those values and merge the returned values into the profile database. dcpilist extracts values from the profile database and calls other routines in the value profiler to format the values for printing.

Two value profilers are included in the DCPI distribution: vp-classic.so collects the same information as the "classic value profiling" hardwired into DCPI; vp-addr.so collects the effective addresses of memory operands in load and store instructions.

The library and header files referred to here are installed by default into /usr/lib/dcpi and /usr/include/dcpi respectively.

INTERFACE

A value profiler must implement an interface consisting of the functions below. To produce the shared library, link the profiler with libtrace.a using the -shared switch. libtrace.a is distributed with DCPI and installed by default into /usr/lib/dcpi. The interface is defined in vprofiler.h. This and related header files are also in the DCPI distribution and are installed by default into /usr/include/dcpi.

To simplify the following discussion, we give these functions specific names here, but the names themselves are arbitrary. What DCPI requires is that the value profiler have a dispatch table called _dispatch containing pointers to these functions.

vp_name
This returns the name of the value profile. The name is used internally by DCPI to locate profile files and has no other special meaning. Different value profilers must have different names, and the names "values", "values.replay", and "values.trace" are reserved. vp_name is called by both dcpid and dcpilist. Although vp_name may generate the name dynamically, it must return the same name in both programs so that dcpilist can find the profile files created by dcpid.

vp_init
This is called after the value profiler is loaded, for example to perform various initializations.

vp_release
This is called before the value profiler is unloaded, for example to perform various cleanup functions.

vp_prof_init
This returns the data structure specifying what values the driver should capture. This is called before value profiling is performed. See below for what values may be specified and how to construct the specification.

vp_prof_release
This is called after value profiling is done, for example to clean up data structures.

vp_prof_process
This takes a trace of values captured by the driver and produces an array of (pc, context, value) tuples. dcpid calls vp_prof_process with values coming from the driver and merges the tuples returned by vp_prof_process into the profile database.

The value trace from the driver consists of a series of entries, one for each instruction selected for profiling (see below for how to specify which instructions). Each entry contains the pc, the 32-bit instruction code, context values if specified by the -vcontext command-line argument of dcpid, and zero or more 64-bit values.

vp_prof_process is called with a parser object that helps in decoding the value trace from the driver. vp_prof_process may get general information with routines like trace_parser_get_pid and trace_parser_has_context. Most importantly, it may use trace_parser_next to go through and extract the information in the trace entries. See pcount/trace-parse.h for details on the parser interface.

vp_preface
This takes a 32-bit instruction code and returns a string that will be printed before the list of most frequent values ("hotlist") by dcpilist. If it returns NULL, the entire value list is omitted.

vp_format
This takes a 32-bit instruction code and a 64-bit value, and formats the value for printing by dcpilist. Naturally, it should return a string that represents the value in a way most useful to human users. For example, the result operand of a floating-point instruction should probably be printed as a floating-point number rather than a 64-bit hexadecimal integer.

It can be assumed that for each instruction with a non-empty value hotlist, dcpilist calls vp_preface exactly once and then vp_format as many times as necessary before moving on to the next instruction. Therefore, vp_preface can store information in static data structures for subsequent executions of vp_format to pick up. This helps to avoid the overhead of repeatedly parsing an instruction to figure out how the values in the hotlist should be formatted.
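The caching pattern described above can be sketched as follows. This is an illustration only, not code from the DCPI distribution: the names sketch_preface and sketch_format, the preface strings, and the use of <stdint.h> types in place of uint and ulong are all assumptions. It also assumes that Alpha opcode 0x16 denotes the IEEE floating-point operate group and that the captured value is the 64-bit register image of an IEEE double.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: Alpha major opcode 0x16 is the IEEE floating-point group. */
#define SKETCH_OPC_IEEE_FLOAT 0x16u

/* Set once per instruction by sketch_preface, read by sketch_format. */
static int sketch_value_is_double;

/* Hypothetical stand-in for vp_preface: decode the instruction once. */
const char *sketch_preface(uint32_t inst)
{
    uint32_t opcode = inst >> 26;   /* 6-bit opcode in bits 31..26 */

    sketch_value_is_double = (opcode == SKETCH_OPC_IEEE_FLOAT);
    return sketch_value_is_double ? "result (fp)" : "result (hex)";
}

/* Hypothetical stand-in for vp_format: reuse the cached decision
   instead of parsing inst again for every value in the hotlist. */
const char *sketch_format(uint32_t inst, uint64_t value)
{
    static char buffer[40];

    (void) inst;                    /* already decoded by sketch_preface */
    if (sketch_value_is_double) {
        double d;

        assert(sizeof d == sizeof value);
        memcpy(&d, &value, sizeof d);   /* reinterpret the 64 bits */
        snprintf(buffer, sizeof buffer, "%g", d);
    } else {
        snprintf(buffer, sizeof buffer, "%llx", (unsigned long long) value);
    }
    return buffer;
}
```

The static flag is safe only because of the calling order stated above: one vp_preface call per instruction, followed by all of its vp_format calls.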

TRACE SPECIFICATIONS

You can specify what values to capture based on an instruction's 6-bit opcode. A vlist is the list of values that the driver should capture for all instructions having the same opcode. A trace specification is the set of vlists for all valid opcodes.

For a particular opcode, you may specify that the instruction be ignored. The instruction is still executed, of course, but the value trace from the driver will contain no record of it.

If the instruction is not ignored, the driver records some basic information: the pc, the 32-bit instruction code, and two context values if dcpid is called with the -vcontext command-line argument. Capturing this information typically requires only 8 bytes per instruction to be passed from the driver to dcpid because the data are encoded incrementally. However, to minimize overhead, you may still want to ignore instructions whose execution is of no interest at all, depending on the particular value-profiling application.

In addition, you may ask the driver to record zero or more values in the trace, up to seven in the current implementation. Possible values include:

  • a specific integer or floating-point register (say, r17, or f4)
  • register operands Ra, Rb, and Rc
  • the effective address for any memory operand
  • load instruction latency

Of course, this list may grow as other useful values are identified. All values are captured after the instruction has been executed. Currently there is no check to determine whether the specified value makes sense for the instruction. For example, if Rc is specified for an instruction that does not have an Rc operand, the driver will capture some undetermined value without any warning.
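To make the register operands and the effective address concrete, here is a minimal sketch of decoding a memory-format Alpha instruction word. It is not part of DCPI: the sketch_ function names are hypothetical, and the field layout (opcode in bits 31..26, Ra in 25..21, Rb in 20..16, signed 16-bit displacement in 15..0) is assumed from the Alpha memory instruction format. The effective address is taken to be the Rb register value plus the sign-extended displacement.

```c
#include <assert.h>
#include <stdint.h>

/* Field extraction for the assumed memory-format layout. */
unsigned sketch_opcode(uint32_t inst) { return inst >> 26; }
unsigned sketch_ra(uint32_t inst)     { return (inst >> 21) & 0x1f; }
unsigned sketch_rb(uint32_t inst)     { return (inst >> 16) & 0x1f; }

/* Sign-extend the 16-bit displacement field to 64 bits. */
int64_t sketch_disp(uint32_t inst)
{
    int64_t d = inst & 0xffff;

    if (d & 0x8000) {
        d -= 0x10000;               /* negative displacement */
    }
    assert(d >= -32768 && d <= 32767);
    return d;
}

/* Effective address from a captured Rb value; unsigned arithmetic
   wraps, which gives the right result for negative displacements. */
uint64_t sketch_effective_address(uint32_t inst, uint64_t rb_value)
{
    return rb_value + (uint64_t) sketch_disp(inst);
}
```

For example, the word 0xA422FFF8 decodes under these assumptions as opcode 0x29 with Ra = 1, Rb = 2, and displacement -8, so a captured Rb value of 0x1000 yields the address 0xFF8.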

Typically, a vlist is constructed by adding values to an initially empty vlist, and similarly a trace specification is constructed by adding vlists to an initially empty trace specification. The following routines can be used for this purpose. See trace-vlist.h for the function prototypes. This interface is experimental and will be revised as more usage experience is gained.

trace_vlist_table_init(spec)
Initialize an empty trace specification (i.e., the driver keeps no record of any instruction execution).

trace_vlist_init(vlist)
Initialize a vlist that records only the basic information about the instruction, namely the pc, 32-bit instruction code, and context values (if dcpid is called with -vcontext).

trace_add_value_to_vlist(vlist, value)
Add value to vlist. vlist is not (yet) associated with any trace specification.

trace_add_vlist_by_opc(spec, opc, vlist)
Add the values in vlist to the vlist for instructions having the opcode opc.

trace_set_vlist_by_pick_value(spec, func)
The function func should take an opcode as an argument and return a value type. It is called for each valid opcode. The result that it returns is the only value that will be captured for instructions having that opcode.

trace_add_vlist_by_select(spec, selector, vlist)
The function selector should take an opcode as an argument and return a boolean result. It is called for each valid opcode. The values in vlist are added to the vlist for that opcode if and only if selector returns true.
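As an illustration of a selector, the sketch below picks out a few store instructions. It is a hypothetical example, not code from DCPI. The two-argument shape is an assumption that mirrors the selector in the sample profiler of this page, and the numeric opcodes are stand-ins assumed from the Alpha instruction set. A real profiler would use the opcode constants from <machine/inst.h> as the sample does for op_ldq and op_ldq_u.

```c
#include <assert.h>

/* Assumed Alpha major opcodes, defined locally so the sketch compiles
   without <machine/inst.h>. */
enum {
    SKETCH_OP_STQ_U = 0x0f,
    SKETCH_OP_STL   = 0x2c,
    SKETCH_OP_STQ   = 0x2d
};

/* Hypothetical selector: nonzero means "add the vlist for this opcode". */
int sketch_select_stores(unsigned char opc, unsigned char func)
{
    (void) func;                    /* not needed for this selection */
    assert(opc < 64);               /* opcodes are 6 bits wide */

    switch (opc) {
      case SKETCH_OP_STQ_U:
      case SKETCH_OP_STL:
      case SKETCH_OP_STQ:
        return 1;
    }
    return 0;
}
```

A function of this shape would be passed as the selector argument of trace_add_vlist_by_select, so that the values in vlist are recorded only for the opcodes it accepts.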

EXAMPLE

Here is a sample value profiler. It captures the Rb register value for all ldq and ldq_u instructions and adds the instruction's memory displacement to obtain the effective address of the memory operand.


 #include <stdio.h>
 #include <stdlib.h>
 #include <machine/inst.h>
 #include <vprofiler.h>

 #define VSAMPLE_BUFFER_SIZE 1024
 static vsample_t vsample_buffer[VSAMPLE_BUFFER_SIZE];

 /* Selector: return nonzero for the opcodes to profile (ldq and ldq_u).
    The second argument is unused here. */
 static int select_ldq(uchar opc, uchar func)
 {
     switch (opc) {
       case op_ldq:
       case op_ldq_u:
         return 1;
     }
     return 0;
 }

 static trace_vlist_table_t* table;

 /* Build the trace specification: capture Rb for the selected opcodes. */
 static trace_vlist_table_t* vp_prof_init(void)
 {
     trace_vlist_t* vlist;

     table = trace_vlist_table_alloc();
     trace_vlist_table_init(table);
     vlist = trace_vlist_alloc();
     trace_vlist_init(vlist);
     trace_add_value_to_vlist(vlist, TRACE_REGB);
     trace_add_vlist_by_select(table, select_ldq, vlist);

     free(vlist);
     return table;
 }

 /* Convert the driver's value trace into (pc, context, value) samples.
    Returns the number of samples stored in *vsamples. */
 static int vp_prof_process(uint pid,
                            trace_parser_t* parser,
                            vsample_t** vsamples)
 {
     int n = 0, nvalues, no_context;
     ulong pc, c0, c1, rb;
     union alpha_instruction inst;

     no_context = (! trace_parser_has_context(parser));
     /* Walk the trace until trace_parser_next returns a negative result. */
     while ((nvalues =
             trace_parser_next(parser, (uint*) &inst, &pc, &c0, &c1, 1, &rb)) >= 0) {
         if (nvalues == 1) {
             if (no_context) {
                 c0 = c1 = 0;
             }
             vsample_buffer[n].pc = pc;
             /* Effective address: Rb plus the memory displacement. */
             vsample_buffer[n].value = rb + (ulong) inst.m_format.memory_displacement;
             vsample_buffer[n].context0 = c0;
             vsample_buffer[n].context1 = c1;
             n++;
             /* Stop once the static buffer is full. */
             if (n >= VSAMPLE_BUFFER_SIZE) {
                 break;
             }
         }
     }

     *vsamples = vsample_buffer;
     return n;
 }

 static const char* vp_name(void)
 {
     return "vp-ldq-addr";
 }

 static const char* vp_preface(uint inst)
 {
     return "addr";
 }

 static const char* vp_format(uint inst, ulong value)
 {
     static char buffer[32];

     sprintf(buffer, "%lx", value & ((1UL << 48) - 1));
     return buffer;
 }

 vp_dispatch_t _dispatch = {
   NULL,
   NULL,
   vp_prof_init, 
   NULL,
   vp_prof_process,
   vp_name,
   vp_preface,
   vp_format
 };

SEE ALSO

dcpi(1), dcpi2bb(1), dcpi2pix(1), dcpi2ps(1), dcpicalc(1), dcpicat(1), dcpicc(1), dcpicoverage(1), dcpictl(1), dcpid(1), dcpidiff(1), dcpidis(1), dcpiepoch(1), dcpiflow(1), dcpiflush(1), dcpikdiff(1), dcpilabel(1), dcpildlatency(1), dcpilist(1), dcpiprof(1), dcpiprofileme(1), dcpiquit(1), dcpiscan(1), dcpisource(1), dcpistats(1), dcpisumxct(1), dcpitar(1), dcpitopcounts(1), dcpitopstalls(1), dcpiuninstall(1), dcpiupcalls(1), dcpivarg(1), dcpivcat(1), dcpiversion(1), dcpivlst(1), dcpiwhatcg(1), dcpix(1), dcpiformat(4), dcpiexclusions(4)

For more information, see the DCPI project home page http://h30097.www3.hp.com/dcpi.

COPYRIGHT

Copyright 1996-2004, Hewlett-Packard Company. All rights reserved.