dcpiupcalls(1)

NAME

dcpiupcalls - Experimental DCPI extension for user-level upcalls.

OVERVIEW

The DCPI infrastructure contains experimental support for performing user-level upcalls that deliver profiling interrupts directly to user-level handlers in profiled applications. Kernel support for upcalls is automatically included with the DCPI driver kernel module. A preliminary shared library named libuvprof.so that performs limited user-level value profiling is also included with the DCPI release.

NOTE: Support for upcalls is not complete and has not been extensively tested. It works on simple programs but fails, for example, on programs that use exceptions in signal handlers. In addition, the interface is subject to change in future releases without regard to backwards compatibility. See the CAVEATS section for details about known problems.

KERNEL INTERFACE

A user application (or libuvprof on its behalf) registers with the DCPI driver kernel module to receive upcalls. The driver uses a separate minor device, /dev/pcount1, for processing upcall requests.

Other than open() and close(), the only supported interface to the /dev/pcount1 upcall device is ioctl(). There are currently three ioctl operations, defined in the header file pcount_upcall.h:

PCOUNT_UPCALL_START
Enables upcalls for the calling process. User-specified parameters are passed in a pcount_upcall_ctl structure with the following elements:

handler
The address of the upcall handler. During a performance counter interrupt that results in an upcall, the kernel arranges for control to be passed back to user code, starting at handler. Upon entry to handler, register a0 contains the address of handler itself (so the handler can easily perform an ldgp), register a1 contains the return PC from the performance counter interrupt, and register a2 contains the internal DCPI key indicating which performance counter caused the interrupt.

In the current implementation, the user stack contains a properly-aligned urti frame which preserves the original values of registers at, sp, ps, pc, gp, a0, a1, and a2 (see pcount_upcall_frame). The handler must save and restore any additional registers that it needs, and execute a call_pal PAL_urti instruction to restore the urti frame registers and return to the interrupted PC once upcall processing is complete.

freq
Specifies the desired upcall frequency. One upcall is performed for every freq interrupts taken by the calling process.

disable_flag
Specifies the address of a 32-bit user-space integer flag indicating whether or not upcalls should be disabled. The kernel will not perform any upcalls while this flag contains a non-zero value. The kernel automatically sets this flag to 1 immediately prior to performing an upcall, and the user-level upcall handler must explicitly reset it to 0 to re-enable upcalls. This approach is one efficient solution to the problem of nested upcalls. Earlier schemes involved various user-level locking mechanisms. If disable_flag is set to NULL, then no checking is done, and it is the client's responsibility to cope with potentially nested upcalls.

PCOUNT_UPCALL_STOP
Disables upcalls for the calling process. Upcalls are stopped automatically when the process exits, so this operation is needed only to stop upcalls before the process exits.

PCOUNT_UPCALL_GET_STATS
Obtains various statistics related to upcall processing for the calling process. The statistics are returned in a pcount_upcall_stats structure.
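
The interaction between freq and disable_flag can be illustrated with a toy model. The sketch below is not driver code: the function simulate_upcalls and its three arguments are hypothetical, and it assumes that an upcall suppressed by a non-zero flag is simply dropped and that the handler clears the flag a fixed number of interrupts after it is entered.

```shell
# Toy model of upcall delivery (hypothetical, not the DCPI driver logic).
#   $1 = freq: one upcall is attempted every freq interrupts
#   $2 = total number of interrupts to simulate
#   $3 = number of later interrupts that arrive before the handler
#        resets the flag to 0 (0 means it resets the flag at once)
simulate_upcalls() {
    freq=$1 interrupts=$2 handler_span=$3
    flag=0 remaining=0 delivered=0 suppressed=0 i=1
    while [ "$i" -le "$interrupts" ]; do
        # Model the handler making progress and finally clearing the flag.
        if [ "$remaining" -gt 0 ]; then
            remaining=$((remaining - 1))
            if [ "$remaining" -eq 0 ]; then flag=0; fi
        fi
        # Every freq-th interrupt is a candidate for an upcall.
        if [ $((i % freq)) -eq 0 ]; then
            if [ "$flag" -eq 0 ]; then
                flag=1                      # the kernel sets the flag before the upcall
                delivered=$((delivered + 1))
                remaining=$handler_span
                if [ "$remaining" -eq 0 ]; then flag=0; fi
            else
                suppressed=$((suppressed + 1))   # no nested upcall (assumed dropped)
            fi
        fi
        i=$((i + 1))
    done
    echo "delivered=$delivered suppressed=$suppressed"
}

simulate_upcalls 4 16 0    # prints: delivered=4 suppressed=0
simulate_upcalls 4 16 6    # prints: delivered=2 suppressed=2
```

In the second run the modeled handler is still active when the next candidate interrupt arrives, so every other upcall is suppressed instead of nesting.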

LIBUVPROF INTERFACE

The libuvprof.so shared library for user-level value profiling internally uses the kernel interface described above, and is intended to be transparently loaded into the address space of a user application. An internal DCPI interface is used to automatically send value profile samples to the dcpid(1) daemon, as if they had been collected in the kernel.

For example, the following uvrun shell script can be used to start an application while automatically loading and starting value profiling:

    #!/bin/sh
    # uvrun: Execute with user-mode value profiling.
    # Set LD_LIBRARY_PATH appropriately for your system.
    LD_LIBRARY_PATH=/usr/lib/dcpi
    _RLD_LIST=DEFAULT:libuvprof.so
    export LD_LIBRARY_PATH
    export _RLD_LIST
    exec ${1+"$@"}

Since applications do not require any changes or modifications to be profiled, a limited interface to libuvprof is currently provided through the following environment variables:

DCPI_UVPROF_NINTERP
Specifies the maximum number of instructions to interpret during each upcall. The default value is 8.

DCPI_UVPROF_PERIOD
Specifies the desired upcall frequency. One upcall is performed each time the process has been interrupted the specified number of times. The default period is the value in effect for DCPI_UVPROF_NINTERP.

DCPI_UVPROF_VPROF
DCPI_UVPROF_VREPLAY
Specifies the type of value profiling to perform. The current implementation supports both classic value-profiling (VPROF) and replay trap detection value-profiling (VREPLAY). These options mirror the -vprof and -vreplay options to dcpid(1) for kernel-based value-profiling. Both options can be specified. If neither is specified, the default is VPROF.

DCPI_UVPROF_DEBUG
Enables debugging output if defined. Currently logs some internal state information at startup, and then periodically (e.g., every 256 upcalls) logs the range of interpreted PCs and other data for the most recent upcall. Checks are performed to avoid logging when it would be unsafe; e.g., it is unsafe to call printf() in the upcall handler if the user application was interrupted while executing code in libc.
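
The documented defaults can be summarized in a short sketch. The function uvprof_effective_settings below is hypothetical and is not part of DCPI; it only reports what the descriptions above imply for the current environment. It assumes that an empty NINTERP or PERIOD value is treated as unset, and that VPROF and VREPLAY, like DEBUG, take effect when defined regardless of their value.

```shell
# Report the settings implied by the documented defaults (hypothetical helper).
uvprof_effective_settings() {
    # NINTERP defaults to 8; PERIOD defaults to the NINTERP value.
    ninterp=${DCPI_UVPROF_NINTERP:-8}
    period=${DCPI_UVPROF_PERIOD:-$ninterp}

    # Both profiling types may be requested; VPROF is the default.
    mode=
    if [ "${DCPI_UVPROF_VPROF+set}" = set ]; then
        mode=vprof
    fi
    if [ "${DCPI_UVPROF_VREPLAY+set}" = set ]; then
        if [ -n "$mode" ]; then mode="$mode+vreplay"; else mode=vreplay; fi
    fi
    if [ -z "$mode" ]; then
        mode=vprof
    fi

    # Debugging output is enabled when the variable is defined.
    debug=off
    if [ "${DCPI_UVPROF_DEBUG+set}" = set ]; then
        debug=on
    fi

    echo "ninterp=$ninterp period=$period mode=$mode debug=$debug"
}

uvprof_effective_settings
```

With none of the variables defined, the function prints ninterp=8 period=8 mode=vprof debug=off.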

LIBUVPROF EXAMPLE

As an example, the following commands perform transparent, user-level "classic" value profiling for the application foo with verbose debugging output, and generate output similar to that listed below:

    % setenv DCPI_UVPROF_DEBUG
    % uvrun foo
    uvprof: ninterpret=8
    uvprof: upcall_stack_top=3fffffc0620, upcall_handler_user=3fffffc4630
    uvprof: libc start=3ff80080000, end=3ff8019e000
    pc=[1200010c4 .. 1200010a8], key=0, count=256
      1200010e4 120001114
      120001114 1
      120001118 1
      120001108 d3adf
      12000110c 403f0393
      1200010a0 403f0393
      1200010a4 807e0726
    pc=[1200010a4 .. 1200010c4], key=0, count=512
      1200010a4 211ff15c
      1200010a8 e7df9984
      1200010ac 6571d329c
      1200010b0 2c61cc6244
      1200010b4 136ac96afdc
      1200010b8 87eb81ecf04
      1200010bc 87e
      1200010c0 b81ecf04
    ...

The value listed next to each interpreted PC is the register value captured while executing the instruction. In this case, "classic" value profiling was performed, in which the captured value is generally the result register for the executed instruction. For example, the second set of debugging output (for the 512th upcall) indicates that the value captured for the instruction executed at PC 0x1200010bc was the value 0x87e.

In addition to the debugging output listed above, all collected value samples are automatically sent to the dcpid(1) daemon to be aggregated and stored in the on-disk profile database.
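
When reading a long debugging log, it can help to pull out the value captured at a single PC. The sketch below assumes that the log keeps the layout shown in the example (a "pc=[...]" header ending in count=N, followed by one "PC value" pair per line); the function name uvprof_value_at is hypothetical and is not provided by DCPI.

```shell
# Print the value captured at a given PC in the debug block for a given
# upcall count. Reads DCPI_UVPROF_DEBUG output on standard input.
#   $1 = upcall count from the block header (e.g. 512)
#   $2 = PC in hexadecimal, without the 0x prefix
uvprof_value_at() {
    awk -v want_count="$1" -v want_pc="$2" '
        /^pc=\[/ {
            # Header line: remember the upcall count it reports.
            n = $0
            sub(/.*count=/, "", n)
            cur = n
            next
        }
        # Compare as strings so hex digits such as "e" are never
        # interpreted as a numeric exponent.
        NF == 2 && (cur "") == (want_count "") && ($1 "") == (want_pc "") {
            print $2
        }
    '
}

uvprof_value_at 512 1200010bc <<'EOF'
pc=[1200010a4 .. 1200010c4], key=0, count=512
  1200010b8 87eb81ecf04
  1200010bc 87e
  1200010c0 b81ecf04
EOF
# prints: 87e
```

In practice the function would read the saved output of a uvrun session rather than an inline sample.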

CAVEATS

Support for user-level upcalls is still experimental and incomplete. Known problems include:

  • The current implementation of upcalls does not use the stack format required for exception-handling code to properly handle urti-based upcall frames. This can cause many programs, such as those that use signals, to unexpectedly dump core. For example, emacs will typically run for a few seconds to a few minutes, and then blow up. We know how to fix this problem, and hope to do so for the next release. However, future releases are not guaranteed to be backwards compatible with the existing interfaces.
  • The value-profiling interpreter compiled into libuvprof.so does not contain support for most floating-point operations. Programs using floating-point code will still execute correctly with user-level upcalls, but no profiling samples will be collected for floating-point instructions. We hope to include support for floating-point instruction interpretation in a future release.
  • Since several libuvprof.so data structures are global, only a single active upcall is currently supported for each process, even for multi-threaded applications.

FILES

/usr/lib/dcpi/libuvprof.so
The default location of the libuvprof shared library.

/usr/include/dcpi/pcount_upcall.h
The default location of the kernel interface header file.
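
Before starting a profiled run, a wrapper script can check that the pieces named in this manual page are present. The sketch below is only an illustration: the function uvprof_preflight is hypothetical, and it assumes that the upcall device /dev/pcount1 appears as a character device node.

```shell
# Check for the upcall device and the libuvprof shared library.
#   $1 = device path  (default: /dev/pcount1)
#   $2 = library path (default: /usr/lib/dcpi/libuvprof.so)
# Returns 0 if both are found, 1 otherwise.
uvprof_preflight() {
    dev=${1:-/dev/pcount1}
    lib=${2:-/usr/lib/dcpi/libuvprof.so}
    status=0
    # Assumption: the upcall device is a character device node.
    if [ ! -c "$dev" ]; then
        echo "upcall device not found: $dev" >&2
        status=1
    fi
    if [ ! -r "$lib" ]; then
        echo "libuvprof library not readable: $lib" >&2
        status=1
    fi
    return $status
}

if uvprof_preflight; then
    echo "upcall prerequisites found"
else
    echo "upcall prerequisites missing"
fi
```

Both paths can be overridden on systems where DCPI is installed in a non-default location.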

SEE ALSO

dcpi(1), dcpi2bb(1), dcpi2pix(1), dcpi2ps(1), dcpicalc(1), dcpicat(1), dcpicc(1), dcpicoverage(1), dcpictl(1), dcpid(1), dcpidiff(1), dcpidis(1), dcpiepoch(1), dcpiflow(1), dcpiflush(1), dcpikdiff(1), dcpilabel(1), dcpildlatency(1), dcpilist(1), dcpiprof(1), dcpiprofileme(1), dcpiquit(1), dcpiscan(1), dcpisource(1), dcpistats(1), dcpisumxct(1), dcpitar(1), dcpitopcounts(1), dcpitopstalls(1), dcpiuninstall(1), dcpivarg(1), dcpivcat(1), dcpiversion(1), dcpivlst(1), dcpivprofiler(1), dcpiwhatcg(1), dcpix(1), dcpiformat(4), dcpiexclusions(4)

For more information, see the DCPI project home page http://h30097.www3.hp.com/dcpi.

COPYRIGHT

Copyright 1996-2004, Hewlett-Packard Company. All rights reserved.