dcpiupcalls - Experimental DCPI extension for user-level upcalls.
The DCPI infrastructure contains experimental support for performing
user-level upcalls that deliver profiling interrupts directly to user-level
handlers in profiled applications. Kernel support for upcalls is
automatically included with the DCPI driver kernel module. A preliminary
shared library named libuvprof.so that performs limited user-level
value profiling is also included with the DCPI release.
NOTE: Support for upcalls is not complete and has not been extensively
tested. It works on simple programs but fails, for example, on programs
that use exceptions in signal handlers. In addition, the interface is
subject to change in future releases without regard to backward
compatibility. Please read the caveats section carefully for more details.
A user application (or libuvprof on its behalf) registers with
the DCPI driver kernel module to receive upcalls. The driver uses a separate
minor device, /dev/pcount1, for processing upcall requests.
Other than open() and close(), the only supported
interface to the /dev/pcount1 upcall device is ioctl().
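A minimal sketch of the device lifecycle, assuming a POSIX environment. The helper names upcall_device_open() and upcall_device_close() are hypothetical, and the O_RDWR open mode is an assumption. The ioctl request codes are left out on purpose because they come from the DCPI kernel header, not from this page.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Opens the upcall minor device (hypothetical helper; O_RDWR is an
 * assumption). Returns the file descriptor, or -1 with errno set. */
static int upcall_device_open(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        int saved_errno = errno;        /* fprintf() may change errno */
        fprintf(stderr, "upcall: cannot open %s: %s\n",
                path, strerror(saved_errno));
        errno = saved_errno;
    }
    return fd;
}

/* Typical lifecycle (request codes come from the DCPI kernel header and
 * are not reproduced here):
 *   fd = upcall_device_open("/dev/pcount1");
 *   ioctl(fd, <enable request>, &ctl);   start upcalls
 *   ...
 *   ioctl(fd, <disable request>, ...);   optional, before exiting
 *   upcall_device_close(fd);
 */
static int upcall_device_close(int fd)
{
    return close(fd);
}
```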
There are currently three ioctl operations, defined in the DCPI kernel interface header file:
- Enables upcalls for the calling process. User-specified parameters are
passed in a pcount_upcall_ctl structure with the following fields:
- The address of the upcall handler. During a performance counter
interrupt that results in an upcall, the kernel arranges for control to
be passed back to user code, starting at handler. Upon entry to
handler, register a0 contains the address of handler itself (so the
handler can easily perform an ldgp), register a1 contains the
return PC from the performance counter interrupt, and register a2
contains the internal DCPI key indicating which performance counter
caused the interrupt.
In the current implementation, the user stack contains a
properly aligned urti frame that preserves the original
values of registers at, sp, ps, pc,
gp, a0, a1, and a2 (see
pcount_upcall_frame). The handler must save and restore any
additional registers that it needs, and must execute a call_pal
PAL_urti instruction to restore the urti frame
registers and return to the interrupted PC once upcall processing is
complete.
- Specifies the desired upcall frequency. An upcall is performed once
every freq times the calling process is interrupted.
- Specifies the address of a 32-bit user-space integer flag
indicating whether or not upcalls should be disabled. The kernel will
not perform any upcalls while this flag contains a non-zero value. The
kernel automatically sets this flag to 1 immediately prior to performing
an upcall, and the user-level upcall handler must explicitly reset it to
0 to re-enable upcalls. This approach is one efficient solution to the
problem of nested upcalls. Earlier schemes involved various user-level
locking mechanisms. If disable_flag is set to NULL, then no
checking is done, and it is the client's responsibility to cope with
potentially nested upcalls.
- Disables upcalls for the calling process. Upcalls are stopped
automatically when the process exits, but this operation is useful for
stopping upcalls before exiting.
- Obtains various statistics related to upcall processing for the
calling process in a pcount_upcall_stats structure.
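The enable parameters and the kernel's per-interrupt behavior can be modeled in ordinary C to make the interaction between freq and disable_flag concrete. This is a simulation, not the driver: struct upcall_ctl_sketch is a hypothetical stand-in for pcount_upcall_ctl, and two behaviors this page does not specify are assumptions (interrupts are counted continuously, and an upcall suppressed by a non-zero flag is dropped rather than deferred).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pcount_upcall_ctl: field names follow this
 * page, but the real types and layout come from the kernel header. */
struct upcall_ctl_sketch {
    uintptr_t handler;              /* address of the user-level handler */
    unsigned long freq;             /* one upcall per freq interrupts */
    volatile int32_t *disable_flag; /* 32-bit flag, or NULL for no check */
};

/* Models the kernel's decision for one interrupt. Assumptions: interrupts
 * are counted continuously, and an upcall suppressed by a non-zero flag is
 * dropped, not deferred. Returns 1 if an upcall is delivered. */
static int model_interrupt(const struct upcall_ctl_sketch *ctl,
                           unsigned long *interrupt_count)
{
    assert(ctl->freq >= 1);
    *interrupt_count += 1;
    if (*interrupt_count % ctl->freq != 0)
        return 0;                       /* not this interrupt's turn */
    if (ctl->disable_flag != NULL) {
        if (*ctl->disable_flag != 0)
            return 0;                   /* upcalls currently disabled */
        *ctl->disable_flag = 1;         /* set just before the upcall */
    }
    return 1;
}

/* Counts upcalls over n interrupts. use_flag selects a flag or NULL;
 * handler_resets says whether the simulated handler writes 0 back. */
static unsigned long simulate_upcalls(unsigned long freq, unsigned long n,
                                      int use_flag, int handler_resets)
{
    volatile int32_t flag = 0;
    struct upcall_ctl_sketch ctl = { 0, freq, use_flag ? &flag : NULL };
    unsigned long count = 0, upcalls = 0;

    for (unsigned long i = 0; i < n; i++) {
        if (model_interrupt(&ctl, &count)) {
            upcalls++;
            if (handler_resets)
                flag = 0;               /* handler re-enables upcalls */
        }
    }
    return upcalls;
}
```

For example, with freq set to 4, simulate_upcalls(4, 100, 1, 1) delivers 25 upcalls over 100 interrupts, while a handler that never resets the flag (simulate_upcalls(4, 100, 1, 0)) receives only the first one. This is why the handler must write 0 back to the flag.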
The libuvprof.so shared library for user-level value profiling
internally uses the kernel interface described above, and is intended to be
transparently loaded into the address space of a user application. An
internal DCPI interface is used to automatically send value profile samples
to the dcpid(1) daemon, as if they
had been collected in the kernel.
For example, the following uvrun shell script can be used to
start an application while automatically loading and starting value
profiling:
# uvrun: Execute with user-mode value profiling.
# Set LD_LIBRARY_PATH appropriately for your system.
Because applications are profiled without any changes or modifications,
libuvprof is configured through a limited interface consisting of the
following environment variables:
- Specifies the maximum number of instructions to interpret during each
upcall. The default value is 8.
- Specifies the desired upcall frequency. An upcall is periodically
performed after the specified number of times the process is
interrupted. The default period uses the same value as specified for
- Specifies the type of value profiling to perform. The current
implementation supports both classic value-profiling (VPROF) and
replay trap detection value-profiling (VREPLAY). These options
mirror the -vprof and -vreplay options to
dcpid(1) for kernel-based value-profiling. Both options can be
specified. If neither is specified, the default is VPROF.
- Enables debugging output if defined. Currently logs some internal
state information at startup, and then periodically (e.g., every 256
upcalls) logs the range of interpreted PCs and other data for the
most recent upcall. Checks are performed to avoid logging when it
would be unsafe; e.g., it is unsafe to call printf() in the upcall
handler if the user application was interrupted while executing code in libc.
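The defaults listed above can be captured in a small configuration routine. This is a sketch, not libuvprof's code: the structure and function names are hypothetical, and the routine takes variable values directly because only DCPI_UVPROF_DEBUG is named on this page. It also assumes that the mode names may appear anywhere in the mode variable's value, since no separator is specified. The upcall period is left out of the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical summary of libuvprof settings; the library's internal
 * representation is not documented here. */
struct uvprof_config_sketch {
    long max_interp;  /* instructions interpreted per upcall (default 8) */
    int vprof;        /* classic value profiling */
    int vreplay;      /* replay trap detection value profiling */
    int debug;        /* debugging output enabled */
};

/* Builds a configuration from variable values (NULL means "not defined").
 * Assumption: mode names may appear anywhere in mode_value. */
static void uvprof_config_from_values(struct uvprof_config_sketch *cfg,
                                      const char *max_interp_value,
                                      const char *mode_value,
                                      const char *debug_value)
{
    assert(cfg != NULL);
    cfg->max_interp = 8;
    if (max_interp_value != NULL) {
        char *end;
        long n = strtol(max_interp_value, &end, 10);
        if (end != max_interp_value && *end == '\0' && n > 0)
            cfg->max_interp = n;        /* malformed values keep default */
    }
    cfg->vprof = mode_value != NULL && strstr(mode_value, "VPROF") != NULL;
    cfg->vreplay = mode_value != NULL && strstr(mode_value, "VREPLAY") != NULL;
    if (!cfg->vprof && !cfg->vreplay)
        cfg->vprof = 1;                 /* neither given: default is VPROF */
    cfg->debug = debug_value != NULL;   /* defined suffices, even if empty */
}

/* DCPI_UVPROF_DEBUG is the variable name used in this page's example. */
static int uvprof_debug_defined(void)
{
    return getenv("DCPI_UVPROF_DEBUG") != NULL;
}
```

Treating an empty value as "defined" matches the csh example, where setenv DCPI_UVPROF_DEBUG is given without a value.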
As an example, to perform transparent, user-level "classic" value
profiling for the application foo with verbose debugging output,
the following commands generate output similar to that shown below:
% setenv DCPI_UVPROF_DEBUG
% uvrun foo
uvprof: upcall_stack_top=3fffffc0620, upcall_handler_user=3fffffc4630
uvprof: libc start=3ff80080000, end=3ff8019e000
pc=[1200010c4 .. 1200010a8], key=0, count=256
pc=[1200010a4 .. 1200010c4], key=0, count=512
The value listed next to each interpreted PC is the register value captured
while executing the instruction. In this case, "classic" value profiling was
performed, in which the captured value is generally the contents of the
executed instruction's result register. For example, the second set of debugging output
(for the 512th upcall) indicates that the value captured for the instruction
executed at PC 0x1200010bc was the value 0x87e.
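The logging policy described for the debugging output can be sketched as a predicate. The names are hypothetical. Treating an interrupted PC inside the libc range logged at startup as the unsafe case is an assumption, motivated by printf() not being async-signal-safe. The example bounds come from the output above.

```c
#include <assert.h>
#include <stdint.h>

/* Address range [start, end), e.g. the libc bounds logged at startup.
 * Treating the range as half-open is an assumption. */
struct pc_range_sketch {
    uint64_t start;
    uint64_t end;
};

static int pc_in_range(uint64_t pc, struct pc_range_sketch r)
{
    return pc >= r.start && pc < r.end;
}

/* Decides whether upcall number `count` should be logged: only every
 * `interval`-th upcall, and never when the interrupted PC lies in libc
 * (assumed unsafe, since printf() is not async-signal-safe). */
static int should_log_upcall(uint64_t count, uint64_t interval,
                             uint64_t interrupted_pc,
                             struct pc_range_sketch libc)
{
    assert(interval > 0);
    if (count % interval != 0)
        return 0;                       /* not a periodic logging point */
    return !pc_in_range(interrupted_pc, libc);
}
```

With libc bounds 0x3ff80080000 and 0x3ff8019e000, should_log_upcall(256, 256, 0x1200010c4, libc) returns 1, while the same call with an interrupted PC of 0x3ff80090000 returns 0.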
In addition to the debugging output listed above, all collected value
samples are automatically sent to the
dcpid(1) daemon to be aggregated and stored in the on-disk profile
database.
Support for user-level upcalls is still experimental and incomplete. Known
limitations include:
- The current implementation of upcalls does not use the stack format
required for exception-handling code to properly handle urti-based
upcall frames. This can cause many programs, such as those that use signals,
to dump core unexpectedly. For example, emacs typically runs
for a few seconds to a few minutes and then crashes. We know how to fix
this problem and hope to do so in the next release. However, future
releases are not guaranteed to be backward compatible with the existing
interface.
- The value-profiling interpreter compiled into libuvprof.so
does not contain support for most floating-point operations. Programs using
floating-point code will still execute correctly with user-level upcalls,
but no profiling samples will be collected for floating-point instructions.
We hope to include support for floating-point instruction interpretation in
a future release.
- Since several libuvprof.so data structures are global, only a
single active upcall is currently supported for each process, even for
multithreaded programs.
- The default location of the libuvprof shared library.
- The default location of the kernel interface header file.
For more information, see the DCPI project home page
Hewlett-Packard Company. All rights reserved.