HP DCPI tool




DIGITAL Continuous Profiling Infrastructure

Lance Berc, Sanjay Ghemawat, Monika Henzinger
Shun-Tak Leung, Mitch Lichtenberg, Dick Sites
Mark Vandevoorde, Carl Waldspurger, Bill Weihl

Digital Systems Research Center
Palo Alto, CA 94301 USA

We have developed the Digital Continuous Profiling Infrastructure, a profiling system for Digital Alpha platforms that continuously profiles entire systems, including the kernel, user programs, drivers, and shared libraries. A profile database is incrementally updated for every executable image that runs. A suite of profile analysis tools helps identify and interpret the performance problems that profiling uncovers.

Our goal is to make the system efficient enough that it can be left running all the time, allowing it to be used to drive online profile-based optimizations for production systems. Our current prototype system is quite close to achieving this goal. In contrast to prior profiling systems, the Continuous Profiling Infrastructure has the following innovative features:

  • Efficiency: The Continuous Profiling Infrastructure has extremely low CPU overhead -- approximately 1-2%. Its memory and disk requirements are also modest. A typical profile consumes significantly less disk space than its corresponding executable image; a typical profile database consumes less than 10MB per week.
  • Transparency: Earlier profiling tools such as gprof, pixie, and atom required programs to be recompiled or modified for profiling. The Continuous Profiling Infrastructure works on unmodified executables, enabling profiling on production systems.
  • Completeness: The Continuous Profiling Infrastructure profiles entire workloads, not just single images, providing comprehensive coverage of overall system activity.
  • Accuracy: The Continuous Profiling Infrastructure reveals where all time is being spent down to the level of individual instructions, including time spent waiting for memory accesses. Most profiling systems, by contrast, simply count basic block executions.

The design of the Continuous Profiling Infrastructure contains several interesting features. We use the Alpha performance counters to sample program counter values. On a 10-processor SMP running at 400 MHz, we get about 6100 samples per second per processor, for an overall total of 61000 samples per second. A device driver services the interrupts, and a user-mode daemon extracts raw samples from the driver, associates them with executable images, and updates disk-based profiles. The driver uses hash tables to aggregate samples, reducing the amount of information that must be communicated to the user-space daemon by a factor of 10 to 100.
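The sampling figures and the aggregation step above can be pictured with a small sketch. This is not the DCPI driver code: the (pid, pc) key, the function name aggregate, and the sample values are assumptions made for illustration. It only shows how the quoted rates fit together and why counting repeated samples in a hash table shrinks what must be handed to the daemon.

```python
from collections import Counter

# Figures quoted in the text above.
PROCESSORS = 10
CLOCK_HZ = 400_000_000
SAMPLES_PER_SEC_PER_CPU = 6100

# Overall sample rate across the machine, and the implied gap between samples.
total_rate = PROCESSORS * SAMPLES_PER_SEC_PER_CPU
cycles_between_samples = CLOCK_HZ / SAMPLES_PER_SEC_PER_CPU


def aggregate(samples):
    """Collapse raw samples into a hash table mapping (pid, pc) -> count.

    The (pid, pc) key is an assumption; the text only says hash tables are used.
    """
    return Counter(samples)


# Hypothetical raw samples: a hot loop makes the same PCs show up repeatedly.
raw = [(17, 0x1200)] * 60 + [(17, 0x1204)] * 30 + [(42, 0x9000)] * 10
table = aggregate(raw)

# Ratio of raw samples to table entries, i.e. the data reduction achieved.
reduction = len(raw) / len(table)
```

With these made-up samples, 100 raw records collapse to 3 table entries. The reduction seen in practice depends on how often the same program counter values recur between flushes to the daemon.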

The profiling system produces sample counts for every instruction in every executable that is run. In addition to the core profiling infrastructure, we have implemented several utilities to analyze the sample counts:

  • dcpiprof generates prof-style output, producing a breakdown of the time spent by image, or by procedure within each image.
  • dcpicalc uses the sample counts to compute the average number of cycles taken by each individual instruction, showing which instructions have stalled and for how long.
  • Several other utilities produce basic-block flow graphs and annotate them with sample counts, cycles-per-instruction values, and source code.
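As a rough illustration of the two kinds of analysis listed above, here is a minimal sketch. It is not the implementation of dcpiprof or dcpicalc: the record layout (image, procedure, samples), the function names, the image and procedure names, and all numbers are hypothetical. The cycle estimate also assumes that each sample stands for one sampling period of time and that the instruction's execution count is already known, which the text does not describe.

```python
from collections import Counter


def breakdown(records, key):
    """Sum sample counts per key; return (name, samples, percent) rows, largest first."""
    totals = Counter()
    for rec in records:
        totals[key(rec)] += rec[2]
    grand_total = sum(totals.values())
    return [(name, n, 100.0 * n / grand_total) for name, n in totals.most_common()]


def average_cycles(samples, sampling_period, executions):
    """Estimate cycles per execution: sampled cycles divided by an assumed execution count."""
    return samples * sampling_period / executions


# Hypothetical per-procedure sample counts: (image, procedure, samples).
records = [
    ("/vmunix", "bcopy", 300),
    ("/vmunix", "idle", 100),
    ("/usr/bin/app", "main", 500),
    ("/usr/bin/app", "parse", 100),
]

# Breakdown by image, then by procedure within a single image.
by_image = breakdown(records, key=lambda r: r[0])
by_proc = breakdown([r for r in records if r[0] == "/vmunix"], key=lambda r: r[1])

# 30 samples at an assumed period of 65536 cycles, for an instruction run 983040 times.
cycles = average_cycles(30, 65536, 983040)
```

An instruction whose estimate is well above what it would need without stalls is a candidate for closer inspection. Here the sampling period and execution count are simply supplied as inputs.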

We are currently working on other utilities to assign blame for wasted cycles, to determine which stalls result from memory waits, which from branch mispredicts, and so on. In addition, we are building utilities to map the profile information directly back to the source code, with the goal of identifying individual variable references as the source of memory-related stalls. Our ultimate goal is to use the detailed profile information generated by the Continuous Profiling Infrastructure to drive optimizations and transformations of programs, at both the source level and the object level.

Note: this page is a slight modification of an abstract presented at the 1996 OSDI Work-in-progress session.


Copyright 1996-2004, Hewlett-Packard Development Company, L.P.
Last modified: Monday, 05-May-97 10:27:31 PDT