Software Tools for Performance and Dependability Evaluation

Graduate students: Seungjae Han and Harold Rosenberg

Research Scientist: C. V. Ravishankar

Faculty: Kang G. Shin

Sponsors: NASA, ONR, NSF

As real-time safety-critical applications and systems become more sophisticated, the task of testing and evaluating them has become increasingly complex. As a result, there is a great need for software tools that assist programmers and system designers with performance and dependability evaluation. We are currently undertaking a number of research projects in the area of software tools for performance evaluation, dependability evaluation, and communication protocol testing. The tools being created as part of these projects provide us with an environment in which we can test and evaluate the hardware and software being developed for HARTS. The evaluation tools are: SWG, a synthetic workload generator; HMON, a real-time monitor; DOCTOR, an integrated dependability evaluation environment that includes a powerful software fault injection tool; and TEMPEST, a tool for automatically generating fault sets for fault injection experiments.

SWG, The Synthetic Workload Generator:

The performance and dependability of a computing system are directly affected by the structure and behavior of the workload it is executing. An ideal tool for experimental evaluation is a synthetic workload (SW), which is an executable model of an actual program. The characteristics of the SW are specified in an abstract, high-level workload specification language. The SWG compiles this high-level description of a workload to produce a synthetic workload that can be executed in a distributed manner. In this way, the system can be evaluated under representative operating conditions or under specific user-controlled operating conditions. For generality, our model also includes a number of parameters that describe system-dependent features.
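The idea of compiling a declarative workload description into an executable stand-in can be illustrated with a minimal sketch. This is not the SWG specification language, which is not shown here; the names `TaskSpec`, `compile_workload`, and the fields `compute_units`, `messages`, and `period_ms` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TaskSpec:
    """Hypothetical declarative description of one synthetic task."""
    name: str
    compute_units: int  # abstract amount of CPU work to perform
    messages: int       # number of messages the task emits
    period_ms: int      # example of a system-dependent parameter (unused here)


def compile_workload(specs: List[TaskSpec]) -> List[Callable[[], Dict]]:
    """Turn each task spec into a callable that mimics the described behavior."""
    def make_task(spec: TaskSpec) -> Callable[[], Dict]:
        def run() -> Dict:
            acc = 0
            for i in range(spec.compute_units):
                acc += i * i  # busy work standing in for real computation
            sent = [f"{spec.name}-msg{k}" for k in range(spec.messages)]
            return {"task": spec.name, "checksum": acc, "sent": sent}
        return run
    return [make_task(s) for s in specs]


if __name__ == "__main__":
    specs = [TaskSpec("sensor", 1000, 2, 10), TaskSpec("control", 5000, 1, 20)]
    for task in compile_workload(specs):
        result = task()
        print(result["task"], len(result["sent"]))
```

A real generator would also place tasks on different nodes and honor timing parameters such as the period; the sketch only shows the spec-to-executable step.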

HMON, A Real-Time Monitor:

To aid in debugging and to measure the performance of distributed real-time applications, we have developed a real-time monitor. Our monitor, called HMON, provides continuous and transparent monitoring throughout a real-time system's lifecycle with bounded, minimal, and predictable overhead, using purely software means. We have developed a novel approach to monitoring shared-variable references that provides transparent monitoring with low overhead. The monitor is designed to support tasks such as debugging real-time applications, assisting real-time task scheduling, and measuring system performance. We have also developed schemes for debugging distributed and parallel real-time programs by deterministic execution replay. In addition, HMON can observe application-specific events, which supports other uses such as collecting dependability data.
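One common way to keep software monitoring overhead bounded is to record events into a fixed-size buffer so that each record costs constant time and memory never grows. The sketch below illustrates that general idea only; it is not HMON's mechanism, and the class `EventMonitor` and its methods are hypothetical.

```python
import time
from collections import deque
from typing import Any, List, Tuple


class EventMonitor:
    """Hypothetical event recorder with a fixed-capacity buffer."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen discards the oldest entry when full,
        # so memory use and per-record cost stay bounded.
        self._events: deque = deque(maxlen=capacity)
        self.dropped = 0

    def record(self, name: str, value: Any = None) -> None:
        """Append a timestamped event, counting any entry that is overwritten."""
        if len(self._events) == self._events.maxlen:
            self.dropped += 1
        self._events.append((time.monotonic(), name, value))

    def snapshot(self) -> List[Tuple[float, str, Any]]:
        """Return the currently buffered events, oldest first."""
        return list(self._events)


if __name__ == "__main__":
    monitor = EventMonitor(capacity=3)
    for i in range(5):
        monitor.record(f"e{i}", i)
    print([name for _, name, _ in monitor.snapshot()], monitor.dropped)
```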

DOCTOR, An integrateD sOftware fault injeCTiOn enviRonment:

Fault-tolerance mechanisms generally perform a series of steps: fault detection, identification, isolation, recovery, and reconfiguration. In particular, the time needed for each of these fault-processing steps greatly affects the dependability of real-time systems. Dependability also depends heavily on the application. By integrating software-implemented fault injection with our other tools, we are able to create a powerful environment for validating and evaluating system dependability. DOCTOR is capable of injecting various types of faults with a variety of options. Faults can be injected as many times as desired, with performance and dependability data automatically collected by HMON. DOCTOR can use a user-supplied application, or can help the user generate synthetic workloads using SWG. A comprehensive graphical user interface is provided to help the user design and control fault-injection experiments. An important contribution of DOCTOR is its consideration of portability issues, an essential requirement for eliminating or reducing excessive duplication of effort and cost. As a result, fault-injection experiments can be performed during the early design phase without developing a new fault injector for each target system.
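A repeated software fault-injection experiment can be sketched in a few lines. The example below assumes a single-bit memory fault and a simple additive checksum as the detection mechanism; both are illustrative choices, not DOCTOR's fault types or interface, and all function names are hypothetical.

```python
import random
from typing import Dict, Tuple


def checksum(data: bytearray) -> int:
    """Additive checksum, standing in for a fault-detection mechanism."""
    return sum(data) % 256


def inject_bit_flip(memory: bytearray, rng: random.Random) -> Tuple[int, int]:
    """Flip one randomly chosen bit in place and report where it was flipped."""
    byte_index = rng.randrange(len(memory))
    bit_index = rng.randrange(8)
    memory[byte_index] ^= 1 << bit_index
    return byte_index, bit_index


def run_experiment(trials: int, seed: int = 0) -> Dict[str, int]:
    """Inject one fault per trial and count how many the checksum detects."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        memory = bytearray(range(64))  # fresh fault-free state for each trial
        expected = checksum(memory)
        inject_bit_flip(memory, rng)
        if checksum(memory) != expected:
            detected += 1
    return {"trials": trials, "detected": detected}


if __name__ == "__main__":
    print(run_experiment(trials=100))
```

Because a single bit flip changes the byte sum by a power of two smaller than 256, this particular detector catches every injected fault; a more realistic experiment would vary the fault type and also measure detection and recovery latency.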

TEMPEST, Testing and Evaluation eMPloying Event/State Transformations:

In order to automatically generate the fault sets for fault-injection-based experiments, it is necessary to formally specify the fault models and experiment parameters to be used. TEMPEST provides a graphical interface that allows an experimenter to formally specify fault models in terms of the effects of a fault on the system under test. These specifications are based on event/state transformations, which describe the effects of a fault in terms of the ways in which it can be activated, and the erroneous behaviors that it can cause. Based upon these specifications, together with specifications of the metrics and statistical goals for the experiment, TEMPEST automatically generates the fault sets for fault-injection experiments. These fault sets can then be used by a run-time fault-injection system, such as DOCTOR, to control the execution of fault-injection experiments. Because of the generality of its fault-model specification methodology, TEMPEST can be used as a front-end for any run-time fault-injection system.
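To make the notion of a generated fault set concrete, the sketch below models a fault as an activation event paired with an erroneous behavior on a target, and draws a reproducible sample from all combinations. The representation, the `Fault` and `generate_fault_set` names, and the uniform sampling are assumptions for illustration; TEMPEST's actual specification format and its use of metrics and statistical goals are not shown in this summary.

```python
import itertools
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Fault:
    """Hypothetical fault description: how it is activated and what it causes."""
    trigger: str  # event or state that activates the fault
    effect: str   # erroneous behavior the fault produces
    target: str   # component under test


def generate_fault_set(
    triggers: Sequence[str],
    effects: Sequence[str],
    targets: Sequence[str],
    sample_size: int,
    seed: int = 0,
) -> List[Fault]:
    """Enumerate every trigger/effect/target combination and sample from it."""
    space = [Fault(t, e, g) for t, e, g in itertools.product(triggers, effects, targets)]
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    return rng.sample(space, min(sample_size, len(space)))


if __name__ == "__main__":
    faults = generate_fault_set(
        triggers=["message_received", "timer_expired"],
        effects=["drop_message", "corrupt_payload", "delay"],
        targets=["node0", "node1"],
        sample_size=4,
    )
    for fault in faults:
        print(fault)
```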