Testing of Fault-Tolerant and Real-Time Distributed Systems via
Protocol Fault Injection
As software for distributed systems becomes more complex, ensuring
that a system meets its prescribed specification is a growing
challenge for software developers. The challenge is particularly
acute for distributed applications with strict dependability and
timeliness constraints. This paper reports on
ORCHESTRA, a portable fault injection environment for
testing implementations of distributed protocols. This tool is based
on a simple yet powerful framework, called script-driven probing and
fault injection, for the evaluation and validation of the
fault-tolerance and timing characteristics of distributed protocols.
The tool, which was initially developed on the Real-Time Mach
operating system and later ported to other platforms including Solaris
and SunOS, has been used to conduct extensive experiments on several
protocol implementations. This paper describes the design and
implementation of our fault injection tool, focusing on architectural
features that support portability, minimize intrusiveness on target
protocols, and explicitly support the testing of real-time systems. The
paper also describes the experimental evaluation of two protocol
implementations: a real-time audio-conferencing application on
Real-Time Mach, and a distributed group membership service on the Sun
Solaris operating system.
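
To make the idea of script-driven probing and fault injection more concrete, the following is a minimal sketch in Python. It is not ORCHESTRA's actual interface or script language, neither of which is given in this abstract. The names `Message`, `FaultInjectionLayer`, `drop_every_other`, and `run_demo` are hypothetical, and a "script" is assumed here to be a plain Python callable that inspects each message passing through an interposed layer and either forwards it, modifies it, or drops it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Message:
    """Hypothetical protocol message; the fields are illustrative only."""
    kind: str
    payload: str


# Assumption: a script is a callable that returns the message to forward
# (possibly modified) or None to drop it.
Script = Callable[[Message], Optional[Message]]


class FaultInjectionLayer:
    """Hypothetical layer interposed between a protocol and the layer below."""

    def __init__(self, script: Script, deliver: Callable[[Message], None]):
        self.script = script
        self.deliver = deliver
        self.log: List[str] = []  # probing: a record of what the layer saw

    def send(self, msg: Message) -> None:
        self.log.append(f"seen {msg.kind}")
        result = self.script(msg)
        if result is None:
            # Fault injection: the message is silently dropped.
            self.log.append(f"dropped {msg.kind}")
            return
        self.deliver(result)


def drop_every_other(kind: str) -> Script:
    """Build a script that drops every second message of the given kind."""
    count = 0

    def script(msg: Message) -> Optional[Message]:
        nonlocal count
        if msg.kind != kind:
            return msg
        count += 1
        return None if count % 2 == 0 else msg

    return script


def run_demo():
    """Push a few messages through the layer; return what got through."""
    delivered: List[Message] = []
    layer = FaultInjectionLayer(drop_every_other("HEARTBEAT"), delivered.append)
    for _ in range(4):
        layer.send(Message("HEARTBEAT", ""))
    layer.send(Message("DATA", "hello"))
    return delivered, layer.log
```

In this sketch the protocol under test is unaware of the interposed layer, so changing the injected fault means swapping the script rather than modifying the protocol code. Timing faults could be modeled the same way by having a script delay a message before forwarding it.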
sdawson@engin.umich.edu