Testing and Fault Injection of Distributed Protocols

A growing challenge confronting designers and implementors of safety-critical distributed systems is the evaluation and validation of dependability requirements. This paper addresses the problem of testing the fault-tolerance capabilities of distributed protocols. It introduces a general framework for fault injection and testing of distributed systems, and it describes the ongoing development of a tool based on that framework. The tool can be inserted between any two layers of a protocol stack, where it injects faults into the system by observing and manipulating the messages exchanged between the two layers. Existing approaches to fault injection often handle memory and CPU faults. Most current approaches to testing distributed protocols do not allow the protocol to be manipulated into specific states, since they depend primarily on random testing to obtain a certain level of coverage. This makes distributed protocols difficult to test, because some protocol states are hard to reach simply by probabilistically dropping or delaying packets, or by randomly testing execution paths. We are evolving toward a method that will make it easier for the tester to manipulate protocols into hard-to-reach states during a test. Other features of the tool include support for both probabilistic and deterministic testing of distributed protocols, user-defined test scripts that can guide the analysis at run time, and executable specifications that can emulate a participant in a distributed protocol.
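The idea of a layer that sits between two protocol layers and filters messages under the control of a test script can be illustrated with a minimal sketch. This is not the tool described in the paper; the names `Message`, `FaultInjectionLayer`, `drop_nth`, and `drop_with_probability` are hypothetical and chosen only for illustration. The sketch assumes a script is a function that receives each outgoing message and returns either a message to forward or `None` to drop it, which is enough to contrast a deterministic script with a probabilistic one.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Message:
    kind: str  # hypothetical message type, e.g. "DATA" or "ACK"
    seq: int   # hypothetical sequence number


# A script inspects a message and returns the message to forward,
# or None to drop it. (Assumed interface, for illustration only.)
Script = Callable[[Message], Optional[Message]]


class FaultInjectionLayer:
    """Sits between an upper layer and a lower layer's send function."""

    def __init__(self, send_down: Callable[[Message], None], script: Script):
        self.send_down = send_down
        self.script = script
        self.log: List[str] = []

    def send(self, msg: Message) -> None:
        result = self.script(msg)
        if result is None:
            self.log.append(f"drop {msg.kind}#{msg.seq}")
            return
        self.log.append(f"pass {result.kind}#{result.seq}")
        self.send_down(result)


def drop_nth(kind: str, n: int) -> Script:
    """Deterministic script: drop exactly the n-th message of a given kind."""
    count = 0

    def script(msg: Message) -> Optional[Message]:
        nonlocal count
        if msg.kind == kind:
            count += 1
            if count == n:
                return None
        return msg

    return script


def drop_with_probability(p: float, seed: int) -> Script:
    """Probabilistic script: drop each message independently with probability p."""
    rng = random.Random(seed)  # seeded so a test run can be repeated

    def script(msg: Message) -> Optional[Message]:
        return None if rng.random() < p else msg

    return script


def run_demo() -> List[Message]:
    """Send four DATA/ACK pairs through a layer that drops the second ACK."""
    delivered: List[Message] = []
    layer = FaultInjectionLayer(delivered.append, drop_nth("ACK", 2))
    for seq in range(4):
        layer.send(Message("DATA", seq))
        layer.send(Message("ACK", seq))
    return delivered
```

The deterministic script reaches one specific situation (the loss of a particular acknowledgement) on every run, whereas the probabilistic script reaches it only by chance. That difference is the motivation the abstract gives for supporting both modes.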



sdawson@engin.umich.edu