Intelligent Real-Time Fault-Tolerant Control and Computing

Graduate Students: Sushil Birla, Dave Ong, Thomas Tsukada, Ella Atkins, and Eric Miller

Faculty: Kang G. Shin and Edmund Durfee

Sponsors: NSF, NASA, Martin Marietta, Texas Instruments, and General Motors

The use of autonomous computer systems for controlling sophisticated processes in computer-integrated manufacturing (CIM) and aerospace has become increasingly common. Characterizing the requirements for such systems is an important open problem. The Intelligent Real-Time Control Group is addressing this problem from artificial intelligence (AI), control theory, and other perspectives.

Research in real-time computing has developed proofs that a control system will meet the demands of the environment, but has not addressed the dynamic planning that is required by an agent operating in a dynamic environment. Unfortunately, the various artificial intelligence methods developed to deal with these types of problems are not suited to real-time guarantees, as they involve heuristic search in exponential search spaces. We argue that building an agent that achieves goals in changing environments requires blending real-time computing and AI technologies, and investigate this concept in the Cooperative Intelligent Real-time Control Architecture (CIRCA), designed to run both complex AI methods and guaranteed real-time control plans on separate subsystems.

Real-time control systems are generally composed of environments, controller computers, and controlled processes, which interact synergistically. To characterize the requirements of controller computers, we model fault behaviors, such as fault occurrences and active/benign durations, especially those induced externally by a harsh operating environment, for example, electromagnetic interference (EMI). When such faults cause a failure in the controller computer, we evaluate the Fault-Tolerant Latency (FTL), defined as the time required by all sequential steps in recovering from a controller-computer failure, while considering various fault-tolerance features. We also derive the hard deadline of a linear time-invariant control system, which formally specifies the needs of the controlled process in a form that is understandable to the controller computer, by using the fault-behavior model, the computation-time delay/disturbance (measured by the FTL), and the dynamic characteristics of the controlled process. Consequently, we develop a cost-effective design strategy for a fault-tolerant controller computer by optimizing the tradeoff between time and spatial redundancy, subject to the hard deadline and the FTL.
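As a rough illustration of the FTL definition above, the following Python sketch treats the FTL as the sum of the durations of sequential recovery steps and compares it with a hard deadline. The step names, the durations, and the deadline values are hypothetical placeholders chosen for illustration, not measurements or results from this project, and the simple additive model is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryStep:
    """One sequential step of recovery (names and times are hypothetical)."""
    name: str
    duration_ms: float


def fault_tolerant_latency(steps):
    """FTL modeled as the total time of all sequential recovery steps."""
    return sum(step.duration_ms for step in steps)


def meets_hard_deadline(steps, hard_deadline_ms):
    """True if recovery completes no later than the hard deadline."""
    return fault_tolerant_latency(steps) <= hard_deadline_ms


# Hypothetical recovery sequence; the values are illustrative only.
steps = [
    RecoveryStep("error detection", 2.0),
    RecoveryStep("fault location", 3.0),
    RecoveryStep("reconfiguration", 4.0),
    RecoveryStep("state restoration and restart", 1.5),
]

ftl_ms = fault_tolerant_latency(steps)
print(f"FTL = {ftl_ms} ms")
print("meets 12 ms deadline:", meets_hard_deadline(steps, 12.0))
```

A design that trades time redundancy against spatial redundancy could be compared in the same way, by evaluating the candidate recovery sequences against the hard deadline derived for the controlled process.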

Advances in distributed computing have made decentralization an important concept in computer-integrated manufacturing. Decentralization, however, poses the new problem of coordination, particularly in the case of recovery from disruptions. Flexible manufacturing systems routinely experience disruptions, from which they must be able to recover efficiently. In a decentralized cellular system, this requires intelligent run-time coordination, because constraints within the system may allow a disruption at one cell of the system to propagate to other cells. We have proposed an approach, called "polite replanning", by which cell controllers respond to local disruptions in a way least disruptive to other cells. We are investigating this approach and the issues of negotiation and coordination in the Polite Rescheduler for Intelligent Automated Manufacturing (PRIAM) architecture.
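The idea behind polite replanning can be sketched, under strong simplifying assumptions, as a selection among candidate local repair plans. In the Python sketch below, a cell controller prefers the plan that affects the fewest other cells and breaks ties by local delay. The plan names, cell names, delay values, and this particular selection rule are hypothetical and are not taken from the PRIAM architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairPlan:
    """A candidate local response to a disruption (all fields hypothetical)."""
    name: str
    local_delay_min: float      # extra delay incurred at the disrupted cell
    affected_cells: frozenset   # other cells whose schedules would change


def most_polite(plans):
    """Pick the plan that disturbs the fewest other cells.

    Ties are broken by the smaller local delay. This ordering is an
    assumption made for illustration only.
    """
    return min(plans, key=lambda p: (len(p.affected_cells), p.local_delay_min))


# Hypothetical candidates for a disruption at one cell.
candidates = [
    RepairPlan("reroute to cell B", 5.0, frozenset({"B", "C"})),
    RepairPlan("wait for repair", 20.0, frozenset()),
    RepairPlan("swap job order", 8.0, frozenset({"D"})),
]

chosen = most_polite(candidates)
print("chosen plan:", chosen.name)
```

In this toy example the controller accepts a longer local delay in exchange for leaving every other cell's schedule untouched. A fuller treatment would also need the negotiation and coordination between cells that the text mentions.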

A framework is being developed for conceptual modeling of programmable servo-controlled manufacturing equipment. The framework is intended to support the development of software for real-time monitoring and control that is reusable, extensible, maintainable, and easy to integrate. The potential impact is a reduction in the net effort and time needed to develop related future applications, achieved by defining initial requirements in a way that facilitates evolution. Although researchers agree, in general, that such a focus offers very high payoff, they also agree that it is a very complex and difficult subject that will require long-range research. Our research utilizes concepts from software engineering, database design, artificial intelligence, and manufacturing engineering. The resulting process is a combination of object-oriented analysis, domain analysis, and knowledge engineering. The process includes defining the purpose of the conceptual model, bounding the domain, selecting a reference model architecture for intelligent machine control, and modeling key concepts in the form of candidate object classes that characterize programmable servo-controlled manufacturing equipment.