Enabling Real-Time Fault-Tolerant Applications
Objective
The increasing sophistication of embedded military systems
such as the Advanced AWACS and Aegis-class cruisers,
together with increasing awareness of cost economics,
makes it essential to develop vendor-neutral
embedded real-time application (ERA) software on
distributed/parallel computing platforms.
The goal of the proposed research is to develop and
demonstrate an environment (an integrated set of techniques
and software tools) for designing, implementing,
modifying, and integrating the real-time distributed services
needed to realize computation-, I/O-, or communication-intensive
ERA on parallel and/or distributed computing platforms.
Distributed/parallel computing platforms were chosen because they
reflect the inherent physical and logical distribution of
most defense applications.
Approach
The novelty of our research lies in three areas: the development of common
middleware services, their rapid integration and reuse across different
ERA, and the tools needed for system integration and assessment.
Specifically, we will focus on the following research tasks.
Middleware services on a standard real-time operating system (RTOS):
We will identify and develop a collection of modular
and composable middleware services (or building blocks) for
constructing distributed/parallel ERA on a standard RTOS like Mach-RT
from the Open Software Foundation (OSF).
The building blocks include services for real-time task allocation
and resource scheduling, group communication, replication,
caching, distributed naming and membership.
We will also develop a communication subsystem architecture
based on real-time channels that can be used by the middleware
services to request end-to-end performance guarantees for communication.
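As a rough illustration of how a middleware service might request such a guarantee, the following is a minimal sketch of an admission test for real-time channels. It is not the project's channel-establishment protocol, and it rests on loudly stated assumptions: each channel is described by a per-link message transmission time, a minimum message inter-arrival time, an end-to-end deadline, and a fixed route; links are FIFO and non-preemptive; and each channel has at most one message queued per link. The names `Channel`, `Network`, `admit`, and `end_to_end_bound` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare channels by identity, not by field values
class Channel:
    """Hypothetical real-time channel request (illustrative only)."""
    name: str
    tx_time: float           # worst-case transmission time of one message per link
    min_interarrival: float  # minimum spacing between successive messages
    deadline: float          # requested end-to-end delay bound
    route: list              # ordered list of link identifiers


@dataclass
class Network:
    # link id -> channels currently established on that link
    links: dict = field(default_factory=dict)

    def end_to_end_bound(self, ch):
        # Crude per-hop bound. ASSUMPTION (not verified here): FIFO,
        # non-preemptive links and at most one queued message per channel
        # per link, so a message waits behind at most one message of every
        # channel sharing that link, then is transmitted itself.
        return sum(sum(c.tx_time for c in self.links[link]) for link in ch.route)

    def admit(self, ch):
        """Tentatively add `ch`; keep it only if every link on its route stays
        within capacity and every affected channel still meets its deadline."""
        for link in ch.route:
            self.links.setdefault(link, []).append(ch)
        # All channels sharing at least one link with the new request.
        affected = {id(c): c for link in ch.route for c in self.links[link]}
        ok = all(
            sum(c.tx_time / c.min_interarrival for c in self.links[link]) <= 1.0
            for link in ch.route
        ) and all(self.end_to_end_bound(c) <= c.deadline for c in affected.values())
        if not ok:  # roll back the tentative reservation
            for link in ch.route:
                self.links[link].remove(ch)
        return ok


# Usage: the third request is rejected because it would push channel "a"
# past its 5-unit deadline, even though its own deadline is loose.
net = Network()
a = Channel("a", tx_time=1.0, min_interarrival=10.0, deadline=5.0, route=["L1", "L2"])
b = Channel("b", tx_time=2.0, min_interarrival=10.0, deadline=4.0, route=["L2"])
c = Channel("c", tx_time=2.0, min_interarrival=10.0, deadline=100.0, route=["L1", "L2"])
results = [net.admit(a), net.admit(b), net.admit(c)]
```

The key design choice is that admission re-checks every channel sharing the new channel's links, so a new request cannot silently break a guarantee that was already granted.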
Unified framework for timeliness and fault-tolerance:
We will develop a unified framework that addresses the various
timeliness and dependability requirements of distributed ERA, including:
- Combined real-time I/O, communication, and computation with
deadline guarantees.
- Integrated fault-tolerance schemes: Fault tolerance is
achieved through a sequence of steps: error
detection, fault location, system reconfiguration,
and restoration of the contaminated computation to an
error-free state. Each of these steps must be performed
with high coverage, and the sum of their latencies must
be smaller than the recovery time
permitted by the application's timing constraints.
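The timing condition for integrated fault tolerance can be checked mechanically once each recovery step has a worst-case latency and a coverage estimate. The sketch below assumes the four steps run strictly in sequence (so latencies add) and that their successes are independent (so coverages multiply). The numeric values and the required-coverage threshold are illustrative placeholders, not figures from this project, and `RecoveryStep` and `recovery_feasible` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class RecoveryStep:
    name: str
    latency: float   # worst-case latency of this step (ms, illustrative)
    coverage: float  # probability the step succeeds when invoked (0..1)


def recovery_feasible(steps, permitted_recovery_time, required_coverage):
    """Check a sequential recovery chain against its timing budget.

    ASSUMPTIONS: steps run one after another, so worst-case latencies add;
    step outcomes are independent, so chain coverage is the product of the
    per-step coverages.
    """
    total_latency = sum(s.latency for s in steps)
    chain_coverage = math.prod(s.coverage for s in steps)
    # The total latency must be strictly smaller than the permitted recovery time.
    ok = total_latency < permitted_recovery_time and chain_coverage >= required_coverage
    return ok, total_latency, chain_coverage


chain = [
    RecoveryStep("error detection", 5.0, 0.99),
    RecoveryStep("fault location", 3.0, 0.98),
    RecoveryStep("system reconfiguration", 10.0, 0.995),
    RecoveryStep("state restoration", 20.0, 0.99),
]
ok, latency, coverage = recovery_feasible(chain, permitted_recovery_time=50.0,
                                          required_coverage=0.95)
```

With these placeholder numbers the chain takes 38 ms with an overall coverage of about 0.956, so it fits a 50 ms budget; tightening either the budget or the coverage requirement makes it infeasible.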
Software tools: An integrated toolset for specification, validation
and evaluation of the timeliness and
fault-tolerance capabilities of a target system is a key component of
this work. Tools to be developed and refined include fault injectors
at different software levels (such as the operating system,
communication protocol, and application levels), a synthetic real-time
workload generator, and a dependability/performance monitoring
and visualization tool. We will focus on portability,
flexibility, and usability of the toolset.
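The text does not specify how the synthetic real-time workload generator builds its task sets. One common approach in the real-time literature, shown here as an assumption rather than as the project's design, draws task utilizations with the UUniFast algorithm (Bini and Buttazzo) and then picks random periods, yielding periodic tasks with implicit deadlines. The function names, the period range, and the `(period, wcet, deadline)` tuple layout are hypothetical.

```python
import random


def uunifast(n, total_util, rng):
    """UUniFast: draw n task utilizations that sum to total_util,
    distributed uniformly over the set of valid utilization vectors."""
    utils = []
    remaining = total_util
    for i in range(1, n):
        next_remaining = remaining * rng.random() ** (1.0 / (n - i))
        utils.append(remaining - next_remaining)
        remaining = next_remaining
    utils.append(remaining)
    return utils


def generate_taskset(n, total_util, period_range=(10, 1000), seed=None):
    """Generate n periodic tasks as (period, wcet, deadline) tuples with
    implicit deadlines (deadline == period). Seeded for reproducible runs."""
    rng = random.Random(seed)
    tasks = []
    for u in uunifast(n, total_util, rng):
        period = rng.randint(*period_range)  # inclusive range of integer periods
        wcet = u * period                    # execution time giving utilization u
        tasks.append((period, wcet, period))
    return tasks


tasks = generate_taskset(5, 0.8, seed=42)
```

Seeding the generator makes each synthetic workload reproducible, which matters when the same task set must be replayed under different fault-injection campaigns.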
Development of a testbed and a demonstration system:
We will conduct a technology demonstration of the software services
and tools from this project on a representative embedded
application. Honeywell will define and develop an embedded real-time
application using the proposed real-time communication and middleware
services. This application will be demonstrated and evaluated on
a laboratory testbed at the Real-Time Computing Laboratory of
the University of Michigan, illustrating key functional, timing
and fault-tolerance characteristics of embedded military systems.
Goals
Design, development and integration of middleware
services on the OSF Mach-RT microkernel:
- Identify and develop a collection of modular and composable
middleware services (or building blocks) for constructing
distributed/parallel embedded real-time applications on a
standard RTOS like OSF Mach-RT. The building blocks include
services for real-time task allocation and resource
scheduling, group communication, replication, caching,
distributed naming and membership.
- Develop algorithms for combined I/O, computation, and communication
that are guaranteed to meet the application's timing requirements.
- Develop algorithms and strategies for detecting and recovering
from component failures with bounded latency and high coverage.
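For the real-time task allocation building block, one classical baseline is first-fit decreasing partitioning. The sketch below is that textbook heuristic, not the allocation algorithm this project will develop. It assumes independent, preemptive periodic tasks with deadlines equal to periods, scheduled by EDF on each processor, so a processor can accept tasks while its total utilization stays at or below 1. The function name and the example task names are hypothetical.

```python
def allocate_first_fit(tasks, num_processors):
    """First-fit decreasing partitioning.

    tasks: dict mapping task name -> (wcet, period).
    Returns a dict mapping task name -> processor index, or None if some
    task cannot be placed without exceeding a processor's EDF bound.
    """
    load = [0.0] * num_processors
    assignment = {}
    # Place the heaviest tasks first; this usually packs better.
    order = sorted(tasks, key=lambda t: tasks[t][0] / tasks[t][1], reverse=True)
    for name in order:
        wcet, period = tasks[name]
        u = wcet / period
        for p in range(num_processors):
            if load[p] + u <= 1.0:  # EDF bound for implicit-deadline tasks
                load[p] += u
                assignment[name] = p
                break
        else:
            return None  # no processor can take this task
    return assignment


tasks = {"radar": (5, 10), "track": (3, 10), "display": (5, 20), "nav": (1, 10)}
placement = allocate_first_fit(tasks, num_processors=2)
```

Here the display task lands on processor 1 because adding it to processor 0 would raise that processor's utilization to 1.05; the lighter nav task still fits on processor 0.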
Develop software tools for evaluating and validating
performance and dependability requirements, including:
- Fault injection tool for CPU, memory, and distributed protocols,
- Synthetic real-time workload generator, and
- Resource monitoring, analysis, and visualization tool
(based on Honeywell's SPI tool).
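To make the fault-injection and coverage ideas concrete, here is a minimal software-implemented fault injection sketch at the memory level. It flips one randomly chosen bit in a copy of a memory image, runs an error detector, and estimates coverage as the fraction of injected faults detected. It illustrates the idea only, under stated assumptions: a Python `bytearray` stands in for target memory, and a CRC-32 comparison stands in for the detection mechanism under test. A real injector would target processes or the kernel on the testbed. `inject_bit_flip` and `estimate_coverage` are hypothetical names.

```python
import random
import zlib


def inject_bit_flip(memory, rng, address=None, bit=None):
    """Flip one bit of `memory`, a bytearray standing in for a target's memory
    image. Chooses a random byte/bit unless given; returns the location so an
    experiment can be logged and replayed."""
    if address is None:
        address = rng.randrange(len(memory))
    if bit is None:
        bit = rng.randrange(8)
    memory[address] ^= 1 << bit
    return address, bit


def estimate_coverage(data, trials, seed=0):
    """Inject one single-bit fault per trial into a fresh copy of `data` and
    report the fraction caught by the detector under test (here, a CRC-32
    comparison against the fault-free image)."""
    rng = random.Random(seed)
    reference = zlib.crc32(data)
    detected = 0
    for _ in range(trials):
        image = bytearray(data)  # fresh, fault-free copy for each experiment
        inject_bit_flip(image, rng)
        if zlib.crc32(image) != reference:
            detected += 1
    return detected / trials


coverage = estimate_coverage(b"sensor frame 0001", trials=1000)
```

Because CRC-32 detects every single-bit error, this campaign reports a coverage of 1.0; weaker detectors or multi-bit fault models would yield lower estimates.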
Requirements definition, design, development and
evaluation of a representative embedded real-time application:
- Define representative characteristics and requirements
by investigating a set of typical military ERA.
- Define a representative application: an integrated,
end-to-end, hypothetical distributed real-time application
composed of components and functions from several domains.
Start development of a demonstration system (Phase I) on a laboratory
testbed consisting of a network of Intel-processor PCs connected by
an FDDI ring and a single-hop ATM network. This demonstration system
will run the middleware services and software tools developed
in this research on the OSF Mach-RT operating system.
Technology Transition
This project brings together a team of researchers and developers from the
University of Michigan and Honeywell Technology Center with strong
technical backgrounds and proven track records in developing core
technologies of real-time systems and transferring them to commercial
systems. A key component of this project is a technology
demonstration of representative embedded applications, which will
proceed in two major stages. In Phase I, Honeywell Technology Center
will define and develop an embedded real-time application using the
proposed middleware services and tools. This application will be
demonstrated on the proposed laboratory testbed illustrating key
functional, timing and fault-tolerance characteristics of embedded
military systems. Honeywell and the University of Michigan team will
also conduct a joint study to evaluate these software services and
tools for a wide range of DoD embedded real-time applications.
In Phase II, Honeywell will lead a technology demonstration of
the results from Phase I in an industrial setting on an actual DoD
application. Demonstration of this technology in an industrial
setting is the logical next step in the transfer of this technology to
DoD laboratories and contractors. We expect a DoD
laboratory to be actively involved in the technology demonstration.
By developing the proposed middleware services and tools on OSF
Mach with real-time extensions, which is POSIX-compliant, this project
builds on the significant advances made in the development of
commercial real-time operating systems during the last decade. The
focus on an open architecture and the ability to port the products
from this project to other POSIX-compliant platforms will make this
technology available to a wide community of embedded application
developers.