Reusable Middleware Services

The proposed research advocates a vendor-neutral, reuse-driven approach to developing ERA on parallel/distributed platforms through composition from a set of large-grained middleware services (or building blocks) with precise specifications and well-defined interfaces. Once a collection of middleware services common to ERA has been identified, these building blocks can be reused to reduce the complexity and cost of modifying and extending an existing software base or of developing new systems. A layered open architecture supports modular insertion of a new service or a new implementation as requirements evolve over the life span of a system.

One of the novel features of our architecture is the identification of a modular collection of middleware services on OSF Mach-RT that can be used for developing ERA. The objective is to take full advantage of the base operating system and its real-time extensions, including a preemptive microkernel, a real-time scheduling framework, clock and timer support, and bounded IPC. These building blocks include services for managing computation and communication resources, for providing access to shared and/or replicated data in a networked environment, and for managing system dependability. The figure illustrates the various software layers in the suite of reusable middleware services. Each layer exports a well-defined interface to other services or to applications that are built on top of it. One or more protocols may be supported in each service layer. Related services are grouped together to illustrate functional dependencies among them.


Real-time unicast communication


Real-time multicast communication services

These services provide a collection of protocols with various delivery guarantees for sending messages to a group of destinations, including real-time (datagram) multicast, real-time atomic multicast, and atomic FIFO multicast. These services allow the exploitation of the underlying real-time channel protocol and hardware support in a system without exposing the implementation to the higher layer services.
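
As an illustration of the FIFO delivery guarantee only, the following minimal sketch holds back out-of-order messages until the gap from each sender is filled. The class name FifoReceiver and the per-sender sequence numbers are assumptions made for this example; the sketch does not address atomicity or timing bounds, and it is not the protocol implementation itself.

```python
from collections import defaultdict


class FifoReceiver:
    """Hypothetical receiver that delivers each sender's messages in send order."""

    def __init__(self):
        self.next_seq = defaultdict(int)   # next expected sequence number per sender
        self.pending = defaultdict(dict)   # held-back messages: sender -> {seq: payload}
        self.delivered = []                # (sender, payload) pairs in delivery order

    def receive(self, sender, seq, payload):
        # Drop duplicates of messages that were already delivered.
        if seq < self.next_seq[sender]:
            return
        self.pending[sender][seq] = payload
        # Deliver as many consecutive messages as are now available.
        while self.next_seq[sender] in self.pending[sender]:
            n = self.next_seq[sender]
            self.delivered.append((sender, self.pending[sender].pop(n)))
            self.next_seq[sender] = n + 1
```

A message with sequence number 1 that arrives before message 0 from the same sender is buffered and delivered only after message 0 has arrived.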


Clock synchronization service

This service provides access to synchronized clocks and logical timestamps. The clock synchronization service provides a bound on the deviation between processor clocks in the presence of hardware clock drifts and failures. By timestamping events in a distributed computation, the "real-time" ordering and separation of events can be compared. The logical timestamping service can be used to assign timestamps that capture the causal ordering of events in a distributed computation.
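
One well-known way to assign logical timestamps that respect causal order is a Lamport clock. The sketch below is offered only as an illustration of the idea; the text does not state which logical clock scheme the service uses, and the class and method names are hypothetical.

```python
class LamportClock:
    """Hypothetical logical clock: if event a causally precedes b, then ts(a) < ts(b)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # A local event advances the clock by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is a local event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_timestamp):
        # Move past both the local clock and the timestamp carried by the message.
        self.time = max(self.time, msg_timestamp) + 1
        return self.time
```

With this rule, a receive event always carries a larger timestamp than the corresponding send event, so timestamps are consistent with the causal order of events.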


Group membership and topology services

These services maintain the operational status of nodes and communication links within a system in the presence of processor failures/joins and communication failures. These functions are provided by three related services: the heartbeat service, the membership service, and topology management. The heartbeat service is used by each node to determine the operational status of other nodes. This service does not provide any global consistency or agreement among processors. It is a low-level service in the sense that it can use system-specific mechanisms to detect node and link failures. Topology management is a system-specific service that maintains information about the connectivity of a system. This service also does not provide any global consistency or agreement; changes in the topology are propagated throughout the system in a lazy fashion. The processor membership service uses the information provided by the heartbeat and topology management services to ensure that all functional nodes are reliably notified and have a consistent view of the operational status of the processors in the presence of processor failures/joins and communication failures. The processor membership service is used by other services to trigger initialization and recovery procedures.
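
The local, agreement-free nature of the heartbeat service can be sketched as a simple timeout check. This is a minimal illustration under assumed names (HeartbeatMonitor, timeout) with times passed in explicitly; an actual implementation may rely on system-specific detection mechanisms instead.

```python
class HeartbeatMonitor:
    """Hypothetical per-node monitor; its view is local and not agreed upon globally."""

    def __init__(self, timeout):
        self.timeout = timeout   # silence longer than this marks a node as suspected
        self.last_seen = {}      # node -> time of the most recent heartbeat

    def heartbeat(self, node, now):
        # Record that a heartbeat from `node` arrived at time `now`.
        self.last_seen[node] = now

    def suspected(self, now):
        # Nodes whose last heartbeat is older than the timeout, in sorted order.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)
```

Two nodes running this monitor may disagree about which peers are suspected at a given instant, which is why a separate membership service is needed to establish a consistent view.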


Real-time data replication and caching services

These services provide the mechanisms for accessing and updating replicated/distributed application objects. The challenge is to develop replication schemes that provide the required redundancy without sacrificing the timing predictability of the system. An active replication scheme, based on the notion of periodic process groups, and a novel passive replication scheme, based on the notion of window consistency, will be provided as part of the middleware services. The caching service supports efficient access to frequently-accessed or slow-changing shared data. The real-time replication service ensures that an update to a data object is propagated to cached copies within bounded time. The implementation of the replication and caching services will exploit the bounded delivery guarantees provided by the underlying real-time channel protocol.
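
One possible reading of a bounded-staleness guarantee for a passive backup is sketched below: the backup checks that the version it holds is no older than a given window. The text does not define window consistency precisely, so the class BackupCopy, the window parameter, and the check itself are assumptions for illustration rather than the proposed scheme.

```python
class BackupCopy:
    """Hypothetical passive replica that tracks the age of the version it holds."""

    def __init__(self, window):
        self.window = window       # maximum tolerated age of the backup's version
        self.value = None
        self.version_time = None   # primary's timestamp for the version held

    def apply_update(self, value, primary_time):
        # Ignore updates that are older than the version already held.
        if self.version_time is None or primary_time > self.version_time:
            self.value = value
            self.version_time = primary_time

    def within_window(self, now):
        # True if the held version was written no more than `window` time units ago.
        return self.version_time is not None and now - self.version_time <= self.window
```

Under this reading, bounded message delivery lets the primary choose an update period that keeps the backup inside its window.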


Failure detection and recovery management

These services are closely related to the replication services. While the replication service provides the mechanism for maintaining redundant copies of certain objects, an application-specific recovery procedure may have to be invoked when a failure is detected. This service supports both the mechanism for failure detection and the policies for recovery management.


Distributed synchronization services

These services provide fault-tolerant and scalable synchronization protocols for serializing access to shared/exclusive resources. These services include a distributed lock manager and a facility for barrier synchronization. The distributed synchronization services can recover from various types of failures including lock holder/coordinator crash and communication failure.
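
Recovery from a lock holder crash is often handled with leases, where a lock that is not renewed eventually expires. The sketch below uses that technique purely as an example; the text does not say that the distributed lock manager is lease-based, and LeaseLockManager and its methods are hypothetical names. Coordinator failure is not covered.

```python
class LeaseLockManager:
    """Hypothetical single-coordinator lock table with expiring leases."""

    def __init__(self, lease):
        self.lease = lease   # how long a grant stays valid without renewal
        self.holder = {}     # resource -> (owner, expiry time)

    def acquire(self, resource, owner, now):
        current = self.holder.get(resource)
        # Refuse if another owner holds a lease that has not yet expired.
        if current is not None and current[0] != owner and now < current[1]:
            return False
        # Grant the lock, or renew it if the caller is already the holder.
        self.holder[resource] = (owner, now + self.lease)
        return True

    def release(self, resource, owner):
        # Only the current holder may release the lock.
        current = self.holder.get(resource)
        if current is not None and current[0] == owner:
            del self.holder[resource]
```

If a holder crashes without releasing, other clients are blocked only until the lease expires, after which the lock can be granted again.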


Resource tracking and migration services

These services provide the mechanisms for monitoring resources (or servers) in a distributed system and for properly supporting resource migration, including invocation and reply forwarding, mechanisms for ensuring co-location of related resources, and garbage collection. Resource migration may be necessary to deal with failures or to support load balancing/sharing in the presence of potential system bottlenecks.
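
Invocation forwarding after a migration can be pictured with forwarding pointers left behind at the old location. The following is a minimal sketch under assumed names (ForwardingTable, migrate, resolve); it keeps all pointers in one table for simplicity, whereas a real service would distribute them across nodes and garbage-collect stale ones.

```python
class ForwardingTable:
    """Hypothetical record of where a resource lives and how stale references reach it."""

    def __init__(self):
        self.location = {}   # resource -> node currently hosting it
        self.forward = {}    # (resource, old node) -> node it moved to

    def register(self, resource, node):
        self.location[resource] = node

    def migrate(self, resource, new_node):
        old = self.location[resource]
        # Leave a forwarding pointer at the old node.
        self.forward[(resource, old)] = new_node
        # The new host must not forward elsewhere; this also prevents cycles.
        self.forward.pop((resource, new_node), None)
        self.location[resource] = new_node

    def resolve(self, resource, node):
        # Follow forwarding pointers from a possibly stale node to the current host.
        while (resource, node) in self.forward:
            node = self.forward[(resource, node)]
        return node
```

A client that still holds a reference to the original node reaches the resource by following the chain of pointers, and can then update its reference to the current host.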

