Real-Time Communication and Operating Systems

Graduate Students: Atri Indiresan, Ashish Mehra, Tarek Abdelzaher, Seungjae Han, and Lei Zhou

Faculty: Kang G. Shin

Sponsors: NSF and ONR

HARTS is a platform for distributed real-time applications, so it must provide distributed real-time services such as clock synchronization, bounded communication latency, and distributed deadline management. All communication and distributed time management services are provided by a network processor that executes the protocols for real-time services and controls access to the communication medium. The services are implemented as protocols running on our version of the x-kernel. The nature of the services depends on the hardware capabilities.

Our research and development activity can be broadly classified as follows:

Distributed time services:

The accuracy of clock synchronization is limited by the variance in message latency, measured from the time a timestamp is written at the source node until the time it is read at the destination node. We provide support for handling timestamps at the medium access layer of the protocol stack, and are exploring different clock synchronization schemes.
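
The effect of latency variance on synchronization accuracy can be illustrated with a small simulation. The sketch below is not a HARTS protocol: it assumes a simple one-way scheme in which the receiver corrects the sender's timestamp by an assumed latency, and all function names and latency figures are hypothetical. Under that assumption, the estimation error equals the difference between the assumed and the actual latency, so narrowing the latency range (as timestamping at the medium access layer is meant to do) narrows the error bound.

```python
import random

def offset_error(true_offset, latency, assumed_latency, t0=1000.0):
    """Error in the receiver's estimate of the sender's clock offset.

    Hypothetical one-way scheme: the sender writes its clock reading at
    true time t0, and the receiver reads it `latency` time units later.
    """
    send_ts = t0 + true_offset   # sender clock when the timestamp is written
    recv_ts = t0 + latency       # receiver clock when the timestamp is read
    estimate = send_ts + assumed_latency - recv_ts
    return estimate - true_offset   # equals assumed_latency - latency

def worst_error(lat_min, lat_max, samples=1000, seed=0):
    """Largest absolute error seen over random latencies in [lat_min, lat_max]."""
    rng = random.Random(seed)
    assumed = (lat_min + lat_max) / 2   # receiver assumes the midpoint latency
    return max(
        abs(offset_error(0.25, rng.uniform(lat_min, lat_max), assumed))
        for _ in range(samples)
    )

if __name__ == "__main__":
    # Hypothetical latency ranges in milliseconds.
    print("timestamp handled high in the stack:", worst_error(1.0, 5.0))
    print("timestamp handled at the MAC layer: ", worst_error(1.0, 1.2))
```

In this model the error never exceeds half the width of the latency range, which is why reducing latency variance matters more than reducing latency itself.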

Real-time communication:

We support different classes of message communication. Deadline-constrained communication is provided by real-time channels. It is important not only that a real-time channel guarantee the deadlines of its own messages, but also that it not cause messages on other real-time channels to miss their deadlines, and that it leave reasonable performance for best-effort traffic. Message deadline guarantees are provided by resource reservation and run-time scheduling schemes.
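
As an illustration of resource reservation, the sketch below admits a new real-time channel on a link only if the total reserved utilization stays under a configured limit, which leaves the remainder for best-effort traffic. It is a simplified assumption, not the admission test used for real-time channels: a utilization check is only a necessary condition, and a real test must also account for message deadlines. The class names, fields, and the 0.8 limit are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChannelRequest:
    max_msg_time: float       # worst-case transmission time of one message (ms)
    min_interarrival: float   # minimum spacing between messages (ms)

    @property
    def utilization(self):
        # Worst-case fraction of the link this channel can occupy.
        return self.max_msg_time / self.min_interarrival

class Link:
    def __init__(self, reservable_fraction=0.8):
        # Fraction of link capacity that real-time channels may reserve.
        self.reservable_fraction = reservable_fraction
        self.channels = []

    def reserved(self):
        return sum(c.utilization for c in self.channels)

    def admit(self, request):
        """Accept the channel only if existing reservations are unaffected."""
        if self.reserved() + request.utilization <= self.reservable_fraction:
            self.channels.append(request)
            return True
        return False

if __name__ == "__main__":
    link = Link()
    print(link.admit(ChannelRequest(2.0, 10.0)))   # reserves 0.2 of the link
    print(link.admit(ChannelRequest(5.0, 10.0)))   # total reserved is now 0.7
    print(link.admit(ChannelRequest(2.0, 10.0)))   # would reach 0.9, rejected
```

Rejecting a request at establishment time is what protects channels that were admitted earlier: their reservations are never diluted by later arrivals.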

We are exploring different strategies for different kinds of networks, including point-to-point networks, crossbar switches, multi-access networks, and wide-area networks. Each kind of network calls for its own resource allocation and run-time scheduling schemes. In point-to-point and crossbar-based networks, bandwidth is shared using a multi-class EDD scheme. This scheme provides absolute guarantees on message latencies based on worst-case real-time traffic arrival. Unused capacity is made available for best-effort traffic. In multi-access networks, LAN capacity is divided to support local and global real-time communication. A capacity reservation and corresponding token allocation scheme provides statistical performance guarantees based on average real-time traffic arrival.
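
The run-time side of deadline-based bandwidth sharing can be sketched as a link scheduler with two traffic classes. This is a deliberately reduced, hypothetical illustration of earliest-due-date ordering, not the multi-class EDD scheme itself: real-time messages are sent in deadline order, and best-effort messages are sent only when no real-time message is waiting. The class and method names are assumptions.

```python
import heapq
from collections import deque

class LinkScheduler:
    """Two-class scheduler: deadline-ordered real-time, FIFO best-effort."""

    def __init__(self):
        self._real_time = []          # heap of (deadline, sequence, message)
        self._best_effort = deque()   # FIFO queue
        self._sequence = 0            # tie-breaker for equal deadlines

    def enqueue_real_time(self, message, deadline):
        heapq.heappush(self._real_time, (deadline, self._sequence, message))
        self._sequence += 1

    def enqueue_best_effort(self, message):
        self._best_effort.append(message)

    def next_message(self):
        """Return the next message to transmit, or None if the link is idle."""
        if self._real_time:
            return heapq.heappop(self._real_time)[2]
        if self._best_effort:
            return self._best_effort.popleft()
        return None

if __name__ == "__main__":
    scheduler = LinkScheduler()
    scheduler.enqueue_real_time("a", deadline=30)
    scheduler.enqueue_best_effort("x")
    scheduler.enqueue_real_time("b", deadline=10)
    while (message := scheduler.next_message()) is not None:
        print(message)
```

The sequence counter keeps messages with equal deadlines in arrival order and avoids comparing message payloads inside the heap.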

Fault-tolerant communication:

We are taking two different, but complementary, approaches to the issue of fault-tolerant real-time communication in distributed systems. The first approach seeks to improve the fault-tolerance of one-to-one real-time communications, while the second attempts to improve the overall system dependability by providing support for real-time communication among process groups.

The problem of one-to-one real-time communication has usually been dealt with under the implicit assumption that no network failure occurs in a real-time channel. As a result, the additional resources required for fault handling have not been budgeted in the reservation of resources needed to provide a network service with guaranteed performance. We adopt a layered approach: we evaluate the performance of the fault-handling schemes at each layer of the protocol stack, and then develop an algorithm to find a fault-handling strategy, defined as a selection of fault-handling schemes, whose combined performance meets the service requirements of the real-time channel. We will also investigate the implications for the service negotiation protocols and for the channel and network management protocols.
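
The idea of a fault-handling strategy as one scheme chosen per layer can be made concrete with an exhaustive search. The sketch below is only an illustration under stated assumptions, not the algorithm under development: the layers, scheme names, latency and coverage figures are all hypothetical, added latencies are assumed to sum across layers, and the layers are assumed to fail to mask a fault independently.

```python
from itertools import product
from math import prod

# Hypothetical per-layer schemes: (name, added latency in ms, fault coverage).
SCHEMES = {
    "link": [("none", 0.0, 0.0), ("retransmit", 2.0, 0.90)],
    "network": [("none", 0.0, 0.0), ("reroute", 5.0, 0.99),
                ("backup_channel", 1.0, 0.95)],
    "transport": [("none", 0.0, 0.0), ("end_to_end_retry", 8.0, 0.90)],
}

def select_strategy(schemes, max_latency, max_residual):
    """Pick one scheme per layer that meets both requirements.

    Returns the feasible strategy with the lowest added latency as a
    {layer: scheme name} mapping, or None if no combination is feasible.
    """
    layers = list(schemes)
    best = None
    for combo in product(*(schemes[layer] for layer in layers)):
        latency = sum(scheme[1] for scheme in combo)
        # Probability that no layer masks the fault (independence assumed).
        residual = prod(1.0 - scheme[2] for scheme in combo)
        if latency <= max_latency and residual <= max_residual:
            if best is None or latency < best[0]:
                names = {layer: scheme[0] for layer, scheme in zip(layers, combo)}
                best = (latency, names)
    return None if best is None else best[1]

if __name__ == "__main__":
    print(select_strategy(SCHEMES, max_latency=4.0, max_residual=0.01))
    print(select_strategy(SCHEMES, max_latency=0.5, max_residual=0.01))
```

Exhaustive search is workable here only because a protocol stack has few layers and few schemes per layer. A None result corresponds to a channel whose service requirements cannot be met and would have to be renegotiated or rejected.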

A widely used paradigm for building distributed fault-tolerant systems is based on the notion of group communication among cooperating processes. To support real-time group communication, we have chosen an approach that uses the resource reservation and scheduling theory developed for real-time channels to guarantee synchronous behavior for critical real-time messages. This will enable us to develop and implement group membership and multicast protocols with bounded delays. Two important related objectives of this effort are to formally define the various real-time group multicasts and to derive more efficient multicast protocols by exploiting hardware support and network topologies.