Resource Management and Scheduling

Graduate Students : Atri Indiresan, Ashish Mehra, Thomas K. Tsukada, and Tarek Abdelzaher

Research Fellow: Ching-Chih Han

Faculty: Kang G. Shin

Sponsor: ONR

In real-time systems, resources are usually shared by many tasks. Therefore, efficient and intelligent resource management and task scheduling are essential to the success of a real-time system. Our work can be divided into six approaches.

Need-based resource allocation:

In environments with limited processing power, processing resources could be shared between subsystems governed by distinct management policies. For example, resource management policies might differ for non real-time and real-time tasks. This sharing of resources must be provided in a manner that facilitates load balancing (both within and across subsystems) and preserves the logical partitioning of the different subsystems. Intelligent allocation of resources is needed to ensure that the supply of resources to the workload matches the demand placed on the resources by the workload. We are exploring the cost- performance tradeoffs involved in resource allocation on the basis of the needs of the workload at a given time. The workload essentially determines the division of resources and this division of resources in turn tracks changes in the workload. We are investigating the issues and tradeoffs involved in applying need- based resource allocation to different target domains, including a mix of real-time/ non real-time tasks as well as the management of computation and communication for a network-based shared memory multiprocessor.

Load sharing:

Effective load sharing can counter the problem of uneven task arrival in a distributed real-time computing system. We are considering various aspects of load sharing in heterogeneous distributed real-time systems. Among the concepts we have been investigating are the use of state-change broadcasts for load information exchange, the use of preferred lists for task placement decisions, the use of Bayesian analysis for the consideration of future task arrival behavior, and the use of dynamic thresholds for task transfer decisions. We have evaluated these ideas through simulation and implementation of decentralized load sharing mechanisms.

Bandwidth reservation for real-time channels:

Since the timely communication between real-time tasks is essential to the system, the network capacity must be allocated fairly, efficiently and intelligently to ensure the performance guarantees of real-time communication. We have developed a real-time communication scheme for multiaccess links by using a link control unit to establish real-time channels. A real-time channel of certain grade of quality may be implemented by allocating sufficient link bandwidth to the source node of the channel. By taking advantages of multiplexing independent statistical channels, we have successfully reduced the link capacity that needs to be reserved for a channel to the average level from the worst case level without compromising performance guarantees. We then expand our scheme to point-to-point networks by developing a distribute route-selection strategy which is optimum in terms of the connection establishment time and the success rate of channel establishment requests.

Real-time channel admission control:

In order to provide performance guarantees, resources such as network and CPU bandwidth and buffer space must be reserved for each channel. Many system specification parameters such as scheduling strategies and preemptability also affect the overall capability of supporting real-time channels. Therefore, the establishment of a real-time channel is subject to the approval of the system management.

We are exploring the impact of system characteristics on the admissibility of real-time channels and determining desirable features that help to use as much of the system's capacity as possible. We are implementing a distributed network manager for route selection and establishment of real-time channels. This distributed manager uses a Bellman-Ford-like algorithm to find a route which satisfies application requirements while considering hardware features, OS capabilities, the current resource usage and the load balance of the network.

Task and message scheduling:

In multicomputer systems, a sudden arrival of concurrent communication traffic often leads to congestion. Our goal is to improve system communication efficiency in such a situation by communication bandwidth reduction and traffic sequencing. Bandwidth reduction is achieved by the strategic placement of communicating modules to explore the locality in task communication behavior. Traffic sequencing is done by special routing, message or packet scheduling methods, and, in the case of virtual channel network, flit multiplexing schemes. Mathematical models for some specific real-time traffic scheduling problems and fault-tolerant issues are also formulated and analyzed.

Fault tolerance:

A method to avoid missing hard deadlines by trading the quality of computation results for timing correctness. We associate each primary periodic real-time task an alternate version which requires less resource (hence reliable) but produces a less precise result. We propose a scheduling algorithm which uses a fixed priority driven preemptive scheme to pre-allocate time intervals to the alternates and guarantees either the primary or alternate of each critical task to be completed in time and gives priority to the the execution of primaries.