Resource Management and Scheduling
Graduate Students : Atri Indiresan, Ashish Mehra, Thomas K. Tsukada, and Tarek Abdelzaher
Research Fellow: Ching-Chih Han
Faculty: Kang G. Shin
Sponsor: ONR
In real-time systems, resources are usually shared by many tasks.
Therefore, efficient and intelligent resource management and task
scheduling are essential to the success of a real-time system. Our
work can be divided into six approaches.
Need-based resource allocation:
In environments with limited processing power, processing resources
could be shared between subsystems governed by distinct management
policies. For example, resource management policies might differ for
non real-time and real-time tasks. This sharing of resources must be
provided in a manner that facilitates load balancing (both within and
across subsystems) and preserves the logical partitioning of the
different subsystems. Intelligent allocation of resources is needed to
ensure that the supply of resources to the workload matches the demand
placed on the resources by the workload. We are exploring the cost-
performance tradeoffs involved in resource allocation on the basis of
the needs of the workload at a given time. The workload essentially
determines the division of resources and this division of resources in
turn tracks changes in the workload. We are investigating the issues
and tradeoffs involved in applying need- based resource allocation to
different target domains, including a mix of real-time/ non real-time
tasks as well as the management of computation and communication for a
network-based shared memory multiprocessor.
Load sharing:
Effective load sharing can counter the problem of uneven task arrival
in a distributed real-time computing system. We are considering
various aspects of load sharing in heterogeneous distributed real-time
systems. Among the concepts we have been investigating are the use of
state-change broadcasts for load information exchange, the use of
preferred lists for task placement decisions, the use of Bayesian
analysis for the consideration of future task arrival behavior, and
the use of dynamic thresholds for task transfer decisions. We have
evaluated these ideas through simulation and implementation of
decentralized load sharing mechanisms.
Bandwidth reservation for real-time channels:
Since the timely communication between real-time tasks is essential to
the system, the network capacity must be allocated fairly, efficiently
and intelligently to ensure the performance guarantees of real-time
communication. We have developed a real-time communication scheme for
multiaccess links by using a link control unit to establish real-time
channels. A real-time channel of certain grade of quality may be
implemented by allocating sufficient link bandwidth to the source node
of the channel. By taking advantages of multiplexing independent
statistical channels, we have successfully reduced the link capacity
that needs to be reserved for a channel to the average level from the
worst case level without compromising performance guarantees. We then
expand our scheme to point-to-point networks by developing a
distribute route-selection strategy which is optimum in terms of the
connection establishment time and the success rate of channel
establishment requests.
Real-time channel admission control:
In order to provide performance guarantees, resources such as network
and CPU bandwidth and buffer space must be reserved for each channel.
Many system specification parameters such as scheduling strategies and
preemptability also affect the overall capability of supporting
real-time channels. Therefore, the establishment of a real-time
channel is subject to the approval of the system management.
We are exploring the impact of system characteristics on the
admissibility of real-time channels and determining desirable features
that help to use as much of the system's capacity as possible. We are
implementing a distributed network manager for route selection and
establishment of real-time channels. This distributed manager uses a
Bellman-Ford-like algorithm to find a route which satisfies
application requirements while considering hardware features, OS
capabilities, the current resource usage and the load balance of the
network.
Task and message scheduling:
In multicomputer systems, a sudden arrival of concurrent communication
traffic often leads to congestion. Our goal is to improve system
communication efficiency in such a situation by communication
bandwidth reduction and traffic sequencing. Bandwidth reduction is
achieved by the strategic placement of communicating modules to
explore the locality in task communication behavior. Traffic
sequencing is done by special routing, message or packet scheduling
methods, and, in the case of virtual channel network, flit
multiplexing schemes. Mathematical models for some specific real-time
traffic scheduling problems and fault-tolerant issues are also
formulated and analyzed.
Fault tolerance:
A method to avoid missing hard deadlines by trading the quality of
computation results for timing correctness. We associate each primary
periodic real-time task an alternate version which requires less
resource (hence reliable) but produces a less precise result. We
propose a scheduling algorithm which uses a fixed priority driven
preemptive scheme to pre-allocate time intervals to the alternates and
guarantees either the primary or alternate of each critical task to be
completed in time and gives priority to the the execution of
primaries.