Time-bounded Restoration of Real-time Communication Services
Table of Contents:
- Principal Investigator.
- Productivity Measures.
- Summary of Objectives and Approach.
- Detailed Summary of Technical Progress.
- Transitions and DOD Interactions.
- Software and Hardware Prototypes.
- List of Publications.
- Invited and Contributed Presentations.
- Honors, Prizes or Awards Received.
- Project Personnel Promotions.
- Project Staff.
- URLs.
- Keywords.
- Business Office.
- Expenditures.
- Students.
- Book Plans.
- Sabbatical Plans.
- Related Research.
- History.
Principal Investigator.
- PI Name: Kang G. Shin
- PI Institution: University of Michigan
- PI Phone Number: 313-763-0391
- PI Fax Number: 313-763-4617
- PI Street Address: 1301 Beal Avenue
- PI City,State,Zip: Ann Arbor, MI, 48109-2122
- PI E-mail Address: kgshin@eecs.umich.edu
- PI URL Home Page: http://www.eecs.umich.edu/~kgshin
- Grant Title: Time-bounded Restoration of Real-time Communication Services
- Grant/Contract Number: N00014-94-1-0229
- Mipr Number: N/A
- R&T Number: N/A
- Period of Performance: 1/1/96 - 12/31/98
- Today's Date: 11/17/97
Productivity Measures.
- Number of refereed papers submitted not yet published: 15
- Number of refereed papers published: 30
- Number of unrefereed reports and articles: 4
- Number of books or parts thereof submitted but not published: 0
- Number of books or parts thereof published: 1
- Number of project presentations: 10
- Number of patents filed but not yet granted: 0
- Number of patents granted and software copyrights: 0
- Number of graduate students supported >= 25% of full time: 5
- Number of post-docs supported >= 25% of full time: 0
- Number of minorities supported: 0
Summary of Objectives and Approach.
- Objective: Develop dependable real-time
communication protocols that provide firm QoS guarantees on
end-to-end message delay, delay jitter, and bandwidth, and
restore the QoS-guaranteed communication service
disrupted by (persistent) network failures
within a pre-specified time bound
in multi-hop point-to-point networks.
- Approach:
- Real-time channel service paradigm:
Deterministic per-connection QoS guarantees are
achieved by resource reservation, admission control,
traffic enforcement, and CPU/link scheduling.
- Primary-backup channel scheme:
Backup channels are set up a priori for each primary channel.
The resource overhead for setting up backup channels is
minimized by a resource sharing and overbooking technique.
- Efficient failure detection:
Two behavior-based mechanisms ensure
fast and perfect detection of real-time channel failures.
- Fast/Robust failure handling:
Failure reporting, channel switching, and resource reconfiguration after
failure recovery are performed in a timely manner without affecting
regular traffic.
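As a rough illustration of the traffic-enforcement element listed above, the sketch below polices a single connection with a token bucket. The token bucket is a commonly used mechanism chosen here for illustration only; this report does not state which enforcement mechanism the protocol uses, and the class name, parameters, and units are hypothetical.

```python
class TokenBucket:
    """Illustrative per-connection policer (hypothetical, not the project's code)."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens (e.g. bytes) added per time unit
        self.burst = burst    # maximum number of tokens the bucket can hold
        self.tokens = burst   # the bucket starts full
        self.last = 0         # time of the previous conformance check

    def conforms(self, size, now):
        """Return True if a message of `size` arriving at time `now` is within
        the declared traffic envelope; consume tokens only when it is."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False
```

A non-conforming message could be delayed or dropped so that a misbehaving connection cannot consume resources reserved for the others; which action to take is a policy choice outside this sketch.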
Detailed Summary of Technical Progress.
- We have designed and implemented a real-time channel protocol
which can provide a deterministic guarantee on such performance-QoS
parameters as end-to-end message delay, delay jitter, and bandwidth,
in packet-switched multi-hop networks.
The QoS guarantee is maintained on a per-connection basis, and
the well-behaving connections are protected from
unexpected overload conditions.
To achieve a high network-wide throughput,
our protocol features a new "process-per-channel" protocol
model that associates a channel handler with
each established channel.
The protocol also exports a rich, well-defined API
(Application Program Interface)
through which applications can specify and negotiate QoS parameters.
- D. Kandlur, K. G. Shin, and D. Ferrari,
"Real-time communication in multi-hop networks,"
IEEE Trans. on Parallel and Distributed Systems, October 1994
- A. Mehra, A. Indiresan, and Kang G. Shin,
"Design and Evaluation of a QoS-Sensitive Communication Subsystem Architecture,"
Technical Report CSE-TR-280-96, University of Michigan, January 1996.
- A. Mehra, A. Indiresan, and Kang G. Shin,
"Structuring Communication Software for Quality-of-Service Guarantees,"
Proc. of IEEE Real-Time Systems Symposium, December 1996.
- A. Mehra, A. Indiresan, and Kang G. Shin,
"Resource Management for Real-Time Communication: Making Theory Meet Practice,"
Proc. of IEEE Real-Time Technology and Applications Symposium, June 1996.
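The per-connection reservation and admission control described in this item can be sketched as follows. This is a deliberately simplified illustration that tests bandwidth only; the actual protocol also guarantees delay and jitter, which would require a schedulability test not shown here. The class names, link names, and units are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    capacity: float        # total bandwidth of the link (hypothetical units)
    reserved: float = 0.0  # bandwidth already promised to admitted channels


class AdmissionController:
    """Illustrative all-or-nothing bandwidth reservation along a route."""

    def __init__(self, capacities):
        self.links = {name: Link(cap) for name, cap in capacities.items()}
        self.channels = {}  # channel id -> (route, bandwidth)

    def admit(self, channel_id, route, bandwidth):
        """Admit the channel only if every link on the route can carry it."""
        if channel_id in self.channels:
            raise ValueError(f"channel {channel_id!r} already established")
        for name in route:
            link = self.links[name]
            if link.reserved + bandwidth > link.capacity:
                return False  # reject without reserving anything
        for name in route:
            self.links[name].reserved += bandwidth
        self.channels[channel_id] = (tuple(route), bandwidth)
        return True

    def release(self, channel_id):
        """Tear down a channel and return its bandwidth to each link."""
        route, bandwidth = self.channels.pop(channel_id)
        for name in route:
            self.links[name].reserved -= bandwidth
```

Checking every link before reserving on any of them keeps a rejected request from leaving partial reservations behind, so channels admitted earlier keep their guarantees.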
- While the real-time channel protocol supports
performance-QoS guarantees, we need to reduce or bound
the disruption time of performance QoS-guaranteed
communication services upon network failure.
To quickly restore a real-time connection after a failure, we set up
backup channels in addition to a primary channel for each connection.
A backup channel remains a cold standby and
does not carry any data until it is activated,
so it does not consume bandwidth under normal conditions.
However, a backup channel is not free, as it requires
the same amount of resources as its primary channel to be reserved,
for immediate activation upon failure of the primary.
We have developed a resource sharing technique that minimizes the
resources reserved for fault tolerance
without compromising dependability.
Two important dependability QoS parameters are guaranteed:
recovery delay bound, and probability that the application
does not suffer longer service disruption than
that recovery delay bound.
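One way to picture why sharing reduces the spare resources is sketched below. It assumes a single-component-failure model, which is an assumption of this example and not a statement of the project's actual technique: backups on the same link are activated together only if their primaries share the failed component, so the link needs spare bandwidth for the worst single failure, not for the sum of all backups. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def spare_bandwidth(backups):
    """Spare bandwidth one link needs under a single-failure assumption.

    `backups` lists the backup channels routed over the link as
    (bandwidth, components used by the corresponding primary channel).
    """
    demand = defaultdict(float)  # failed component -> bandwidth activated
    for bandwidth, primary_components in backups:
        for component in set(primary_components):
            demand[component] += bandwidth
    # The worst single failure determines the reservation.
    return max(demand.values(), default=0.0)


# Three backups share one link; only the first and third primaries overlap (at n2).
backups_on_link = [(4, {"n1", "n2"}), (3, {"n3"}), (2, {"n2", "n4"})]
shared = spare_bandwidth(backups_on_link)
dedicated = sum(bandwidth for bandwidth, _ in backups_on_link)
```

In this example the shared reservation is 6 units, against 9 units if every backup held a dedicated reservation. Reserving less than this worst case would correspond to overbooking, which trades spare resources for a lower probability of successful recovery.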
- The first step in handling a failure is its detection.
Behavior-based failure-detection schemes without hardware support
suffice for datagram network applications,
because they do not mandate fast failure recovery and
reliable message delivery can be achieved through
the acknowledgement/retransmission method.
However, effective failure detection with high coverage and low latency
is key to providing fast/time-bounded failure recovery of real-time
channels. We have developed two behavior-based (low overhead) failure
detection schemes, and have evaluated their performance by
extensive fault-injection experiments on a laboratory testbed.
To this end, we used an integrated fault-injection
environment, called DOCTOR, which provides a complete set of tools
for automated fault-injection experiments in a
distributed environment.
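A behavior-based detector of the heartbeat kind can be sketched as follows. This is a minimal illustration, not either of the two schemes evaluated in the project; the class name, the heartbeat period, and the missed-heartbeat threshold are hypothetical, and timestamps are passed in explicitly so the logic does not depend on a real clock.

```python
class HeartbeatMonitor:
    """Illustrative detector: a neighbor is suspected once it has been
    silent for longer than `missed_limit` heartbeat periods."""

    def __init__(self, period, missed_limit=3):
        self.timeout = period * missed_limit
        self.last_seen = {}  # neighbor -> time of its latest heartbeat

    def heartbeat(self, neighbor, now):
        """Record a heartbeat received from `neighbor` at time `now`."""
        self.last_seen[neighbor] = now

    def suspected(self, now):
        """Return the neighbors whose silence exceeds the timeout."""
        return sorted(
            neighbor
            for neighbor, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

The threshold expresses the trade-off discussed above: a smaller value lowers detection latency but raises the chance of suspecting a healthy neighbor whose heartbeats were merely delayed.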
- Reporting detected failures to the end nodes of the affected channels
and switching failed primary channels to their healthy backups
should be done quickly, and, at the same time, these operations must be robust
so that the operation of healthy connections can be insulated from
the recovery process for failed connections.
For time-bounded and robust transmission of time-critical
control messages (e.g., failure report messages),
we use special-purpose real-time channels.
After a failed primary channel is replaced by one
of its healthy backups,
the resources for the primary channel need to be released and a
new backup channel should be established to maintain the
connection's dependability QoS.
Research on efficient QoS control is currently underway.
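The switching step described in this item can be sketched as a small state update at an end node. This is an illustrative model only: routes are tuples of component names, and the class, its fields, and the returned status strings are hypothetical. Failure reporting and the release of the failed primary's resources are outside the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Route = Tuple[str, ...]


@dataclass
class Connection:
    primary: Optional[Route]
    backups: List[Route] = field(default_factory=list)
    target_backups: int = 1  # backups wanted to maintain dependability QoS

    def on_failure(self, failed: str) -> str:
        """React to a failure report naming one failed component."""
        # Backups that cross the failed component are no longer usable.
        self.backups = [b for b in self.backups if failed not in b]
        if self.primary is None:
            return "disrupted"
        if failed not in self.primary:
            return "primary-ok"
        if not self.backups:
            self.primary = None
            return "disrupted"
        # Promote the first healthy backup to primary.
        self.primary = self.backups.pop(0)
        return "switched"

    def backups_needed(self) -> int:
        """Number of new backups to establish after the recovery."""
        return max(0, self.target_backups - len(self.backups))
```

After a switch, `backups_needed()` reports how many new backup channels would have to be established to bring the connection back to its intended level of protection.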
Transitions and DOD Interactions.
- Real-time channel software to Honeywell Advanced Technology Center
for various tracking applications.
- DOCTOR (fault-injection tool set) to Lockheed Martin in Denver.
- On-going discussions on the possibility of porting the fault-injection
tool and fault-tolerant communication software to the
testbeds at NAWC, NSWC, SEI, and Allied Signal.
Software and Hardware Prototypes.
- Prototype Name: Real-Time Channel Protocol
- Type: Software
- URL: Not available yet
- Availability: Yes
- Description:
A protocol for real-time channel establishment/tear-down and
run-time message transmission for the delay-guaranteed communication service.
This protocol is implemented on a modified version of x-kernel 3.1, which
runs on VME-bus based multiprocessor systems.
- Demonstration Examples: See the related publication.
- Prototype Name: Failure Detection Protocol
- Type: Software
- URL: Not available yet
- Availability: Yes
- Description:
Two heartbeat-based implementations for failure detection.
Both are implemented on the same platform as the real-time channel protocol.
- Demonstration Examples: See the related publication.
- Prototype Name: Backup Channel Protocol
- Type: Software
- URL: Not available yet
- Availability: Not yet
- Description:
The functionalities include backup-route selection, resource
reservation on the selected backup routes,
dependability QoS management, failure reporting, and channel switching.
This protocol will be implemented on top of
the real-time channel protocol and failure detection protocol.
- Demonstration Examples: Not available yet
List of Publications.
- Real-time channel service:
A. Mehra, A. Indiresan, and Kang G. Shin,
"Structuring Communication Software for Quality-of-Service Guarantees,"
Proc. of IEEE Real-Time Systems Symposium, December 1996.
- Fast failure recovery:
S. Han and K. G. Shin,
"Fast Restoration of Real-Time Communication Service from Component
Failures in Multi-hop Networks,"
Proc. of ACM SIGCOMM Symposium, September 1997.
- Efficient failure detection:
S. Han and K. G. Shin,
"Experimental Evaluation of Failure-Detection Schemes in Real-time
Communication Networks,"
Proc. of IEEE International Symposium on Fault-Tolerant Computing, June 1997.
Invited and Contributed Presentations.
- Real-time channel service:
"Structuring Communication Software for Quality-of-Service Guarantees,"
at IEEE Real-Time Systems Symposium, December 1996.
- Fast failure recovery:
"Fast Restoration of Real-Time Communication Service from Component
Failures in Multi-hop Networks,"
at ACM SIGCOMM, September 1997.
- Efficient failure detection:
"Experimental Evaluation of Failure-Detection Schemes in Real-time
Communication Networks,"
at IEEE International Symposium on Fault-Tolerant Computing,
June 1997.
Honors, Prizes or Awards Received.
Project Staff.
URLs.
- Annual Report FY97
- QUAD FY97 (power-point format)
- Vugraph_SIGCOMM97 (postscript)
Keywords.
- Real-time communication
- Failure recovery
- Multi-hop network
Business Office.
- Business Office Phone Number: 313-763-6438
- Business Office Fax Number: 313-763-4053
- Business Office Email: neilgerl@umich.edu
Expenditures.
- FY97: 64%
Current and Former Students.
- Name: Dr Ashish Mehra
- Homepage
- Position: Research Assistant
- Nationality: India
- Task: Real-time channel protocol
- Thesis: Structuring Host Communication Software
for Quality of Service Guarantees
- Graduated: 1997, PhD
- Job: IBM co.
- Name: Dr Atri Indiresan
- Homepage
- Position: Research Assistant
- Nationality: India
- Task: Real-time channel protocol
- Thesis: Exploring Quality of Service Issues
in Network Interface Design
- Graduated: 1997, PhD
- Job: Cisco Systems Inc.
- Name: Mr Harold Rosenberg
- Homepage
- Position: Research Assistant
- Nationality: USA
- Task: Fault injection, failure detection
- Thesis: Under preparation
- Graduated:
- Job:
- Name: Mr Seungjae Han
- Homepage
- Position: Research Assistant
- Nationality: Korea
- Task: Failure recovery, failure detection
- Thesis: Under preparation
- Graduated:
- Job:
- Name: Mr Charles Meissner
- Homepage
- Position: Research Assistant
- Nationality: USA
- Task: Fault injection
- Thesis: Under preparation
- Graduated:
- Job:
Book Plans.
Sabbatical Plans.
Related Research.
- HARTS project
- ARMADA project
- TENET project at Berkeley
- IETF
- ATM Forum
History.