Synchronization of Fault-Tolerant Real-Time Multicomputers

Research Fellow: Alan Olson

Faculty: Kang G. Shin

Sponsor: Martin Marietta

The nodes of a multicomputer tell time by reading their respective local clocks. Unfortunately, no two clocks run at precisely the same rate, so even clocks that start with the same time will eventually drift apart. This drift must be counteracted to ensure that a task which meets its deadline on one node will meet its deadline on all nodes. Each node must therefore synchronize its clock with the rest of the clocks in the system.
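A minimal sketch of the drift problem, assuming each clock advances at a constant rate of 1 + rho seconds per real second for a small, fixed drift rate rho (real oscillators also wander with temperature and age). The helper names are hypothetical.

```python
# Drift model sketch: a clock with drift rate rho advances (1 + rho) seconds per real second.
# Assumption: rho is constant; real oscillators also wander with temperature and age.

def clock_reading(real_time: float, start: float, rho: float) -> float:
    """Local clock value `real_time` seconds after it was set to `start`."""
    return start + (1.0 + rho) * real_time


def skew(real_time: float, rho_a: float, rho_b: float) -> float:
    """Difference between two clocks that were both set to 0 at real time 0."""
    return clock_reading(real_time, 0.0, rho_a) - clock_reading(real_time, 0.0, rho_b)


if __name__ == "__main__":
    # One clock 50 ppm fast, the other 50 ppm slow.
    for t in (1.0, 60.0, 3600.0):
        print(f"after {t:6.0f} s: skew = {skew(t, 50e-6, -50e-6) * 1e3:.3f} ms")
```

With one clock 50 ppm fast and the other 50 ppm slow, the skew grows by 0.1 ms every second, or 360 ms per hour, until something corrects it.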

The goal of this research is to develop effective, low-cost synchronization algorithms for multicomputer systems. The system's own communication network is used to pass synchronization information, and probabilistic methods are used to compensate for the uncertainty in communication delays across the network. Fault tolerance must not be compromised. Specific techniques under study include the following:

1. Adding more information and timestamps to each synchronization message so that a single message can help synchronize multiple nodes. This reduces the number of synchronization messages needed.

2. Sending single synchronization messages at short, regular intervals rather than a burst of synchronization messages at long intervals. This evens out the network load and allows nodes to carefully monitor their skew with respect to the rest of the system.

3. Defining overlapping groups of nodes, with each node belonging to at least one group. Synchronization within a group is tight, while looser synchronization prevails across the system as a whole. Closely cooperating nodes are placed in the same group. This also reduces the number of synchronization messages needed.

4. Reducing the effective drift rate of clocks by estimating each clock's actual drift rate and compensating for it. This slows the rate at which clocks drift apart and thus reduces the amount of work the synchronization algorithm has to do.
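Technique 1 can be sketched as follows, assuming a broadcast medium and borrowing the four-timestamp offset and delay formulas of NTP's symmetric mode (RFC 5905) as a stand-in for whatever estimator the project actually uses. Each broadcast carries the sender's transmit time plus, for every peer the sender has heard from, that peer's last transmit time and the local time it arrived, so one message lets every listed peer compute its offset from the sender. The class names and the scenario are hypothetical; the simulation assumes fixed clock offsets and a symmetric 2 ms delay, and ignores drift while timestamps are held between broadcasts.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class SyncMessage:
    sender: str
    send_time: float  # sender's clock when the message was sent
    # peer -> (peer's send_time on its last message heard by the sender,
    #          sender's clock when that message arrived)
    echoes: Dict[str, Tuple[float, float]] = field(default_factory=dict)


class Node:
    def __init__(self, name: str):
        self.name = name
        self.heard: Dict[str, Tuple[float, float]] = {}
        # peer -> (offset = peer clock - own clock, round-trip delay)
        self.offsets: Dict[str, Tuple[float, float]] = {}

    def make_message(self, local_time: float) -> SyncMessage:
        # Piggyback everything heard so far onto the single outgoing message.
        return SyncMessage(self.name, local_time, dict(self.heard))

    def receive(self, msg: SyncMessage, local_time: float) -> None:
        echo = msg.echoes.get(self.name)
        if echo is not None:
            t1, t2 = echo                        # our send time, sender's receive time
            t3, t4 = msg.send_time, local_time   # sender's send time, our receive time
            offset = ((t2 - t1) + (t3 - t4)) / 2
            delay = (t4 - t1) - (t3 - t2)
            self.offsets[msg.sender] = (offset, delay)
        self.heard[msg.sender] = (msg.send_time, local_time)


def simulate() -> Dict[str, Node]:
    """Hypothetical scenario: fixed clock offsets, symmetric 2 ms one-way delay."""
    clock_offset = {"A": 0.0, "B": 0.010, "C": -0.005}
    one_way = 0.002
    nodes = {name: Node(name) for name in clock_offset}
    for real_time, sender in [(0.0, "A"), (1.0, "B"), (2.0, "C"), (3.0, "A")]:
        msg = nodes[sender].make_message(real_time + clock_offset[sender])
        for name, node in nodes.items():
            if name != sender:
                node.receive(msg, real_time + one_way + clock_offset[name])
    return nodes


if __name__ == "__main__":
    for name, node in simulate().items():
        print(name, {peer: round(off, 6) for peer, (off, _) in node.offsets.items()})
```

After the four broadcasts, A's second message alone gives both B and C an offset estimate for A (-10 ms and +5 ms respectively) without any per-pair request and reply exchange.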
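Technique 3 can be sketched by restricting each node's correction to the members of its own groups. To show that tight in-group synchronization need not compromise fault tolerance, this sketch pairs the grouping with the fault-tolerant midpoint used in the Welch and Lynch algorithm: discard the f largest and f smallest readings and take the midpoint of the rest, which requires at least 3f + 1 readings. The group layout, helper names, and choice of averaging function are illustrative assumptions, not the project's design.

```python
from typing import Dict, List, Set


def check_coverage(nodes: Set[str], groups: List[Set[str]]) -> None:
    """Every node must belong to at least one group."""
    uncovered = nodes - set().union(*groups)
    if uncovered:
        raise ValueError(f"nodes in no group: {sorted(uncovered)}")


def group_peers(node: str, groups: List[Set[str]]) -> Set[str]:
    """All nodes sharing at least one group with `node` (overlapping groups are merged)."""
    peers: Set[str] = set()
    for group in groups:
        if node in group:
            peers |= group
    peers.discard(node)
    return peers


def fault_tolerant_midpoint(values: List[float], f: int) -> float:
    """Drop the f smallest and f largest values, return the midpoint of the rest."""
    if len(values) < 3 * f + 1:
        raise ValueError("need at least 3f + 1 readings to tolerate f faulty clocks")
    kept = sorted(values)[f:len(values) - f]
    return (kept[0] + kept[-1]) / 2


def correction(node: str, groups: List[Set[str]], offsets: Dict[str, float], f: int) -> float:
    """Correction to add to `node`'s clock, using only offsets from its group peers.

    `offsets[p]` is the measured (peer clock - own clock); the node's own reading is 0.
    """
    peers = sorted(group_peers(node, groups))
    readings = [0.0] + [offsets[p] for p in peers if p in offsets]
    return fault_tolerant_midpoint(readings, f)


if __name__ == "__main__":
    groups = [{"A", "B", "C", "D"}, {"C", "D", "E", "F"}]
    check_coverage({"A", "B", "C", "D", "E", "F"}, groups)
    # Offsets measured by node C; E's clock is faulty and far off.
    measured = {"A": 0.001, "B": -0.002, "D": 0.0005, "E": 0.5, "F": 0.0}
    print(correction("C", groups, measured, f=1))
```

Node C belongs to both groups, so it draws on all five peers; E's wildly wrong 0.5 s reading is trimmed away, and the correction comes from the remaining, mutually consistent offsets.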
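Technique 4, fed by the steady stream of skew samples that technique 2 provides, can be sketched as a least-squares line fit: if a node records its offset from a reference at regular intervals, the slope of the fitted line estimates its relative drift rate, and the line can be used to correct the local clock between synchronization messages. Least squares, the helper names, and the noisy demo data are assumptions for illustration; a real system would also need to reject outliers caused by faulty peers.

```python
from typing import List, Tuple


def estimate_drift(samples: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of offset ~ intercept + rate * local_time.

    Each sample is (local clock time, measured offset = reference - local).
    `rate` is the drift relative to the reference (negative if the local clock runs fast).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_o = sum(o for _, o in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    if sxx == 0.0:
        raise ValueError("samples need distinct timestamps")
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in samples)
    rate = sxy / sxx
    return rate, mean_o - rate * mean_t


class CompensatedClock:
    """Local clock corrected by the fitted offset line (hypothetical helper)."""

    def __init__(self, rate: float, intercept: float):
        self.rate = rate
        self.intercept = intercept

    def read(self, local_time: float) -> float:
        return local_time + self.intercept + self.rate * local_time


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    # Local clock about 40 ppm fast, 2 ms behind at t = 0; one noisy offset sample every 10 s.
    samples = [(float(t), 0.002 - 40e-6 * t + rng.gauss(0.0, 20e-6)) for t in range(0, 600, 10)]
    rate, intercept = estimate_drift(samples)
    print(f"estimated drift {rate * 1e6:.1f} ppm, intercept {intercept * 1e3:.3f} ms")
    print(f"compensated reading at local 600 s: {CompensatedClock(rate, intercept).read(600.0):.6f}")
```

Because the fitted slope absorbs most of the drift, only the error in the estimated rate is left for the synchronization algorithm to correct.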

These techniques can be used individually or combined for greater effect.
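The probabilistic handling of communication delays mentioned earlier can be illustrated with Cristian's probabilistic clock reading, used here as a stand-in because the summary does not say which method the project uses. A node asks a peer for its time and accepts the answer only if the round trip was short enough, which bounds the error; long round trips are retried, so success is a matter of probability. The names, the delay distribution of the simulated link, and the omission of drift during a round trip are assumptions of this sketch.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Reading:
    offset: float       # estimated (remote clock - local clock), seconds
    error_bound: float  # |true offset - offset| <= error_bound (drift ignored)
    attempts: int       # round trips used before one was short enough


def read_remote_clock(local_now: Callable[[], float], query: Callable[[], float],
                      min_delay: float, max_round_trip: float,
                      max_attempts: int = 20) -> Optional[Reading]:
    """Cristian-style probabilistic reading: retry until a round trip is short enough."""
    for attempt in range(1, max_attempts + 1):
        t0 = local_now()
        remote = query()              # remote clock, stamped when the reply is sent
        t1 = local_now()
        round_trip = t1 - t0
        if round_trip <= max_round_trip:
            # The stamp was taken between t0 + min_delay and t1 - min_delay, so
            # placing it at the midpoint errs by at most round_trip / 2 - min_delay.
            estimate_at_t1 = remote + round_trip / 2
            return Reading(estimate_at_t1 - t1, round_trip / 2 - min_delay, attempt)
    return None  # every attempt was too slow; the caller may try again later


class SimulatedLink:
    """Hypothetical test harness: local clock = real time, remote = real time + offset."""

    def __init__(self, remote_offset: float, min_delay: float, mean_extra: float, seed: int):
        self.now = 0.0
        self.remote_offset = remote_offset
        self.min_delay = min_delay
        self.mean_extra = mean_extra
        self.rng = random.Random(seed)

    def _delay(self) -> float:
        # Fixed minimum plus an exponentially distributed queuing delay (assumed model).
        return self.min_delay + self.rng.expovariate(1.0 / self.mean_extra)

    def local_now(self) -> float:
        return self.now

    def query(self) -> float:
        self.now += self._delay()                 # request in flight
        stamp = self.now + self.remote_offset     # remote reads its clock
        self.now += self._delay()                 # reply in flight
        return stamp


if __name__ == "__main__":
    link = SimulatedLink(remote_offset=0.0125, min_delay=1e-3, mean_extra=0.5e-3, seed=1)
    print(read_remote_clock(link.local_now, link.query, min_delay=1e-3, max_round_trip=3e-3))
```

Lowering `max_round_trip` tightens the error bound but makes each attempt less likely to succeed, which is the basic trade-off of the probabilistic approach.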