1995


Ashish Mehra, Jennifer Rexford, Hock-Siong Ang, and Farnam Jahanian

Design and Evaluation of a Window-Consistent Replication Service

University of Michigan

CSE-TR-227-95 cover page

CSE-TR-227-95

Abstract: Real-time applications typically operate under strict timing and dependability constraints. Although traditional data replication protocols provide fault tolerance, real-time guarantees require bounded overhead for managing this redundancy. When time is scarce and the overhead for managing redundancy is too high, alternative approaches must balance the trade-off between timing predictability and fault tolerance. This paper presents the design and evaluation of a window-consistent primary-backup replication service that provides timely availability of the repository by relaxing the consistency of the replicated data. A client application registers an object with the service by declaring the consistency requirements for the data, in terms of a time window. The primary ensures that each backup site maintains a version of the object that was valid on the primary within the preceding time window. The primary delivers timely service to clients by decoupling the transmission of updates to the backup from the processing of client requests. Real-time scheduling of update messages can guarantee controlled inconsistency between the primary and backup repositories. This ensures that client applications interact with a window-consistent repository when the backup must supplant a failed primary. Our prototype implementation demonstrates the utility of the window-consistent replication model. Experimental results show that the service handles a range of client loads while maintaining bounds on objects' temporal inconsistency.
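
As an illustration of the window-consistency guarantee described above, the following minimal Python sketch checks whether a backup's copy was valid on the primary within the preceding time window; the object record and method names are hypothetical and are not taken from the paper's implementation.

    import time

    class ReplicatedObject:
        """Illustrative object metadata for window-consistent replication (hypothetical)."""
        def __init__(self, name, window_secs):
            self.name = name
            self.window = window_secs       # consistency window declared by the client
            self.backup_version_time = 0.0  # time the backup's copy was valid on the primary

        def record_backup_update(self, now=None):
            # Called when the primary ships an update to the backup.
            self.backup_version_time = now if now is not None else time.time()

        def window_consistent(self, now=None):
            # The backup is window-consistent if its copy was valid on the primary
            # within the preceding 'window' seconds.
            now = now if now is not None else time.time()
            return (now - self.backup_version_time) <= self.window

    # Example: an object registered with a 2-second window.
    obj = ReplicatedObject("sensor-42", window_secs=2.0)
    obj.record_backup_update(now=100.0)
    print(obj.window_consistent(now=101.5))  # True: backup copy is 1.5 s old
    print(obj.window_consistent(now=103.0))  # False: staleness exceeds the window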


Bruce Jacob

Optimization of mass storage hierarchies

University of Michigan

CSE-TR-228-95 cover page

CSE-TR-228-95

Abstract: Optimization is often a question of where one should put one's money in improving performance. As far as large storage hierarchies go, intuition suggests (and common practice supports) adding as much as is affordable of the fastest technology available. Many cache hierarchy studies have shown that this is often not the optimal approach, and we show that for mass storage hierarchies it always tends to be the wrong approach. For large data sets, as is the case for network file servers, a machine with no RAM and several gigabytes of disk performs 30% faster than a machine with no disk and a cost-equivalent amount of RAM. This paper presents a mathematical analysis of the optimization of an I/O hierarchy, as well as trace-driven simulations of a network file server in support of the analysis.


James K. Huggins

An Offline Partial Evaluator For Evolving Algebras

University of Michigan

CSE-TR-229-95 cover page

CSE-TR-229-95

Abstract: We describe the architecture of an evolving algebra partial evaluator, a program which specializes an evolving algebra with respect to a portion of its input. We discuss the particular analysis, specialization, and optimization techniques used and show an example of its use.


Yuri Gurevich, James K. Huggins, and Raghu Mani

The Generalized Railroad Crossing Problem: An Evolving Algebra Based Solution

University of Michigan

CSE-TR-230-95 cover page

CSE-TR-230-95

Abstract: We present an evolving algebra based solution for the Generalized Railroad Crossing problem -- a specification and verification benchmark proposed by C. Heitmeyer at NRL. We specify the system as an evolving algebra and prove that the specification satisfies the desired safety and liveness properties.


Bruce Jacob, Trevor Mudge

Notes on calculating computer performance

University of Michigan

CSE-TR-231-95 cover page

CSE-TR-231-95

Abstract: This report explains what it means to characterize the performance of a computer, and which methods are appropriate and inappropriate for the task. The most widely used metric is the performance on the SPEC benchmark suite of programs; currently, the results of running the SPEC benchmark suite are compiled into a single number using the geometric mean. The primary reason for using the geometric mean is that it preserves values across normalization, but unfortunately, it does not preserve total run time, which is probably the figure of greatest interest when performances are being compared. Cycles per Instruction (CPI) is another widely used metric, but this method is invalid, even if comparing machines with identical clock speeds. Comparing CPI values to judge performance falls prey to the same problems as averaging normalized values. In general, normalized values must not be averaged; instead of the geometric mean, either the harmonic or the arithmetic mean is the appropriate method for averaging a set of running times. The arithmetic mean should be used to average times, and the harmonic mean should be used to average rates (1/time). A number of published SPECmarks are recomputed using these means to demonstrate the effect of choosing a favorable algorithm.
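
The argument about means is easy to reproduce numerically. The sketch below, a minimal Python illustration with two hypothetical machines and two hypothetical benchmark times (not SPEC data), shows the geometric mean of normalized times ranking the machines opposite to total run time, while the arithmetic mean of times and the harmonic mean of rates agree with total run time.

    from math import prod

    def arithmetic_mean(xs):
        return sum(xs) / len(xs)

    def geometric_mean(xs):
        return prod(xs) ** (1.0 / len(xs))

    def harmonic_mean(xs):
        return len(xs) / sum(1.0 / x for x in xs)

    # Hypothetical run times (seconds) of two benchmark programs on two machines.
    times_a = [2.0, 50.0]   # machine A: total 52 s
    times_b = [10.0, 20.0]  # machine B: total 30 s (B is faster overall)

    # Geometric mean of B's times normalized to A suggests B is ~1.41x SLOWER,
    # contradicting total run time -- the pitfall discussed above.
    ratios = [b / a for a, b in zip(times_a, times_b)]
    print(geometric_mean(ratios))                          # ~1.414

    # Arithmetic mean of raw times preserves the total-run-time ranking.
    print(arithmetic_mean(times_a), arithmetic_mean(times_b))   # 26.0  15.0

    # Harmonic mean of rates (1/time) gives the same ranking as total time.
    print(harmonic_mean([1 / t for t in times_a]))         # ~0.038 runs/s
    print(harmonic_mean([1 / t for t in times_b]))         # ~0.067 runs/s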


Perry H. Wang

Hierarchical Performance Modeling with Cache Effects: A Case Study of the DEC Alpha

University of Michigan

CSE-TR-232-95 cover page

CSE-TR-232-95

Abstract: Although computers are becoming faster, today's high performance computer architectures require an efficient compiler in order to exploit the special features and avoid the special problems of that particular architecture. On the hardware side, a faster microprocessor is usually supported by faster cache, multiple instruction issuing, and efficient pipelining. To justify these features, the compiler must be able to schedule instructions well. Nowadays, most compilers support basic compiling techniques such as strength reduction, inlining, etc. Other techniques, such as loop unrolling and cyclic scheduling to enhance fine-grain scheduling and exploit instruction parallelism, may also be used. However, because microprocessors' speed is increasing faster than the speed of the memory system, there is a widening gap in speed between these two system components. This paper will discuss and demonstrate several aspects of why instruction scheduling for some kernel loops should be treated as a global problem, rather than in the traditional local greedy way of issuing each instruction as early as the microprocessor can handle it. In particular, instruction scheduling should consider the detailed implementation characteristics of the supporting cache. To support this idea, we have developed a performance bound model that includes cache effects for the DEC Alpha. By applying techniques such as loop unrolling and cyclic scheduling, new loop schedules substantially improve performance most of the time. In this research, seven out of the twelve LFK kernels running on the Alpha system have been improved through hand-scheduling that was guided by the performance bound model. The improved codes reach as high as 96% of the performance bound. The average clocks per floating-point operation (CPF) is 7.41 for the compiled LFK and 4.84 CPF through hand-scheduling, resulting in a speedup of 1.53 on average.
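
As a quick arithmetic check of the figures quoted above, the reported speedup follows directly from the two CPF values (fewer clocks per floating-point operation means faster execution):

    # Speedup implied by the clocks-per-floating-point-operation (CPF) figures above.
    compiled_cpf = 7.41        # compiled LFK code
    hand_scheduled_cpf = 4.84  # hand-scheduled code
    print(round(compiled_cpf / hand_scheduled_cpf, 2))  # 1.53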


Peter Chen, Christopher Aycock, Gurushankar Rajamani

Reliable File Caches

University of Michigan

CSE-TR-233-95 cover page

CSE-TR-233-95

Abstract: Memory is currently a second-class citizen of the storage hierarchy because of its vulnerability to power failures and software crashes. Designers have traditionally sacrificed either reliability or performance when using memory as a cache for disks; our goal is to do away with this tradeoff by making memory as reliable as disks. This paper describes and compares different hardware and software approaches to protect memory from software errors. If successful, making memory as reliable as disks will 1) improve file cache performance to that of a pure write-back scheme by eliminating all reliability-caused writes to disk; 2) improve reliability to that of a write-through scheme by making memory a reliable place to store files long term; and 3) simplify applications such as file systems and databases by eliminating write-back daemons and complex commit and checkpointing protocols.


Jaeho Lee, Edmund Durfee

A Microeconomic Approach to Intelligent Resource Sharing in Multiagent Systems

University of Michigan

CSE-TR-234-95 cover page

CSE-TR-234-95

Abstract: We have analyzed characteristics of sharable resources and developed techniques for intelligently sharing resources---specifically, communication channels---among agents in multiagent systems. Our techniques allow agents to nearly optimize their communication behavior in a self-organizing and distributed fashion, involving the use of a microeconomic pricing system based on economic laws of supply and demand and trading among agents in real-time. Our analyses are based on three measures of performance: fairness of resource allocation, waiting time for resources, and utilization of resources. Our initial analysis indicates that fairness and utilization are conflicting, in that the best utilization with a fair allocation is equivalent to the worst utilization with an unfair resource allocation, assuming the allocation policy is statically defined. To strike a balance in performance, we have developed mechanisms that establish an artificial economy, where agents can dynamically reallocate goods (resource access) using a competitive market pricing mechanism. However, unlike more common market-oriented methods, our approach does not demand convergence to equilibrium, but permits more rapid, heuristic trading, leading to near optimal performance where both buyers and sellers of resources can benefit. Our studies show that agents employing our mechanisms can dramatically improve utilization while still providing "fair" access to the resources.
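
The supply-and-demand intuition behind such an artificial economy can be sketched with a simple price-update rule: raise the price of a resource when demand exceeds supply and lower it otherwise. The Python fragment below is only a hedged illustration of that idea; the update rule, parameter names, and numbers are hypothetical and do not reproduce the authors' trading mechanism.

    def adjust_price(price, demand, supply, alpha=0.1):
        """Hypothetical supply/demand price update: excess demand raises the price,
        excess supply lowers it; alpha controls how aggressively prices move."""
        if supply == 0:
            return price * (1 + alpha)
        excess = (demand - supply) / supply
        return max(0.01, price * (1 + alpha * excess))

    # Example: channel access priced over a few rounds of varying demand.
    price = 1.0
    for demand, supply in [(12, 10), (15, 10), (8, 10), (5, 10)]:
        price = adjust_price(price, demand, supply)
        print(f"demand={demand:2d} supply={supply:2d} -> price={price:.3f}")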


Robert Burridge, Alfred Rizzi, Daniel Koditschek

Toward a Dynamical Pick and Place

University of Michigan

CSE-TR-235-95 cover page

CSE-TR-235-95

Abstract: This paper presents an initial overview of our attempts to expand our understanding of autonomous dynamic manipulation. Specifically, it proposes a sample "Dynamical Pick and Place" task and a hierarchical feedback based control strategy for its execution. Included are a sequence of experimental verifications of the underlying behaviors required.


Eric Boyd, Gheith Abandah, Hsien-Hsin Lee, Edward Davidson

Modeling Computation and Communication Performance of Parallel Scientific Applications: A Case Study of the IBM SP2

University of Michigan

CSE-TR-236-95 cover page

CSE-TR-236-95

Abstract: A methodology for performance analysis of Massively Parallel Processors (MPPs) is presented. The IBM SP2 and some key routines of a finite element method application (FEMC) are used as a case study. A hierarchy of lower bounds on run time is developed for the POWER2 processor, using the MACS methodology developed in earlier work for uniprocessors and vector processors. Significantly, this hierarchy is extended to incorporate the effects of the memory hierarchy of each SP2 node and communication across the High Performance Switch (HPS) linking the nodes of the SP2. The performance models developed via this methodology facilitate explaining performance, identifying performance bottlenecks, and guiding application code improvements.


Anisoara Nica and Elke Rundensteiner

A Constraint-based Object Model for Structured Document Management

University of Michigan

CSE-TR-237-95 cover page

CSE-TR-237-95

Abstract: Complex multimedia document handling, including the modeling, decomposition, and search across digital documents, is one of the primary services that must be provided by digital library systems. In this paper, we present a general approach for handling structured documents (e.g., SGML documents) by exploiting object-oriented database technology. For this purpose, we propose a constraint-based object model capable of capturing in a uniform manner all SGML constructs typically used to encode the structural organization of complex documents. We present a general strategy for mapping arbitrary document types (e.g., article, journal, and book DTDs) expressed using standard SGML into our model. Most importantly, we demonstrate that our model is designed to handle the integration of diverse document types into one integrated schema, thus avoiding the generation of numerous redundant class definitions for similar document subtypes. The resulting document management system DMS is thus capable of supporting the dynamic addition of new document types, and of uniformly processing queries spanning across multiple document types. In this paper, we also describe the implementation of our approach on the commercial DBMS system Illustra to demonstrate the ease with which our approach can be realized on current OODB technology -- without requiring any special-purpose constructs. Our DMS system provides support for integrated querying of both structural as well as content-based predicates across arbitrarily complex document types.


Atri Indiresan, Ashish Mehra, Kang Shin

Design Tradeoffs in Implementing Real-Time Channels on Bus-Based Multiprocessor Hosts

University of Michigan

CSE-TR-238-95 cover page

CSE-TR-238-95

Abstract: There are a growing number of real-time applications (e.g., real-time controls, and audio/video conferencing) that require certain quality-of-service (QoS) from the underlying communication subsystem. The communication subsystem must support real-time communication services that can be used to provide the required QoS of these applications, while providing reasonably good performance for best-effort traffic. In this paper we explore the design tradeoffs involved in supporting real-time communication on bus-based multiprocessor hosts using standard network hardware. These tradeoffs are examined by implementing the concept of real-time channel, a paradigm for real-time communication services in packet-switched networks. We first present a hardware and software architecture facilitating real-time communication on bus-based multiprocessor hosts. The main features of this architecture include a dedicated protocol processor, a split-architecture for accessing real-time communication services, and decoupling of data transfer and control in the communication protocol stack. The implications of network adapter characteristics for real-time communication are considered and desirable adapter features derived. Techniques to circumvent limitations in adapters not providing explicit support for real-time communication are presented. Support for real-time communication necessitates that shared host resources such as bus bandwidth, protocol processing bandwidth, and link bandwidth are consumed in a global order determined by the traffic characteristics of the active real-time channels. We present data transfer optimizations, mechanisms to schedule protocol processing, and link scheduling mechanisms that together achieve this goal. The effectiveness of our real-time channel implementation is demonstrated through experiments while varying traffic characteristics.


Wu-Chang Feng, Jennifer Rexford, Stuart Daniel, Ashish Mehra, and Kang Shin

Tailoring Routing and Switching Schemes to Application Workloads in Multicomputer Networks

University of Michigan

CSE-TR-239-95 cover page

CSE-TR-239-95

Abstract: Achieving good overall performance in multicomputers requires matching application communication characteristics with a suitable network design. In order to study the complex interactions between router policies and communication workloads, we are building SPIDER (Scalable Point-to-point Interface DrivER), an experimental router that implements various routing-switching combinations through microprogrammable routing engines. By simulating a network of SPIDERs at the cycle level, we evaluate the performance of several routing and switching schemes under a variety of traffic loads. These results show that tuning network policies to application communication characteristics can significantly improve multicomputer performance.


Yuri Gurevich and James K. Huggins

Equivalence Is In The Eye Of The Beholder

University of Michigan

CSE-TR-240-95 cover page

CSE-TR-240-95

Abstract: This is a reaction to Lamport's ``Processes are in the Eye of the Beholder.'' To illustrate the ``insubstantiality of processes,'' Lamport represents a 2-process algorithm and an N-process algorithm by temporal formulas and proves the equivalence of the two formulas. We analyze in what sense the two algorithms are and are not equivalent, and give a more direct equivalence proof.


Harumi Kuno, Young-Gook Ra, Elke Rundensteiner

The Object-Slicing Technique: A Flexible Object Representation and Its Evaluation

University of Michigan

CSE-TR-241-95 cover page

CSE-TR-241-95

Abstract: Recently much work has been done towards extending object-oriented database systems (OODBs) with advanced tools such as view technology, advanced schema evolution support, and role modeling systems. These extensions all require that the underlying database system supports more powerful and flexible modeling constructs than are currently supported by existing OODB systems. In this paper, we identify these features as multiple classification, dynamic reclassification, and dynamic restructuring. We then describe a methodology known as object-slicing that is capable of extending data models to support these required features. We have successfully implemented an object-slicing software layer using the GemStone system, which while still providing full access to all GemStone DBMS functions, now also offers all required modeling features. In this paper, we describe our experimental results evaluating the relative costs and benefits of adopting the object-slicing technique. This includes an analytical assessment of the storage overhead of the object-slicing representation, and its comparison against the conventional representational models. As clustering is critical to optimizing queries on such models, we present the results of using OO7 benchmark test suites evaluating various clustering strategies for the object-slicing model. We find that for certain types of queries (e.g., those that benefit from the superior blocking factor at the local attribute level resulting from clustering object-slices together by class), the object-slicing model outperforms the conventional approach in spite of its storage overhead, while queries involving inherited attributes with low selectivity are better serviced using a conventional object-clustering approach. Keywords: Multiple classification, object-oriented database models, performance evaluation, object clustering, benchmarking.


Gregory R. Ganger

System-Oriented Evaluation of I/O Subsystem Performance

University of Michigan

CSE-TR-243-95 cover page

CSE-TR-243-95

Abstract: This dissertation demonstrates that the conventional approach for evaluating the performance of an I/O subsystem design, which is based on standalone subsystem models, is too narrow in scope. In particular, conventional methodology treats all I/O requests equally, ignoring differences in how individual request response times affect system behavior. As a result, it often leads to inaccurate performance predictions and can thereby lead to incorrect conclusions and poor design choices. A new methodology, which expands the model's scope to include other important system components (e.g., CPUs and system software), is proposed and shown to enable accurate predictions of both subsystem and overall system performance. This dissertation focuses on two specific problems with conventional methodology: 1. Benchmark workloads are often not representative of reality in that they do not accurately reflect feedback effects between I/O subsystem performance (in particular, individual request completion times) and the workload of requests (in particular, subsequent request arrivals). 2. Changes in I/O subsystem performance (e.g., as measured by mean request response times) do not always translate into similar changes in overall system performance (e.g., as measured by mean elapsed times for user tasks). These problems are fundamental to the subsystem-oriented approach and are independent of the model's accuracy. The first problem is illustrated with several examples where commonly-utilized workload generators trivialize feedback effects and produce unrealistic workloads. In each case, quantitative and/or qualitative errors result. The second problem is illustrated with a disk scheduling algorithm that gives priority to those requests that are most critical to overall system performance. This change increases overall system performance while decreasing storage subsystem performance (as indicated by subsystem metrics). In all of the experiments, the new methodology is shown to avoid the short-comings of conventional methodology.


Bruce L. Worthington

Aggressive Centralized and Distributed Scheduling of Disk Requests

University of Michigan

CSE-TR-244-95 cover page

CSE-TR-244-95

Abstract: Disk request scheduling is a critical element of high-performance computer system design. Previous disk scheduling research is insufficient because it uses simplistic or outdated disk models, unrepresentative workloads, and inadequate performance metrics. This dissertation explores the disk scheduler design space and provides guidelines for computer system engineers. Host-based and disk-based centralized scheduling are compared with distributed scheduling, a new paradigm with cost, complexity, and performance advantages. In distributed scheduling, individual disk schedulers located at multiple points along the I/O path cooperate to make scheduling decisions. Scheduler implementations are evaluated using a detailed simulator containing validated models of state-of-the-art host systems, controllers, buses, and disk drives. The simulator is driven with extensive traces of real-world system and disk activity, and it reports system performance metrics (e.g., application run times) in addition to traditional disk subsystem metrics (e.g., mean request response times). It is shown that for the best system performance, scheduling algorithms must incorporate system-level information (e.g., request priorities). Scheduling algorithms that focus on reducing seek delays do not need extensive disk-specific information. Finally, algorithms must exploit a disk's on-board cache to achieve the lowest disk service times. Additional guidelines are provided for dealing with disk drive command queues, sequential disk request optimizations, and different on-board cache configurations.


Brian Murray

Hierarchical Testing Using Precomputed Tests for Modules

University of Michigan

CSE-TR-245-95 cover page

CSE-TR-245-95

Abstract: This thesis presents a new theory and method for generating tests for digital integrated circuits (ICs). Unlike most previous test generation techniques, which use logic-gate models of ICs, our method uses hierarchical models containing large primitive modules such as ALUs and multipliers connected by multi-bit buses. Many modern ICs are composed of complex, previously designed modules that can only be tested by reusing tests that were computed when the modules were designed. Our method propagates stimulus signals to the inputs of each module, and propagates module responses to the IC's primary outputs. We describe and evaluate a program we designed named PathPlan, one of the first test generators to use circuit and signal hierarchy and precomputed tests. PathPlan propagates symbolic references to precomputed tests along circuit paths. It is much faster than conventional test generators, and some of its features have already been applied commercially. However, PathPlan and other hierarchical test generators described in the literature cannot generate tests for ICs whose bus structure is irregular, that is, when some buses are truncated. We present a general theory of signal propagation that enables the development of hierarchical test generation programs for complex circuits with truncated buses. A special representation of a module's input-output behavior called a propagation function is defined to quantify the module's signal transmission properties. A propagation algebra is developed and used to combine propagation functions for modules into propagation functions for multi-module circuits. Several theorems governing the transparency of signal transmission in circuits are proved using this algebra. The algebra also forms the basis for an efficient, hierarchical error propagation method for test generation, and a novel design-for-testability strategy. Finally, we describe a successor program to PathPlan called PathPlan2. This new test generator propagates module test stimulus signals as symbolic expressions, and implements the hierarchical error signal propagation method based on our theory to transmit module responses. It handles circuits with irregular bus structures and is more powerful and general than PathPlan; nevertheless, its performance is at least as good.


Harumi Kuno, Elke Rundensteiner

The MultiView OODB View System: Design and Implementation

University of Michigan

CSE-TR-246-95 cover page

CSE-TR-246-95

Abstract: Views are an established technique for restructuring and repartitioning the format of data, classes, and schemata so that applications can customize shared data objects without affecting other applications' perceptions of the data. The MultiView system is one of the first OODB systems to support dynamic and updatable materialized object-oriented database views. MultiView is fully functional and is being used for a number of projects. In this paper, we describe our system's architecture, the services it provides, and the decisions we made during our implementation. Although the GemStone system we chose for our implementation base offers many features that greatly aided our implementation, it does not support several key object-model properties that are critical for the realization of our design principles. These fundamental properties include multiple classification, dynamic object-restructuring, and the ability to make dynamic changes to the schema. In this paper, we describe a flexible and powerful technique known as object-slicing that we adopted to construct the MultiView object model---this now successfully addresses our requirements. The MultiView system is distinguished by a number of unique features, including the incorporation of virtual classes into the global schema as first-class database citizens, support for capacity-augmenting views (virtual classes that add new extrinsic properties or behavior), view materialization strategies that take advantage of object-oriented modeling features, and a graphical interface that is tailored to provide easy access to the MultiView system. The resulting system preserves all of the capabilities of the underlying GemStone OODB while providing support for dynamic and updatable materialized object-oriented views.


Sunju Park, William Birmingham

Multiagent Negotiation Framework (A preliminary report)

University of Michigan

CSE-TR-248-95 cover page

CSE-TR-248-95

Abstract: Negotiation, an interactive communication among participants to resolve conflicts, has often been adopted in multiagent systems to coordinate agents' behavior. However, most of the negotiation models in multiagent systems are domain-dependent and do not support complex strategic interactions such as elaborate argumentation. The theme of our research is to design a unified framework to model the negotiation process. Rather than developing a specific model for each application, we build a generic framework which can support a variety of negotiation situations. Our framework is defined in terms of three constituents: the negotiating agent model, the negotiation language, and the negotiation protocol. The negotiating agent maintains mental states and reasons about them. Depending on the application and the complexity required, one can model different agents based on the abstract agent model we propose to represent the wide variety of agenthood. The negotiation language consists of the performatives and the content information, and its semantics is defined by the mental states of the speaker and the hearer and the changes to them. The negotiation protocol governs the interaction among agents by constraining the way the participants communicate. While the language gives an unambiguous definition of the performatives, the protocol defines their appropriate use. Though primitive, the new negotiation framework shows its strengths in both theoretical and practical areas. In practice, the agent model is flexible enough to support a variety of agenthood, and the negotiation language and mechanism are expressive enough to cover a wide range of applications. In theory, negotiation can be explained in a unified way from the viewpoint of the three components of our framework.


Yuri Gurevich, Charles Wallace

Specification and Verification of the Undo/Redo Algorithm for Database Recovery

University of Michigan

CSE-TR-249-95 cover page

CSE-TR-249-95

Abstract: The undo/redo database recovery algorithm is specified in terms of an evolving algebra. The specification is used to verify correctness and liveness properties of the algorithm.


Peter Chen, Christopher Aycock, Wee-Teck Ng, Gurushankar Rajamani, Rajagopalan Sivaramakrishnan

Rio: Storing Files Reliably in Memory

University of Michigan

CSE-TR-250-95 cover page

CSE-TR-250-95

Abstract: Memory is currently a second-class citizen of the storage hierarchy because of its vulnerability to power failures and software crashes. Designers have traditionally sacrificed either reliability or performance when using memory as a cache for disks; our goal is to do away with this tradeoff by making memory as reliable as disks. The Rio (RAM I/O) project at Michigan is modifying the Digital Unix (formerly OSF/1) kernel to protect the file cache from operating system crashes. If successful, making memory as reliable as disks will 1) improve file cache performance to that of a pure write-back scheme by eliminating all reliability-caused writes to disk; 2) improve reliability to that of a write-through scheme by making memory a reliable place to store files long term; and 3) simplify applications such as file systems and databases by eliminating write-back daemons and complex commit and checkpointing protocols.


Matthew Jones, Elke Rundensteiner

An Object Model and Algebra for the Implicit Unfolding of Hierarchical Structures

University of Michigan

CSE-TR-251-95 cover page

CSE-TR-251-95

Abstract: Design applications typically require radically varied views of shared design data, ranging from highly compact hierarchical design artifacts to flat and unfolded structures. Because current OODB view technology is not capable of providing this requisite variety of representations, we address this problem by providing special-purpose support for manipulating and querying hierarchical structures. Our object model lays the foundation for the implementation of meta-classes required to support hierarchical set operations. For example, the algebra defined over the data model allows the implicit unfolding of hierarchical sets, providing users with a (virtual) flattened and unfolded view of shared data without the penalty of having to maintain replicated data. It thus forms the basis for powerful extensions to the view capabilities of the OODB view system MultiView. Our query operators can be used to derive unmaterialized, unfolded, and updatable views from folded hierarchical sets. These views are updatable using two types of update operations, namely, in-context and out-of-context updates. For this purpose, we present an algorithm to optimally perform in-context updates on an unfolded view via selective unfolding. In order to evaluate the advantages and limitations of interoperating through such complex hierarchical and flattened views of design data, we also present empirical performance results comparing queries on implicitly and explicitly unfolded hierarchical sets. Keywords: Hierarchical Data, Hierarchical Object-Oriented Views, Data Transformations, Hierarchical Set Algebra, Folded Design Data.
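
The notion of implicit unfolding can be pictured with a small sketch: the flattened, unfolded view is computed on demand from the folded hierarchy rather than stored, so no replicated data needs to be maintained. The Python fragment below assumes a hypothetical nested-tuple encoding of hierarchical sets and is not the paper's object model.

    def unfold(item):
        """Recursively yield the leaf members of a folded hierarchical set.
        Hypothetical representation: a hierarchy is a (name, children) tuple,
        any other value is a leaf."""
        if isinstance(item, tuple) and len(item) == 2 and isinstance(item[1], (list, set)):
            for child in item[1]:
                yield from unfold(child)
        else:
            yield item

    # Example: a folded design hierarchy; the unfolded view is derived, not stored.
    design = ("alu", [("adder", ["xor0", "and0", "or0"]),
                      ("adder", ["xor1", "and1", "or1"]),
                      "control"])
    print(list(unfold(design)))
    # ['xor0', 'and0', 'or0', 'xor1', 'and1', 'or1', 'control']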


Harumi Anne Kuno, Elke A. Rundensteiner

Using Object-Oriented Principles to Optimize Update Propagation to Materialized Views

University of Michigan

CSE-TR-252-95 cover page

CSE-TR-252-95

Abstract: View materialization is known to be a valuable technique for performance optimization in relational databases, and much work has been done addressing the problem of consistently maintaining relational views under update operations. However, little progress has been made thus far regarding the topic of view materialization in object-oriented databases (OODBs). In this paper, we demonstrate that there are several significant differences between the relational and object-oriented paradigms that can be exploited when addressing the object-oriented view materialization problem. For example, we can use the subsumption relationships between classes to identify branches of classes to which we do not need to propagate updates. Similarly, we can use encapsulated interfaces combined with the fact that any unique database property is inherited from a single location to provide a ``registration'' service by which virtual classes can register their interest in specific properties and be notified upon modification of those properties. We describe a number of techniques that take advantage of such data model differences to optimize the maintenance of materialized views in the object-oriented model. We have successfully implemented all proposed techniques in the MultiView system, which provides updatable materialized classes and virtual schemata on top of the GemStone OODBMS. In this paper, we also report results from the experimental studies we have run on the MultiView system measuring the impact of various optimization strategies incorporated into our materialization update algorithms.
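
The ``registration'' service mentioned above behaves much like an observer pattern over individual properties: each virtual class registers interest in the properties it derives from, and an update is propagated only to the classes that registered for the modified property. The Python sketch below is a hypothetical illustration of that idea, not MultiView's implementation.

    from collections import defaultdict

    class PropertyRegistry:
        """Hypothetical registration service: virtual classes subscribe to the
        properties they depend on and are notified only when those change."""
        def __init__(self):
            self._interested = defaultdict(list)   # property name -> virtual classes

        def register(self, prop, virtual_class):
            self._interested[prop].append(virtual_class)

        def property_modified(self, prop, value):
            # Update propagation is limited to classes that registered interest.
            for vc in self._interested[prop]:
                vc.refresh(prop, value)

    class VirtualClass:
        def __init__(self, name):
            self.name = name
        def refresh(self, prop, value):
            print(f"{self.name}: recompute materialized extent ({prop}={value})")

    registry = PropertyRegistry()
    registry.register("salary", VirtualClass("HighEarners"))
    registry.property_modified("salary", 90000)          # notifies HighEarners
    registry.property_modified("address", "Ann Arbor")   # no registered class -> no work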


Karem A. Sakallah

Dynamic Modeling of Logic Gate Circuits

University of Michigan

CSE-TR-253-95 cover page

CSE-TR-253-95

Abstract: We propose a dynamic model of logic gate circuits that integrates their functional and temporal aspects in a single unified framework based on a calculus of symbolic logic waveforms. Using a continuous time model, appropriate time derivatives that capture the dynamic behavior of digital signals are defined. These derivatives provide a natural link between the functional and timing components of gate models and highlight the conditional nature of gate delay. The framework subsumes all known dynamic models of logic gates and - through a formal process of functional and temporal abstraction - provides a sound foundation for functional timing analysis.


Gregory Ganger, Yale Patt

Soft Updates: A Solution to the Metadata Update Problem in File Systems

University of Michigan

CSE-TR-254-95 cover page

CSE-TR-254-95

Abstract: Structural changes, such as file creation and block allocation, have consistently been identified as a source of problems (performance, integrity, security and availability) for file systems. This report describes *soft updates*, an implementation technique that allows a file system to safely use delayed writes for metadata updates. We show that a file system using soft updates asymptotically approaches memory-based file system performance while providing stronger integrity and security guarantees than most UNIX file systems. For metadata update intensive benchmarks, this improves performance by more than a factor of two when compared to the conventional synchronous write approach. In addition, soft updates can improve file system availability by relegating crash-recovery assistance (e.g., the fsck utility) to an optional and/or background role, reducing file system recovery time to a few seconds.


Karem A. Sakallah

Functional Abstractions and Partial Specification of Boolean Functions

University of Michigan

CSE-TR-255-95 cover page

CSE-TR-255-95

Abstract: We define functional abstraction as the process of deliberately ignoring the dependence of a Boolean function on a subset of its variables. Functional abstraction causes a completely specified function to become partially specified. We propose function sets as a theoretical model for partially specified functions and function intervals as a practical approximation to them. We develop an interval Boolean algebra suitable for the symbolic manipulation of function intervals and highlight the relationship between functional abstraction and universal and existential quantification.
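
One concrete way to read a function interval is as a pair [lower, upper] of functions that bracket a partially specified function pointwise; conjunction and disjunction then extend componentwise, while negation swaps and complements the bounds. The Python sketch below illustrates this reading on single Boolean values only; it is an assumption-laden toy, not the interval Boolean algebra developed in the report.

    class Interval:
        """A Boolean interval [lo, hi] with lo <= hi, representing any value v
        with lo <= v <= hi. Minimal single-bit illustration."""
        def __init__(self, lo, hi):
            assert lo <= hi
            self.lo, self.hi = lo, hi

        def __and__(self, other):   # AND is monotone in both operands
            return Interval(self.lo & other.lo, self.hi & other.hi)

        def __or__(self, other):    # OR is monotone in both operands
            return Interval(self.lo | other.lo, self.hi | other.hi)

        def __invert__(self):       # NOT is antitone: swap and complement the bounds
            return Interval(1 - self.hi, 1 - self.lo)

        def __repr__(self):
            return f"[{self.lo},{self.hi}]"

    X = Interval(0, 1)    # fully abstracted (unknown) value
    ONE = Interval(1, 1)
    ZERO = Interval(0, 0)
    print(X & ZERO)       # [0,0]: AND with 0 resolves the abstraction
    print(X | ONE)        # [1,1]: OR with 1 resolves the abstraction
    print(~X)             # [0,1]: still unknown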


Hsien-Hsin Lee, Edward Davidson

Automatic Generation of Performance Bounds on the KSR1

University of Michigan

CSE-TR-256-95 cover page

CSE-TR-256-95

Abstract: Performance evaluation techniques have become critical indices for both architecture designers and software developers in recent years. Automatic tools are in demand for characterizing performance easily. In addition, a good performance analyzer should be able to point out the amount of performance loss relative to an ideal, where it occurs, and its causes. In this paper, MACS bounds, a hierarchy of bounds for modeling the performance, will be introduced for loop-dominated scientific code. This model successively considers the peak floating-point performance of a Machine of interest (M), the essential operations in a high level Application code of interest (MA), the additional operations in the Compiler-generated workload (MAC), the compiler-generated Schedule for this workload (MACS), and the actual delivered performance. In ascending through the MACS bounds hierarchy from the M bound, the model becomes increasingly constrained as it moves in several steps from potentially deliverable toward actually delivered performance. Each step from one bound to the next quantifies a performance gap associated with particular causes. The bound methodology will then be applied to a multiple-processor system to predict the run-time bound of a scientific application. We present three automatic bound generators, K-MA, K_MACSTAT and MACS_GAP, based on our performance model on the KSR1 Massively Parallel Processing (MPP) machine. Finally, we will use the performance bound models and tools to experiment on a real parallel scientific application running on the KSR1. The analysis results of our selected loops will be illustrated by the MACS bounds hierarchy, machine utilization, and relative speed-up over a uniprocessor. This information will be very useful for improving the performance of the application at the right points.


David Van Campenhout and Trevor Mudge

Timing Analysis of Digital Systems with Gated Clocks

University of Michigan

CSE-TR-257-95 cover page

CSE-TR-257-95

Abstract: The SMO model for analyzing the temporal behavior of general synchronous systems has proven to be very effective. This paper extends the model to systems in which the clock signals can be gated (conditional). We describe a method for verifying the timing of such systems. Our approach hinges on the use of information about design intent in the analysis to obtain good accuracy. Ingredients for a timing verification tool that allows designers to take advantage of gated clocks are developed. The system is partitioned into datapath and control sections. The datapath is analyzed using the conditional delay model. Gated clocks allow multi-cycle behavior. Symbolic finite state machine traversal techniques are employed for checking the validity of multi-cycle paths.


Gheith Abandah, Edward Davidson

Modeling the Communication and Computation Performance of the IBM SP2

University of Michigan

CSE-TR-258-95 cover page

CSE-TR-258-95

Abstract: The objective of this report is to develop models that characterize the performance of the different aspects of a multicomputer to facilitate activities such as explaining the achieved performance of message-passing applications, developing efficient message-passing applications, and comparing the performance of different multicomputers and application code implementations. The models developed in this report estimate the execution time of a message-passing application given its high-level source code. In this report, we present our models for characterizing the communication and computation performance of multicomputers. Our approach in developing communication performance models is to model the performance of a set of common communication patterns. This set contains the basic communication patterns, and is selected in such a way that the complex communication patterns can be constructed from these basic patterns. Hence the time of a complex communication pattern is estimated by summing the times of its basic components. Our approach for modeling the computation performance is through using a Machine-Application Performance Bound (MA Bound). The MA Bound of an application is an upper bound for the performance that can be achieved for the application on a certain machine. In this report we present our models for the IBM SP1, IBM SP2, and Convex SPP-1000. We also present some uses of these models, e.g., studying the performance of six broadcast algorithms on the IBM SP2 and comparing the communication performance of the IBM SP2 and the Convex SPP-1000.
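
The composition idea described above, estimating a complex pattern's time by combining the times of its basic patterns, can be sketched as follows. The linear point-to-point model and all parameter values in this Python fragment are hypothetical placeholders, not the measured SP2 constants from the report.

    from math import ceil, log2

    def point_to_point_time(msg_bytes, latency_us=40.0, bandwidth_bytes_per_us=35.0):
        """Hypothetical linear model for one basic pattern: startup latency plus
        transfer time (parameters are illustrative, not measured SP2 values)."""
        return latency_us + msg_bytes / bandwidth_bytes_per_us

    def binary_tree_broadcast_time(msg_bytes, num_nodes):
        """A complex pattern estimated by composing basic patterns: a binary-tree
        broadcast is roughly ceil(log2(P)) sequential point-to-point steps."""
        return ceil(log2(num_nodes)) * point_to_point_time(msg_bytes)

    print(point_to_point_time(8192))             # one 8 KB message (microseconds)
    print(binary_tree_broadcast_time(8192, 16))  # 8 KB broadcast to 16 nodes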


Joseph D'Ambrosio, William Birmingham

Hierarchical Concurrent Engineering: Supporting Hierarchical Decomposition and Peer-to-Peer Problem Solving

University of Michigan

CSE-TR-259-95 cover page

CSE-TR-259-95


Wu-chi Feng, Farnam Jahanian, Stuart Sechrest

An Optimal Bandwidth Allocation Strategy for the Delivery of Compressed Prerecorded Video

University of Michigan

CSE-TR-260-95 cover page

CSE-TR-260-95

Abstract: The transportation of compressed video data without loss of picture quality requires the network to support large fluctuations in bandwidth requirements. Fully utilizing a client-side buffer for smoothing bandwidth requirements can limit the fluctuations in bandwidth required from the underlying network. This paper shows that for a fixed size buffer constraint, the critical bandwidth allocation technique results in the minimum number of bandwidth increases necessary for stored video playback. In addition, this paper introduces an optimal bandwidth allocation algorithm which minimizes the number of bandwidth changes necessary for the playback of stored video and achieves the maximum effectiveness from client-side buffers. For bandwidth allocation plans that allocate bandwidth for the length of a video, the optimal bandwidth allocation algorithm also minimizes the number of bandwidth increases as well as the total increase in bandwidth. A comparison between the optimal bandwidth allocation algorithm and other critical bandwidth based algorithms using several full-length movie videos is also presented.
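
The client-buffer constraint underlying such bandwidth plans can be stated simply: over every prefix of the playback, the cumulative data transmitted must be at least the cumulative data consumed (no buffer underflow) and at most that amount plus the buffer size (no overflow). The Python sketch below checks a piecewise-constant plan against hypothetical frame sizes; it is a feasibility check only, not the paper's optimal allocation algorithm.

    def plan_feasible(frame_sizes, rates, buffer_size):
        """Check a piecewise-constant bandwidth plan against a client buffer.
        frame_sizes: bits consumed per frame interval; rates: bits sent per interval.
        Feasible iff the buffer never underflows (sent >= consumed so far) nor
        overflows (sent <= consumed so far + buffer_size)."""
        sent = consumed = 0
        for size, rate in zip(frame_sizes, rates):
            sent += rate
            consumed += size
            if sent < consumed or sent > consumed + buffer_size:
                return False
        return True

    frames = [500, 700, 300, 600, 400]   # hypothetical per-frame bits
    print(plan_feasible(frames, [600] * 5, buffer_size=500))  # True: constant 600 works
    print(plan_feasible(frames, [550] * 5, buffer_size=500))  # False: underflow at frame 2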


Waleed Meleis, Edward Davidson

Optimal Dual-Issue Instruction Scheduling With Spills For Binary Expression Trees

University of Michigan

CSE-TR-261-95 cover page

CSE-TR-261-95

Abstract: We describe an algorithm that produces code, with spills, for a register-constrained machine that can issue up to one arithmetic operation and one memory access operation per time slot, under the restrictions that the code's dependence graph is represented as a binary tree with no unary operations, and the latency of the operations is 1. We prove that the algorithm produces a minimum cost schedule, and show that its complexity is O(nk) where n is the number of operations to be scheduled and k is the number of spills in the schedule. The number of issue slots in the minimum length schedule is rho + 2k + |A| where rho is the number of registers used, k is the number of spills and |A| is the number of arithmetic operations in the tree. We show this is the cost of any schedule that is in "contiguous form".
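
As a worked instance of that expression: for a hypothetical tree with |A| = 7 arithmetic operations scheduled using rho = 3 registers and k = 2 spills, the minimum number of issue slots is rho + 2k + |A| = 3 + 4 + 7 = 14.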


V. Chandramouli, Karem A. Sakallah

Modeling the Effects of Temporal Proximity of Input Transitions on Gate Propagation Delay and Transition Time

University of Michigan

CSE-TR-262-95 cover page

CSE-TR-262-95

Abstract: While delay modeling of gates with a single switching input has received a lot of attention, the case of multiple inputs switching in close temporal proximity is just beginning to be addressed in the literature. The effect of proximity of input transitions can be significant on the delay and output transition time. The few attempts that have addressed this issue are based on a series-parallel transistor collapsing method that reduces the multi-input gate to an inverter. This limits the technique to CMOS technology. Moreover, none of them discuss the appropriate choice of voltage thresholds to measure delay for a multi-input gate. In this paper, we first present a method for the choice of voltage thresholds for a multi-input gate that ensures a positive value of delay for any combination of input transition times and the temporal separations among them. We next introduce a dual-input proximity model for the case when only two inputs of the gate are switching. We then propose a simple algorithm for calculating the delay and output transition time that makes repeated use of the dual-input proximity model and that does not collapse the gate into an equivalent inverter. Comparison with simulation results shows that our method performs quite well in practice. Before concluding the paper we also show the close relationship between the inertial delay of a gate and the proximity of input transitions.


Hsien-Hsin Lee, Edward Davidson

Automatic Parallel Program Conversion from Shared-Memory to Message-Passing

University of Michigan

CSE-TR-263-95 cover page

CSE-TR-263-95

Abstract: Parallelizing an application on message-passing machines is laborious, since there is little good software or compiler support for dealing with interprocessor communication. Consequently, it is inefficient for programmers to have to understand the communication behavior and specify it explicitly in the code by hand. In this paper, three schemes are presented for alleviating this time-consuming and error-prone task. Schemes A and B are two obvious methods of implementing the communication code, while Scheme C is a systematic methodology for automatically generating functional and efficient message-passing code.


Clare Congdon

A Comparison of Genetic Algorithms and Other Machine Learning Systems on a Complex Classification Task from Common Disease Research

University of Michigan

CSE-TR-264-95 cover page

CSE-TR-264-95

Abstract: This thesis investigates some well-known machine learning systems and evaluates their utility when applied to a classification task from the field of human genetics. This common-disease research task, an inquiry into genetic and biochemical factors and their association with a family history of coronary artery disease (CAD), is more complex than many pursued in machine learning research, due to interactions and the inherent noise in the dataset. The task also differs from most pursued in machine learning research because there is a desire to explain the dataset with a small number of rules, even at the expense of accuracy, so that they will be more accessible to medical researchers who are unaccustomed to dealing with disjunctive explanations of data. Furthermore, there is asymmetry in the task in that good explanations of the positive examples are of more importance than good explanations of the negative examples. The primary machine learning approach investigated in this research is genetic algorithms (GA's); decision trees, Autoclass, and Cobweb are also included. The GA performed the best in terms of descriptive ability with the common-disease research task, although decision trees also demonstrated certain strengths. Autoclass and Cobweb were recognized from the outset as being inappropriate for the needs of common-disease researchers (because both systems are unsupervised learners that create probabilistic structures), but were included because of their interest to the machine learning community; these systems did not perform as well as GA's and decision trees in terms of their ability to describe the data. In terms of predictive accuracy, all systems performed poorly, and the differences between any two of the three best systems are not significant. When positive and negative examples are considered separately, the GA does significantly better than the other systems in predicting positive examples and significantly worse in predicting negative examples. The thesis illustrates that the investigation of ``real'' problems from researchers in other fields can lead machine learning researchers to challenge their systems in ways they may not otherwise have considered, and may lead these researchers to a symbiotic relationship that benefits multiple research communities.


Paul Jensen, Nandit Soparkar

Real-Time Concurrency Control in Groupware

University of Michigan

CSE-TR-265-95 cover page

CSE-TR-265-95

Abstract: Concurrency control has been identified as an important issue for groupware systems in which several users may access shared data resources. The unique real-time responsiveness and consistency requirements in groupware environments suggest that traditional approaches (e.g., from transaction-processing) need to be modified in order to be deployed. We describe how recently developed techniques from real-time transaction systems may be applied to groupware. In doing so, we provide novel ways to partition data and concurrency control to permit weaker consistency constraints in order to facilitate meeting the responsiveness requirements. Our research addresses these issues as they pertain to both local and distributed environments in the context of groupware.


A. Eichenberger and E. Davidson

A Reduced Multipipeline Machine Description that Preserves Scheduling Constraints

University of Michigan

CSE-TR-266-95 cover page

CSE-TR-266-95

Abstract: High performance compilers increasingly rely on accurate modeling of the machine resources to efficiently exploit the instruction level parallelism of an application. In this paper, we propose a reduced machine description that results in faster detection of resource contentions while preserving the scheduling constraints present in the original machine description. The proposed approach reduces a machine description in an automated, error-free, and efficient fashion. Moreover, it fully supports schedulers that backtrack and process operations in arbitrary order. Reduced descriptions for the DEC Alpha 21064, MIPS R3000/R3010, and Cydra 5 result in 4 to 7 times faster detection of resource contentions and require 22 to 90% of the memory storage used by the original machine descriptions.


Hakan Yalcin, John Hayes

Event Propagation Conditions in Timing Analysis

University of Michigan

CSE-TR-267-95 cover page

CSE-TR-267-95

Abstract: We present a systematic study of signal propagation conditions (PCs) starting from a general waveform model. We develop a number of specific waveform models based on how closely they match actual signal behavior, and show that they form a well-defined hierarchy. For each model, we derive PCs based on fundamental cause-and-effect behavior. We then construct a lattice of all known PCs that reveals the relationships among them. This lattice also enables us to derive some new and potentially useful PCs. Experimental results are presented to evaluate the accuracy of the proposed PCs.


Wataro Shinohara, Daniel Koditschek

A Simplified Model of a Supercritical Power Plant

University of Michigan

CSE-TR-268-95 cover page

CSE-TR-268-95

Abstract: We develop a simplified state-space model of a once-through supercritical boiler turbine power plant. This phenomenological model has been developed to facilitate analytical insight in support of our investigation of wide load cycling operations including startup and shut down. We compare our model to several more physically accurate (and far more complicated) models and provide as well some contrasting simulation results.


Monica Brockmeyer, Farnam Jahanian, Constance Heitmeyer, and Bruce Labaw

Monitoring and Assertion-Checking Real-Time Specifications in Modechart

University of Michigan

CSE-TR-269-95 cover page

CSE-TR-269-95

Abstract: This paper describes the integration of a monitoring and assertion checking tool into the Modechart Toolset, a suite of integrated tools providing system specification, formal verification, static analysis, and specification simulation for real-time systems. The monitoring and assertion checking tool, MAC, supports monitoring of symbolic execution traces generated by the Modechart simulator, permitting testing of specifications early in the design phase and providing a mechanism for evaluating properties of the system on a particular execution trace. These capabilities are provided by the automatic translation of assertions in a declarative language (such as Real Time Logic) into monitoring fragments, written in Modechart, which augment the original specification to perform monitoring and assertion checking during simulation. In addition, we discuss several major issues of monitoring and assertion checking in this context. In particular, a key issue is that not all assertions can be evaluated with a bounded event history. A primary goal of this research is the identification of classes of assertions for which the number of events which must be recorded for monitoring is independent of the length of the computation to be monitored. We address this problem by exploiting the semantics of Modechart to identify classes of assertions for which monitoring is simplified to evaluation of a state predicate on a finite event history having a bound which is established a priori.


Amit Mathur, Robert Hall, Farnam Jahanian, Atul Prakash, Craig Rasmussen

The Publish/Subscribe Paradigm for Scalable Group Collaboration Systems

University of Michigan

CSE-TR-270-95 cover page

CSE-TR-270-95

Abstract: We consider the problem of disseminating data generated in real-time to large groups of distributed users in group collaboration systems. We present an architecture based on the publish/subscribe paradigm to design communication services appropriate for large-scale group collaboration systems. In this model, one or more data sources or publishers sends data to multiple clients or subscribers. The distinguishing characteristic of this paradigm is the anonymous nature of communication. While a publisher knows about the set of subscribers, the subscribers are unaware of each other and only know about the publisher that they are receiving data from. We exploit this fact in the protocols that we use to meet the reliability and scalability requirements. We provide a formal characterization of the semantics associated with the delivery of data within this publish/subscribe framework. We then show that it is possible to use a weaker and less costly (in terms of scalability and latency) notion of synchrony in comparison to traditional work on group communication protocols. The protocols presented are based on this weaker synchrony model.
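
The anonymity property described above can be seen in a minimal skeleton: the publisher holds the subscriber list and fans data out to it, while subscribers hold no references to one another and know only the publisher they registered with. The Python sketch below uses hypothetical class and method names and is not the protocol presented in the report.

    class Publisher:
        """Hypothetical publish/subscribe skeleton: the publisher knows its
        subscribers; subscribers are unaware of each other."""
        def __init__(self, topic):
            self.topic = topic
            self._subscribers = []

        def subscribe(self, subscriber):
            self._subscribers.append(subscriber)

        def publish(self, data):
            for s in self._subscribers:   # anonymous fan-out to all subscribers
                s.deliver(self.topic, data)

    class Subscriber:
        def __init__(self, name):
            self.name = name
        def deliver(self, topic, data):
            print(f"{self.name} received {data!r} on {topic}")

    feed = Publisher("whiteboard-updates")
    feed.subscribe(Subscriber("client-1"))
    feed.subscribe(Subscriber("client-2"))
    feed.publish({"stroke": 17})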


Wu-chi Feng, Farnam Jahanian, and Stuart Sechrest

Providing VCR Functionality in a Constant Quality Video-on-Demand Transportation Service

University of Michigan

CSE-TR-271-95 cover page

CSE-TR-271-95

Abstract: The use of bandwidth smoothing techniques for the delivery of prerecorded compressed video has been shown to be effective in reducing the network bandwidth required to play back video. Any fixed allocation scheme, however, can limit the ability to support VCR functions. In this paper, we examine the impact that bandwidth smoothing techniques have on the delivery of stored video. We introduce the notion of a VCR-window, which allows users to have full VCR functionality within a limited window while not requiring increases in the bandwidth reservations. In addition, we introduce a resource reservation scheme that can be used with the VCR-window for the delivery of stored video. To test the applicability of the VCR-window in video-on-demand systems, we have digitized 15 full-length movies as experimental data. Our results indicate that we can provide time-limited VCR functionality while retaining fairly high network utilization.
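
The VCR-window idea can be caricatured as a simple admission test on seek requests; the Python fragment below assumes a symmetric window of frames around the current playback point, which is our own simplification rather than the report's exact definition:

    def seek_allowed(current_frame: int, target_frame: int, window: int) -> bool:
        # A seek is served from the existing bandwidth reservation only if the
        # target stays within `window` frames of the current playback point.
        return abs(target_frame - current_frame) <= window

    # Example: a short jump stays inside the window, a long jump does not.
    assert seek_allowed(5000, 5250, window=300)
    assert not seek_allowed(5000, 6000, window=300)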


Stacie Hibino and Elke Rundensteiner

Interactive Visualizations for Temporal Analysis: Application to CSCW Multimedia Data

University of Michigan

CSE-TR-272-95 cover page

CSE-TR-272-95

Abstract: Although temporal data is commonly collected by various researchers for a variety of purposes, previous support for analyzing such data for temporal trends has typically been limited to timeline- or statistically-based approaches. In contrast, we present a new paradigm for temporal analysis -- one in which users can browse the data in search of temporal trends and relationships using an interactive visualization environment. Our specialized temporal query filters, coupled with a dynamic visualization of temporal relationships, not only allow users to identify strict relationships (e.g., when A events start at the same time as B events), but also allow them to relax the constraints to identify similar relationships (e.g., when A's start within a few seconds of B's). The temporal visualization (TViz) highlights the strengths of temporal relationships by clustering them together rather than distributing them over time in a timeline format. In this paper, we present a case study using this new paradigm for temporal analysis of video data collected as part of a CSCW study. In the case study, we show how our approach is not limited to analyzing temporal sequences, but can also be used to examine any type of temporal relationship -- parallel, overlapping, or sequential. Finally, we illustrate how this approach simplifies the process of discovering temporal trends and discuss how it complements traditional timelines and statistical analyses.
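
As an illustration of the relaxed temporal query filters described above, the short Python sketch below pairs A and B events whose start times fall within a tolerance of each other; the event representation and function name are our own assumptions, not the tool's API:

    def near_starts(a_events, b_events, tolerance=2.0):
        # Yield (a, b) pairs whose start times differ by at most `tolerance`
        # seconds; tolerance = 0 recovers the strict "start at the same time" case.
        for a in a_events:
            for b in b_events:
                if abs(a["start"] - b["start"]) <= tolerance:
                    yield a, b

    a_events = [{"id": "A1", "start": 10.0}, {"id": "A2", "start": 42.5}]
    b_events = [{"id": "B1", "start": 11.0}, {"id": "B2", "start": 90.0}]
    print(list(near_starts(a_events, b_events)))  # only the (A1, B1) pair matches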


Wee-Teck Ng, Gurushankar Rajamani, Christopher Aycock, and Peter Chen

Measuring and Improving Memory's Resistance to Operating System Crashes

University of Michigan

CSE-TR-273-95 cover page

CSE-TR-273-95

Abstract: Memory is commonly viewed as an unreliable place to store permanent data because it is perceived to be vulnerable to system crashes. Yet despite all the negative implications of memory's unreliability, no data exists that quantifies how vulnerable memory actually is to system crashes. The goals of this paper are to quantify the vulnerability of memory to operating system crashes and to propose a method for protecting memory from these crashes. We use software fault injection to induce a wide variety of operating system crashes in DEC Alpha workstations running Digital Unix, ranging from bit errors in the kernel stack to deleting branch instructions to C-level allocation management errors. We show that memory is remarkably resistant to operating system crashes. Out of the 996 crashes we observed, only 17 corrupted file cache data. Excluding direct corruption from copy overruns, only 2 out of 820 corrupted file cache data. This data contradicts the common assumption that operating system crashes often corrupt files in memory. For users who need even greater protection against operating system crashes, we propose a simple, low-overhead software scheme that controls access to file cache buffers using virtual memory protection and code patching.
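
The bit-error style of fault injection mentioned above can be illustrated at user level with a few lines of Python; the report's experiments inject such faults into a running Digital Unix kernel, which this toy sketch does not attempt to reproduce:

    import random

    def flip_random_bit(buf: bytearray) -> None:
        # Flip one randomly chosen bit in place to emulate a single-bit memory error.
        byte_index = random.randrange(len(buf))
        bit_index = random.randrange(8)
        buf[byte_index] ^= 1 << bit_index

    data = bytearray(b"file cache contents")
    flip_random_bit(data)
    print(data)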


Amit Mathur and Atul Prakash

A Protocol Composition-Based Approach to QoS Control in Collaboration Systems

University of Michigan

CSE-TR-274-95 cover page

CSE-TR-274-95

Abstract: This paper considers the problem of meeting the QoS requirements of media streams in collaboration systems for use over wide-area networks. The QoS parameters considered (latency, jitter, packet loss, and asynchrony) are specified and controlled on an end-to-end basis. A QoS control protocol for controlling these parameters is described. The protocol is based on a novel protocol composition-based approach. The basic idea of the approach is to modularize the protocol such that each module controls a single QoS parameter. Further, each module is assigned a priority. These modules can then be composed in a number of ways, resulting in a variety of priority assignments to the QoS parameters and thus allowing for more flexible QoS control. The performance of the protocol is evaluated through experiments. The experiments illustrate how the protocol composition approach is able to enforce a priority among the QoS parameters, allowing the parameters to be traded off in an appropriate manner.
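
The composition idea can be sketched as a pipeline of per-parameter modules applied in priority order; the module names and packet interface below are illustrative assumptions rather than the paper's actual protocol:

    class JitterControl:
        def process(self, packet):
            # e.g., delay the packet in a playout buffer to smooth jitter
            return packet

    class LossControl:
        def process(self, packet):
            # e.g., request a retransmission or apply loss concealment
            return packet

    def compose(*modules):
        # Return a pipeline that applies the modules in the given (priority)
        # order; a module may drop a packet by returning None.
        def pipeline(packet):
            for m in modules:
                packet = m.process(packet)
                if packet is None:
                    break
            return packet
        return pipeline

    # Different compositions encode different priority assignments:
    jitter_over_loss = compose(JitterControl(), LossControl())
    loss_over_jitter = compose(LossControl(), JitterControl())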


Trent Jaeger and Aviel Rubin

Protocols for Authenticated Download to Mobile Information Appliances

University of Michigan

CSE-TR-275-95 cover page

CSE-TR-275-95

Abstract: Mobile hosts download files from untrusted networks to obtain application software, information services, documents, etc. However, attackers can modify the contents of these files, either in transit or on a compromised machine. Attacks are more likely on a mobile network than a fixed network because attackers can send a malicious message to a mobile host without having to break into any machine in the network. If an attack goes undetected, the mobile host may download a file that contains: (1) erroneous or misleading data; (2) faulty applications; and (3) Trojan horses or viruses. Therefore, mobile hosts need the ability to authenticate the files they download to verify that the file downloaded is the file requested. It is difficult to authenticate files because, in general, a mobile host cannot trust any of the servers that provide location information. In this paper, we define download protocols that locate and retrieve files from an untrusted network and authenticate the downloaded file using only a single trusted principal. Energy consumption is also a concern for mobile hosts, so the protocols are designed to minimize the amount of information a mobile host will need to download.
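
At its core, the authentication step amounts to checking the downloaded bytes against evidence vouched for by the single trusted principal; the Python sketch below uses a SHA-256 digest as that evidence, which is a modern stand-in of our own choosing rather than the protocols defined in the report:

    import hashlib

    def verify_download(file_bytes: bytes, trusted_digest_hex: str) -> bool:
        # Accept the file only if its digest matches the one obtained from the
        # trusted principal, regardless of which untrusted server supplied it.
        return hashlib.sha256(file_bytes).hexdigest() == trusted_digest_hex

    trusted = hashlib.sha256(b"application-v1.2").hexdigest()
    assert verify_download(b"application-v1.2", trusted)
    assert not verify_download(b"application-v1.2-trojaned", trusted)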


Send comments and suggestions to:

trcoordinator@eecs.umich.edu