CSE Technical Reports Sorted by Technical Report Number

TR Number Title Authors Date Pages

CSE-TR-324-97 Evaluating View Materialization Strategies for Complex Hierarchical Objects Jones and Rundensteiner Jan, 97 28

CSE-TR-325-97 The Linear Time Hierarchy Theorems for Abstract State Machines and RAMs Blass and Gurevich Jan, 97 31

CSE-TR-326-97 Gurevich Abstract State Machines and Schonhage Storage Modification Machines Dexter Doyle and Gurevich Jan, 97 19

CSE-TR-327-97 Formalizing Database Recovery Gurevich Soparkar and Wallace Jan, 97 16

CSE-TR-328-97 Specifying and Constructing Schedulers for Workflows with Autonomous Executions Jensen Wallace and Soparkar Feb, 97 18

CSE-TR-329-97 Characterizing Multicase Orderings using Concurrency Control Theory Jensen Soparkar and Mathur Feb, 97 14

CSE-TR-330-97 The Impact of Instruction Compression on I-cache Performance Chen Bird and Mudge Feb, 97 8

CSE-TR-331-97 Communication Characterization of a Cray T3D Vlaovic Feb, 97 27

CSE-TR-332-97 Internet Routing Instability Labovitz Malan and Jahanian Feb, 97 20

CSE-TR-333-97 Measurement-based Admission Control Algorithms for Controlled-Load Service: A Structural Examination Jamin and Shenker Apr, 97 23

CSE-TR-334-97 Symbolic Performance & Learning In Continuous-valued Environments Rogers May, 97 133
Real-world and simulated real-world domains, such as flying anddriving, commonly have the characteristics of continuous-valued (CV) environments. These environments are frequently complex and difficult to control, requiring a great deal of specific, detailed knowledge. Symbolic computation can use many different types of knowledge to constrain the control function. This thesis investigates learning the control function using observed data in a symbolic architecture. SPLICE (Symbolic Performance & Learning In Continuous-Valued Environments) is a Soar agent that performs symbolic control function approximation. SPLICE uses a three-level framework to first classify its sensory information into symbolic regions, then map the set of regions to a local model, then use the local model to determine an action. Learning causes the models to gradually become more specific and more accurate. SPLICE agent is evaluated against other adaptive control algorithms in simple domains using four experimental methodologies. We also instantiate SPLICE in a complex environment, simulated flight, and evaluate its performance. The final goal is not only to create an effective controller for complex continuous environments, but also to understand more clearly the ramifications of symbolic learning in continuous domains.

CSE-TR-335-97 Critical Issues Regarding the Trace Cache Fetch Mechanism Patel Friendly and Patt May, 97 33
In order to meet the demands of wider issue processors, fetch mechanisms will need to fetch multiple basic blocks per cycle. The trace cache supplies several basic blocks each cycle by storing logically contiguous instructions in physically contiguous storage. When a particular basic block is requested, the trace cache can potentially respond with the requested block along with several blocks that followed it when the block was last encountered. In this technical report, we examine some critical features of a trace cache mechanism designed for a 16-wide issue processor and evaluate their effects on performance. We examine features such as cache associativity, storage partitioning, branch predictor design, instruction cache design, and fill unit design. We compare the performance of our trace cache mechanism with that of the design presented by Rotenberg et al and show a 23% improvement in performance. In our final analysis, we compare our trace cache mechanism with an aggressive single basic block fetch mechanism and show that the trace cache mechanism attains a 24% improvement in performance.

CSE-TR-336-97 May 1997 Draft of the ASM Guide Gurevich May, 97 27
This is only a draft of a portion of the ASM guide, but there are no plans to change this part in any substantial way, only to augment it.

CSE-TR-337-97 Classification-Directed Branch Predictor Design Chang May, 97 109
Pipeline stalls due to branches represent one of the most significant impediments to realizing the performance potential of deeply pipelined superscalar processors. Two well-known mechanisms have been proposed to reduce the branch penalty, speculative execution in conjunction with branch prediction and predicated execution. This dissertation proposes branch classification, coupled with improvements in conditional branch prediction, indirect branch prediction, and predicted execution, to reduce the branch execution penalty. Branch classification allows an individual branch instruction to be associated with the branch predictor best suited to predict its direction. Using this approach, a hybrid branch predictor is constructed which achieves a higher prediction accuracy than any branch predictor previously reported in the literature. This dissertation also proposes a new prediction mechanism for predicting indirect jump targets. For the perl and gcc benchmarks, this mechanism reduces the indirect jump misprediction rate by 93.4% and 63.3% and the overall execution time on an 8-wide issue out-of-order execution machine by 14% and 5%. Finally, this dissertation proposes a new method for combining the performance benefits of predicated execution and speculative execution. This approach significantly reduces the branch execution penalty suffered by wide-issue processors.

CSE-TR-338-97 Choiceless Polynomial Time Blass Gurevich and Shelah May, 97 47
Turing machines define polynomial time (PTime) on strings but cannot deal with structures like graphs directly, and there is no known, easily computable string encoding of isomorphism classes of structures. Is there a computation model whose machines do not distinguish between isomorphic structures and compute exactly PTime properties? This question can be recast as follows: Does there exists a logic that captures polynomial time (without presuming the presence of a linear order)? Earlier, one of us conjectured the negative answer.The problem motivated a quest for stronger and stronger PTime logics. All these logics avoid arbitrary choice. Here we attempt to capture the choiceless fragment of PTime. Our computation model is a version of abstract state machines (formerly called evolving algebras). The idea is to replace arbitrary choice with parallel execution. The resulting logic is more expressive than other PTime logics in the literature. A more difficult theorem shows that the logic does not capture all PTime.

CSE-TR-339-97 Using Chromaticity Distributions and Eigenspace Analysis for Pose- Illumination- and Specularity-Invariant Color Indexing of 3D Objects Lin and Lee May, 97 27
The distribution of object colors can be effectively utilized for recognition and indexing. Difficulties arise in the recognition of object color distributions when there are variations in illumination color, changes in object pose with respect to illumination direction, and specular reflections. However, most of the recent approaches to color-based recognition focus mainly on illumination color invariance. We propose an approach that identifies object color distributions influenced by: (1) illumination pose, (2) illumination color and (3) specularity. We suggest the use of chromaticity distributions to achieve illumination pose invariance. To characterize changes in chromaticity distribution due to illumination color, an eigenspace analysis is employed. A set of chromaticity histograms of each object is generated for a range of lighting colors based on linear models of illumination and reflectance, and the histograms are represented using a small number of eigen basis vectors constructed from principal components analysis. Represented in an eigenspace, chromaticity signatures for each object in the model database form a manifold encompassing a range of lighting colors. Recognition and illumination color estimation are performed by projecting the chromaticity signature of a test object into the eigenspace and examining its proximity to database manifolds. Since specular reflections may displace a projection away from its corresponding manifold, a model-based specularity detection/rejection algorithm, called chromaticity differencing, is developed to reduce this discrepancy.

CSE-TR-340-97 Schema Version Removal: Optimizing Transparent Schema Evolution Systems Crestana Lee and Rundensteiner May, 97 42
Powerful interoperability-enabling solutions for software application integration must allow applications to evolve and data requirements to change, while minimizing such changes on other integrated applications. The transparent schema evolution system (TSE), that we developed at the University of Michigan, accomplishes evolution through schema versioning, where a schema version is a dynamic object-oriented view. When the TSE system receives a view schema change request, the system computes a new view schema that reflects the desired change instead of modifying the old view schema in place, therefore preserving existing view schemas for old applications. This generation of a potentially large number of schema versions over time results in an excessive build-up of classes and underlying object instances, not all being necessarily still in use. Since our view system provides us with materialized views, a degradation of system performance due to update progagation is expected in a situation where we have no-longer utilized view schemas in the system. Also, a larger number of view schemas will result in storage overhead costs. In this paper, we address this problem using consistent schema removal techniques. First, we characterize four potential problems of schema consistency that could be caused by removal of a single virtual class; and then outline our solution approach for each of these problems. Second, we demonstrate that view schema removal is sensitive to the order in which individual classes are processed. Our solution to this problem is based on a formal model, called the dependency model, of capturing all dependencies between classes as logic clauses and of manipulating them to make decisions on class deletions and nondeletions while guaranteeing the consistency of the schema. Third, based on this formal model, we have developed and proven consistent a dependency graph (DG) representation and an associated set of rules for DG generation, reduction, and transformation. A first preliminary version of the latter has been successfully implemented in our Schema Version Removal (SVR) tool. Fourth, we present a cost model for evaluating alternative removal patterns on DG. Lastly, we report our preliminary experimental studies that demonstrate the impact of our schema version removal on the performance of the TSE system.

CSE-TR-341-97 A Case Study of a Hardware-Managed TLB in a Multi-Tasking Environment Lee Uhlig and Mudge Jun, 97 27
There have been very few performance studies of hardware-managed translation look-aside buffers (TLBs). The major reason is the lack of efficient and accurate analysis tools. Newer operating systems, applications, and the popularity of the client-server model of computation place a greater burden than their predecessors on memory system components such as TLBs. Thus it is becoming more important to measure the performance of memory systems under such workloads. In this work, we implemented a trap-driven simulator in an operating system to emulate a variety of TLBs. Using this tool, we were able to evaluate the performance of a range of TLBs under these newer workloads. The results show that in order to improve the TLB performance, we should carefully map pages into the TLB, append process identifiers to avoid flushing the TLB contents frequently, or reserve part of the TLB for a particular server process.

CSE-TR-342-97 Improving Code Density Using Compression Techniques Lefurgy Bird Chen and Mudge Jul, 97 17
We propose a method for compressing programs in embedded processors where instruction memory size dominates cost. A post-compilation analyzer examines a program and replaces com mon sequences of instructions with a single instruction codeword. A microprocessor executes the compressed instruction sequences by fetching codewords from the instruction memory, expanding them back to the original sequence of instructions in the decode stage, and issuing them to the execution stages. We apply our technique to the PowerPC instruction set and achieve 30% to 50% reduction in size for SPEC CINT95 programs.

CSE-TR-343-97 A Chromaticity Space for Specularity Illumination Color- and Illumination Pose-Invariant 3-D Object Recognition Berwick and Lee Jul, 97 15
Most of the recent color recognition/indexing approaches concentrate on establishing invariance to illumination color to improve the utility of color recognition. However, other effects caused by illumination pose and specularity on three-dimensional object surfaces have not received notable attention. We present a chromaticity recognition method that discounts the effects of illumination pose, illumination color and specularity. It utilizes a chromaticity space based on log-ratio of sensor responses for illumination pose and color invariance. A model-based specularity detection/rejection algorithm can be used to improve the chromaticity recognition and illumination estimation for objects including specular reflections.

CSE-TR-344-97 An Architecture for Inter-Domain Troubleshooting Thaler and Ravishankar Aug, 97 16
In this paper, we explore the constraints of a new problem: that of coordinating network troubleshooting among peer administrative domains or Internet Service Providers, and untrusted observers. Allowing untrusted observers permits any entity to report problems, whether it is a Network Operations Center (NOC), end-user, or application. Our goals here are to define the inter-domain coordination problem clearly, and to develop an architecture which allows observers to report problems and receive timely feedback, regardless of their own locations and identities. By automating this process, we also relieve human bottlenecks at help desks and NOCs whenever possible. We begin by presenting a troubleshooting methodology for coordinating problem diagnosis. We then describe GDT, a distributed protocol which realizes this methodology. We show through simulation that GDT performs well as the number of observers and problems grows, and continues to function robustly amidst heavy packet loss.

CSE-TR-345-97 Interactive Delayed-Sharing of Computer-Supported Workspaces via the Streaming of Re-Executable Content Manohar Aug, 97 179
This dissertation introduces the notion of interactive delayed-sharing of a session on a computer-supported workspace to allow reuse of such session at a later time. A data artifact, referred to as a session object, encapsulates the session. A session object is composed of heterogeneous multimedia streams that represent temporally-ordered input sequences to applets in the workspace. Playback of a session object recreates the underlying workspace spanned by these applets through the streaming and re-execution of these input sequences in their respective applets. The contributions of this dissertation are as follows. First, this dissertation pioneers the re-execution approach to the record and replay of sessions for supporting computer-supported collaborative work. In doing so, it introduces a novel paradigm for flexible support of asynchronous collaboration that allow users to collaborate by annotating, editing, and refining these delayed-shared workspaces. This dissertation explores these collaborative features particularly focusing on the record, representation, and playback of these workspaces. Second, this dissertation introduces the notion of workspaces as authored, transportable, and temporally-aware artifacts and investigates flexible mechanisms for the delivery of temporal-awareness to workspaces composed of heterogeneous applications through the decoupling of tool and media services. This dissertation develops a formal specification through temporal relationships in these workspaces. Finally, this dissertation describes mechanisms for the integration of re-executable content streams together with continuous multimedia streams. It describes a framework for media-independent integration of these heterogeneous streams and their temporal constraints. The dissertation introduces the use of statistical process controls to guide the relaxation and/or constraint of scheduling and synchronization mechanisms so as to flexibly support a range of heterogeneous media streams and their temporal relationships.

CSE-TR-346-97 Limited Dual Path Execution Tyson Lick and Farrens Aug, 97 14
This work presents a hybrid branch predictor scheme that uses a limited form of dual path execution along with dynamic branch prediction to improve execution time across a broad set of applications. The ability to execute down both paths of a conditional branch enables branch penalty to be minimized; however, relying exclusively on dual path execution is infeasible due to the a cascading effect because instruction fetch rates far exceed the capability of the pipeline to retire a single branch before others must be processed. By using confidence information, available in the dynamic branch prediction state tables, a limited form of dual path execution becomes feasible. The reduces the burden on the branch predictor by allowing predictions of low confidence to be avoided. In this study we present a new approach to gather branch prediction confidence with little or no overhead, and use this confidence mechanism to determine whether dual path execution or branch prediction should be used.Comparing this hybrid predictor model to the dynamic branch predictor shows a dramatic decrease in misprediction rate (from 30 to 60 %). This translates to an reduction in runtime of 10 - 15%. These results imply that dual path execution, which often is thought to be a resource consuming method, may be a worthy approach if restricted with a predicting set.

CSE-TR-347-97 Adaptive Packet Marking for Providing Differentiated Services in the Internet Feng Kandlur Saha and Shin Oct, 97 21
This paper examines the use of adaptable priority marking for providing soft bandwidth guarantees to individual connections or connection groups over the Internet. The proposed scheme does not require resource reservation for individual connections and can be supported with minimal changes to the network infrastructure. The scheme uses modest support from the network in the form of priority handling for appropriately marked packets and relies on intelligent transmission control mechanisms at the edges of the network to achieve the desired throughput levels. The paper describes the control mechanisms and evaluates their behavior in various network environments. The mechanisms described have the flexibility to adapt to different network environments, thereby making them suitable for deployment in an evolving Internet.

CSE-TR-348-97 Flexible Timing Simulation of Multiple-Cache Configurations Tam Rivers and Davidson Oct, 97 16
As the gap between processor and memory speeds increases, cache performa nce becomes more critical to overall system performance. Behavioral cache simul ation is typically used early in the design cycle of new processor/cache configu rations to determine the performance of proposed cache configurations on target workloads. However, behavioral cache simulation does not account for the latenc y seen by each memory access. The Latency-Effects (LE) cache model presented in this paper accounts this nominal latency as well as the additional latencies du e to trailing-edge effects, bus width considerations, port conflicts, and the nu mber of outstanding accesses that a cache allows before it blocks. We also exte nd the LE cache model to handle the latencies seen for moving data among multipl e caches. mlcache, a new, easily configurable and extensible tool, has b een built based on the extended LE model. We show the use of mlcache in estimating the performance of traditional and novel cache configurations, includ ing odd/even, 2-level, Assist, Victim, and NTS caches. We also show how the LE cache timing model provides more useful, realistic performance estimates than ot her possible behavioral-level cache timing models.

CSE-TR-349-97 Techniques for Eliminating Packet Loss in Congested TCP/IP Networks Feng Kandlur Saha and Shin Nov, 97 20
The congestion control mechanisms used in TCP have been the focus of numerous studies and have undergone a number of enhancements. However, even with these enhancements, TCP connections still experience alarmingly high loss rates, especially during times of congestion. To alleviate this problem, the IETF is considering active queue management mechanisms, such as RED, for deployment in the network. In this paper, we first show that even with RED, loss rates can only be reduced marginally in times of congestion. We then show that the solution to this problem lies in decoupling congestion notification from packet loss through the use of explicit congestion notification (ECN) and in understanding the traffic generated by an aggregation of a large number of sources. Given this, we then propose and experiment with more adaptive active queue management algorithms (Adaptive RED) and more conservative end-host mechanisms (SubTCP) which can significantly reduce loss rates across congested links. When used together, these mechanisms can effectively eliminate packet loss while maintaining high link utilizations under the most difficult scenarios.

CSE-TR-350-97 Dynamics of Quality-of-Service Routing with Inaccurate Link-State Information Shaikh Rexford and Shin Nov, 97 20
Quality-of-service (QoS) routing can satisfy application performance requirements and optimize network resource usage by selecting paths based on connection traffic parameters and link load information. However, effective path-selection schemes require the distribution of link-state information, which can cause a significant burden on the bandwidth and processing resources in the network. We investigate the fundamental tension between network overheads and the quality of routing decisions in the context of source-directed QoS routing algorithms. In contrast to previous performance studies that compare different routing algorithms under specific network configurations, we characterize how the performance and overheads of QoS routing relate to the link-state update policies, as a function of the underlying traffic load, network topology, and link-cost metrics. We explore the interplay between stale link-state information and random fluctuations in traffic load through a broad set of simulation experiments on a parameterized model of QoS routing. These results suggest ways to tune the frequency of link-state update messages and the link-cost functions to strike a careful balance between high accuracy and low complexity.

CSE-TR-351-97 Stateful Multicast Services for Supporting Collaborative Applications Shim Hall Litiu and Prakash Nov, 97 14
Collaborative, multi-user applications require group multicast services with ordering guarantees for maintaining consistency of replicated shared state among collaborating processes. In traditional group multicast services (such as Isis), the groups shared state is maintained only by the clients and the multicast service facilitates atomic state transfer from existing clients to new members. In this paper, we argue that, in order to support collaborative applications in Internet-type environments, a copy of the groups state should also be maintained by the multicast service. We show that by maintaining a copy of groups state, the multicast service can provide consistently fast group-join and state transfer times when both slow and fast clients are present or when clients are unreliable --- a crucial requirement in collaborative, multi-user applications where users may dynamically join and leave a collaborative session and expect predictable join times and interactive response time even in the presence of slow or unreliable clients. We show that the overheads incurred by a multicast service in managing each groups shared state can be made minimal and that the multicast service does not have to be aware of the semantics of the groups state. We present the design of such a multicast service, present performance results, and discuss how it meets the various needs of collaborative applications.

CSE-TR-352-97 Individual and Group QoS Issues in Communication Services for Groupware Systems Litiu and Prakash Nov, 97 14
The proliferation of computer networks in the last decade and the ubiquity of the World Wide Web have led to increased interest in the development of computer-supported cooperative work (CSCW) systems. Collaborative, multi-user applications require group multicast services that provide ordering guarantees for maintaining consistency of replicated shared context as well as provide a high degree of interactivity, even under varying load on the multicast servers. While the most common view of the quality of service (QoS) in a distributed system is in terms of the guarantee of the network connection parameters (bandwidth, end-to-end delay), in this paper we investigate QoS from the perspective of the various requirements placed on group communication servers with limited resources by multiple and diverse groups of collaborative users. We show that in the absence of QoS considerations in the design of a group communication service, some groups or individual users can be severely affected by bursty traffic or increase in the size of other groups. We present the design of a best-effort QoS-based adaptive group communication service for supporting reliable data communication in CSCW systems. Our QoS considerations address both groups requirements and individual users requirements. We present performance results showing the effectiveness of the approach and discuss some of the open issues for future work.

CSE-TR-353-97 Scalable Flow Control for Multicast ABR Services in ATM Networks Zhang Shin Saha Kandlur Nov, 97 25
We propose an efficient flow-control scheme for ATM ABR multicast services. We develop a second-order rate-control algorithm to deal with the variation in RM (Resource Management) cell feedback delay resulting from dynamic ``drift' of the bottleneck location within a multicast tree. The proposed scheme makes the rate process converge to the available bandwidth of the multicast-connections most congested link. It also confines the buffer occupancy to a target regime bounded by a given (finite) buffer capacity at the bottleneck node. Using fluid approximation, we model the proposed scheme and study the system dynamics under the most stressful traffic conditions. We derive expressions for queue buildups and average throughputs in both transient and equilibrium states. We identify the system control factors that govern the system dynamics and develop an optimal control condition which guarantees monotonic convergence of the system state to the target regime from an arbitraryinitial value.

CSE-TR-354-97 Integrated Rate and Credit Based Flow Control for Unicast ABR Service in ATM Networks Zhang Shin Zheng Nov, 97 33
We propose a flow-control scheme that combines the merits of credit- and rate-based flow-control schemes by applying {it direct/} control over {it both} bandwidth and buffer resources. The goal of the proposed scheme is to design an optimal rate-control policy for a given finite buffer capacity by %that maximizing average throughput and bounding end-to-end delay. By applying the second-order rate control, the proposed scheme not only makes the rate process converge to the neighborhood of link bandwidth, but also confines the queue-length fluctuation to a regime bounded by buffer capacity. Using hop-by-hop credit flow control, the proposed scheme ensures lossless transmission with a finite buffer capacity while maintaining high network utilization. The source dynamically adjusts its transmission rate and rate-control parameters according to both the network congestion status and whether or not the system states are in the target operating regime. Using the fluid approximation method, we model the proposed flow-control scheme and study the system dynamic behavior for ABR (Available Bit Rate) service under the most stressful traffic condition. We derive the expressions for queue build-ups and average throughput in the equilibrium states as functions of rate-control parameters, feedback delay, link bandwidth, and buffer capacity. Based on the analytical results, we identify the optimal equilibrium state and propose the second-order rate control algorithm for transient-state flow control. We derive a sufficient condition which guarantees the second-order rate control to drive the system from any transient state to an optimal equilibrium state. The worst-case performance bounds, derived as the closed-form functions of flow-control parameters, provide an analytical means of evaluating the performance of the second-order rate control in terms of convergence speed and buffer utilization. The analytical results %in both transient and equilibrium states for both single and multiple connection scenarios have shown the proposed scheme to be stable and efficient in that the source rate and bottleneck queue length rapidly converge to a small neighborhood of the designated operating point. Also, presented are examples showing that the proposed scheme outperforms the other existing schemes.

CSE-TR-355-97 The Semantics of the Java Programming Language: Preliminary Version Wallace Dec, 97 89
A mathematical model of the Java programming language is constructed.

Technical Reports Page