CSE Technical Reports Sorted by Technical Report Number

TR Number Title Authors Date Pages

CSE-TR-356-98 Real-Time Primary-Backup (RTPB) Replication with Temporal Consistency Guarantees Zou and Jahanian Feb, 98 20
A common approach to building fault-tolerant distributed systems is to replicate servers that fail independently. The objective is to give the clients the illusion of service that is provided by a single server. The main approaches for structuring fault-tolerant servers are emph{passive} and emph{active} replication. This paper presents a primary-backup (passive) replication scheme for supporting fault-tolerant real-time applications. The proposed scheme, called Real-Time PB (RTPB) replication service, is an elegant and simple approach that provides the benefits of fault-tolerance, real-time access, and temporal consistency guarantees that are not otherwise easily attainable. This paper formally defines two types of temporal consistency, namely emph{external temporal consistency} and emph{inter-object temporal consistency}. By introducing a key concept called emph{phase variance}, we are able to build our temporal consistency models and derive necessary and sufficient conditions that can be used as the basis for update and transmission scheduling that achieve temporal consistency guarantees. Furthermore, we prove that the term emph{phase variance} used in the models can be bounded under various scheduling algorithms, namely EDF, Rate Monotonic~cite{Liu:73}, and Distance-Constrained Scheduling~cite{Han:92}. The paper also presents an implementation of the real-time primary-backup replication scheme with the aforementioned temporal consistency models. This implementation was developed within the $x$-kernel architecture on the MK 7.2 microkernel from the Open Group. The results of a detailed performance evaluation of this implementation is also discussed.

CSE-TR-357-98 Configuration Independent Analysis for Characterizing Shared-Memory Applications Abandah and Davidson Jan, 98 26
Characterizing shared-memory applications provides insight to design efficient systems, and provides awareness to identify and correct application performance bottlenecks. Configuration dependent analysis is often used to simulate detailed application traces on a particular hardware model. The communication traffic and computation workload generated by the application trace is used as a characterization of this application. This paper demonstrates that configuration independent analysis is a useful tool to characterize shared-memory applications. Configuration independent analysis characterizes inherent application characteristics that do not change from one configuration to another. While configuration dependent analysis is repeated for each target configuration, configuration independent analysis is only performed once. Moreover, configuration independent analysis does not require developing models for the target configurations and is faster than detailed simulation. However, configuration dependent analysis directly provides more information about specific configurations. A combination of the two analysis types constitutes a comprehensive and efficient methodology for characterizing shared-memory applications. In this paper, we show how configuration independent analysis is used to characterize eight aspects of application behavior: general characteristics, working sets, concurrency, communication patterns, communication variation over time, communication slack, communication locality, and sharing behavior. We illustrate the advantages and limitations of this approach by analyzing eight case-study benchmarks from the scientific and commercial domains and interpreting the results.

CSE-TR-358-98 The DiskSim Simulation Environment Version 1.0 Reference Manual Ganger Worthington and Patt Feb, 98 53
DiskSim is an efficient, accurate and highly-configurable disk system simulator developed at the University of Michigan to support research into various aspects of storage subsystem architecture. It includes modules that simulate disks, intermediate controllers, and buses, device drivers, request schedulers, disk block caches, and disk array data organizations. In particular, the disk drive module simulates modern disk drives in great detail and has been carefully validated against several productions disks (with accuracy that exceeds any previously reported simulator). This manual describes how to configure and use DiskSim, which has been made publicly available with the hope of advancing the state-of-the-art in disk system performance evaluation in the research community. The manual also briefly describes DiskSim’s internal structure and various validation results.

CSE-TR-359-98 Constructive Multilevel Logic Synthesis Under Properties of Boolean Algebra Kravets and Sakallah Mar, 98 14
We describe a new constructive multilevel logic synthesis system that integrates the traditionally sepa rate technology-independent and technology-dependent stages of modern synthesis tools. Dubbed M32, this system is capable of generating circuits incrementally based on both functional as well as structural considerations. This is achieved by maintaining a dynamic structural representation of the evolving implementation and by refining it through progressive introduction of gates from a target technology library. Circuit construction proceeds from the primary inputs towards the primary out puts. Preliminary experimental results show that circuits generated using this approach are generally superior to those produced by multi-stage synthesis.

CSE-TR-360-98 Block Enlargement Optimizations for Increasing the Instruction Fetch Rate in Block-Structured Instruction Set Architectures Hao Mar, 98 129
To exploit larger amounts of instruction level parallelism, processors are being built with wider issue widths and larger numbers of functional units. Instruction fetch rate must also be increased in order to effectively exploit the performance potential of such processors. Block-structured ISAs are a new class of instruction set architectures that we designed to address the performance obstacles faced by processors attempting to exploit high levels of instruction level parallelism. The major distinguishing feature of a block-structured ISA is that it defines the architectural atomic unit (i.e. the instruction)( to be a group of operations which is called an atomic block. This dissertation defines an optimization, block enlargement, that can be applied to a block-structured ISA to increase the instruction fetch rate of a processor that implements that ISA. A compiler that generates block-structured ISA processor were constructed to evaluate the performance benefit of block-structured ISAs. This dissertation shows that for the SPECint95 benchmarks, the block-structured ISA processor executing enlarged atomic blocks and using simpler microarchitectural mechanisms to support wide-issue and dynamic scheduling out performs a conventional ISA processor that also supports wide-issue and dynamic scheduling by 28% when assuming perfect branch prediction and by 15% when using real branch prediction.

CSE-TR-361-98 Improving Performance of an L1 Cache With an Associated Buffer Srinivasan Mar, 98 30
Memory access latencies are much larger than processor cycle times, and the trend has been for this gap to increase over time. Cache performance becomes critical in bridging this gap. However, since it is difficult to make a cache both large and fast, cache misses are expected to continue to have a significant performance impact. Victim caching, proposed by Jouppi, is an approach to decrease the miss ratio of direct-mapped caches without affecting their access time. NTS caching, proposed by Rivers is a multilateral cache design scheme that improves performance of first-level(L1) caches based on the temporal locality of the reference pattern. We propose an improvement of these schemes, which we call NT-victim caching. Taking the lead from NTS design we have a bilateral L1 cache, having a main cache (cache A) and a small fully associative buffer (cache B). Cache B is similar to a victim buffer, and holds the victim block replaced by a miss. A cache block is temporal if after it is brought into cache, some word in that block is accessed more than once before the block is replaced. Unlike Victim caches a block that is hit in cache B is swapped with a block in cache A only if it is temporal, and in most of our replacement strategies temporal blocks are less likely to be selected for replacement than non-temporal blocks. Every cache block loaded into L1 cache is monitored for temporal behavior by a hardware detection unit. Our results show that this design reduces the number of swaps between cache A and cache B, relative to the Victim cache, yet gives a comparable miss ratio.

CSE-TR-362-98 Reducing Communication Cost in Scalable Shared Memory Systems Abandah Apr, 98 162
Distributed shared-memory systems provide scalable performance and a convenient model for parallel programming. However, their non-uniform memory latency often makes it difficult to develop efficient parallel applications. Future systems should reduce communication cost to achieve better programmability and performance. We have developed a methodology, and implemented a suite of tools, to guide the search for improved codes and systems. As the result of one such search, we recommend a remote data caching technique that significantly reduces communication cost. We analyze applications by instrumenting their assembly-code sources. During execution, an instrumented application pipes a detailed trace to configuration independent (CIAT) and configuration dependent (CDAT) analysis tools. CIAT characterizes inherent application characteristics that do not change from one configuration to another, including working sets, concurrency, sharing behavior, and communication patterns, variation over time, slack, and locality. CDAT simulates the trace on a particular hardware model, and is easily retargeted to new systems. CIAT is faster than detailed simulation; however, CDAT directly provides more information about a specific configuration. The combination of the two tools constitutes a comprehensive and efficient methodology. We calibrate existing systems using carefully designed microbenchmarks that characterize local and remote memory, producer-consumer communication involving two or more processors, and contention when multiple processors utilize memory and interconnect. This dissertation describes these tools and illustrates their use by characterizing a wide range of applications and assessing the effects of architectural and technological advances on the performance of HP/Convex Exemplar systems, evaluates strengths and weaknesses of current system approaches, and recommends solutions. CDAT analysis of three CC-NUMA system approaches shows that current systems reduce communication cost by minimizing either remote latency or remote communication frequency. We describe four architecturally varied systems that are technologically similar to the low remote latency SGI Origin 2000, but incorporate additional techniques for reducing the number of remote communications. Using CCAT, a CDAT-like simulator that models communication contention, we evaluate the worthiness of these techniques. Superior performance is reached when processors supply cached clean data, or when a remote data cache is introduced that participates in the local bus protocol.

CSE-TR-363-98 mlcache: A Flexible Multi-Lateral Cache Simulator Tam Rivers Tyson and Davidson May, 98 10
As the gap between processor and memory speeds increases, cache performance becomes more critical to overall system performance. To address this, processor designers typically design for the largest possible caches that can still remain on the ever growing processor die. However, alternate, multi-lateral cache designs such as the Assist Cache, Victim Cache, and NTS Cache have been shown to perform as well as or better than larger, single structure caches while requiring less die area. For a given die size, reducing the requirements to attain a given rate of data supply can allow more space dedicated for branch prediction, data forwarding, increasing the size of the reorder buffer, etc. Current cache simulators are not able to study a wide variety of multi-lateral cache configurations. Thus, the mlcache multi-lateral cache simulator was developed to help designers in the middle of the design cycle decide which cache configuration would best aid in attaining the desired performance goals of the target processor. mlcache is an event-driven, timing-sensitive simulator based on the Latency Effects cache timing model. It can be easily configured to model various multi-lateral cache configurations using its library of cache state and data movement routines. The simulator can be easily joined to a wide range of event-driven processor simulators such as RCM_brisc, Talisman, SimICS, and SimpleScalar. We use the SPEC95 benchmarks to illustrate how mlcache can be used to compare the performance of several different data cache configurations

CSE-TR-364-98 Transistor Level Micro Placement and Routing for Two-Dimensional Digital VLSI Cell Synthesis Riepe and Sakallah Jun, 98 16
Previous research into the problem of cell library synthesis for digital VLSI design has concentrated mostly on relatively simple 1-dimensional cell topologies for static CMOS designs. Recent interest has emerged in less constrained 2-dimensional topologies to support more complex non-dual circuits such as latches and flip flops, as well as high performance circuit families such as CVSL, PTL, and domino CMOS. We discuss a CAD methodology which supports a generalized placement and routing approach to the realization of mask geometry for such complex circuits. We explore the options available within this methodology, show how the transistor level placement and routing problems at the transistor level differ from those at the block level, and present some results for a prototype tool, TEMPO, which adopts this methodology.

CSE-TR-365-98 A Scalable Flow-Control Algorithm for Point-to-Multipoint Communications in High-Speed Integrated Networks Zhang and Shin Jun, 98 32
We propose and study a scalable flow-control algorithm for multicast ABR (Available Bit Rate) service. A key idea behind the algorithm is the Soft-Synchronization Protocol (SSP), which derives a single ``consolidated' RM (Resource Management) cell at each multicast branch-point from feedback RM-cells of different downstream branches that are not necessarily responses to the same forward RM-cell in each synchronization cycle. Using balanced and unbalanced binary-tree models, we analyze the scalability of SSP in terms of height and structure of the multicast tree. In contrast with the other existing schemes, SSP is shown to be able to effectively support synchronization of feedback RM-cells and make the effective RM-cell roundtrip delay virtually independent of the multicast-trees height and structure, thus making it scalable. Another key problem in multicast flow-control is how to deal with the variation of RM-cell roundtrip delay due to the randomly drifting bottleneck in a multicast tree. This problem is handled by adapting the second-order rate-control parameter to roundtrip-delay variations. A fluid model is used to analyze the second-order rate control and study the system dynamics for multicast ABR service. We derive an analytical relationship between the second-order rate parameter and RM-cell roundtrip delay subject to both lossless transmission and finite buffer capacity constraints. This relationship ensures the feasibility of the second-order rate control in dealing with RM-cell roundtrip-delay variations and provides an insight on how the required buffer space can be controlled by adjusting the rate parameter. We develop an optimal control condition, under which the second-order rate control guarantees monotonic convergence of the system state to the optimal regime from an arbitrary initial value. The proposed second-order rate-control algorithm is also shown to be feasible and optimal in buffer-allocation efficiency and fairness at the bottleneck.

CSE-TR-366-98 A Scalable Key Distribution Hierarchy McDaniel and Jamin Jul, 98 13
As the use of the Internet for electronic commerce, audio and video conferencing, and other applications with sensitive content grows, the need for secure services becomes critical. Central to the success of these services is the support for secure public key distribution. Although there are several existing services available for this purpose, they are not very scalable, either because they depend on a centralized server or rely on ad hoc trust relationships. In this paper, we present and examine a flexible approach to certificate distribution scalable to arbitrarily large networks. We propose a two level hierarchy where certificates can be independently authenticated by one or more peer authorities, called keyservers. Certificates for end-user and host entities are managed within local domains, called enterprises. By administering certificates close to the source, we reduce the load on the key servers and the effects of network topology changes. We describe the design of our system and present a preliminary performance analysis based on traces of present-day DNS requests.

CSE-TR-367-98 Optimization of a Real-Time Primary-Backup Replication Service Zou and Jahanian Jul, 98 20
The primary-backup replication model is one of the commonly adopted approaches to providing fault tolerant data services. The extension of the primary-backup protocol to the real-time environment, however, imposes the additional constraint of timing predictability, which requires a bounded overhead for managing redundancy. There is a trade-off between reducing system overhead and increasing (temporal) consistency between the primary and backup, and the way to achieve balance between them is not so obvious. In this paper, we try to explore ways to optimize scheduling update messages from primary to backup while maintaining the temporal consistency guarantees of the system. Depending on the purpose of the individual replication service and the environment in which it is running, very different objectives can be sought in the process of optimization. This paper considers the optimization problem from two perspectives with one aimed to minimize the average temporal distance between the primary and backup, and the other aimed to minimize the resource being used in maintaining a given temporal constraint on the system. Corresponding optimization techniques have been developed for these two diverse objectives and an implementation of such techniques is also presented. The implementation is built on top of the existing RTPB model developed in citation Zou2:98 which in turn was developed within the x-kernel architecture on the Mach OSF platform running MK (mach kernel) 7.2. Results of an experimental evaluation of the proposed optimization techniques are also discussed.

CSE-TR-368-98 Origins of Internet Routing Instability Labovitz Malan and Jahanian Jul, 98 20
This paper examines the network routing messages exchanged between core Internet backbone routers. Internet routing instability, or the rapid fluctuation of network reachability information, is an important problem currently facing the Internet engineering community. High levels of network instability can lead to packet loss, increased network latency and time to convergence. At the extreme, high levels of routing instability have led to the loss of internal connectivity in wide-area, national networks. In an earlier study of inter-domain routing, we described widespread, significant pathological behaviors in the routing information exchanged between backbone service providers at the major U.S. public Internet exchange points. These pathologies included several orders of magnitude more routing updates in the Internet core than anticipated, large numbers of duplicate routing messages, and unexpected frequency components between routing instability events. The work described in this paper extends our earlier analysis by identifying the origins of several of these observed pathological Internet routing behaviors. We show that as a result of specific router vendor software changes suggested by our earlier analysis, the volume of Internet routing updates has decreased by an order of magnitude. We also describe additional router software changes that can decrease the volume of routing updates exchanged in the Internet core by an additional 30 percent or more. We conclude with a discussion of trends in the evolution of Internet architecture and policy that may lead to a rise in Internet routing instability.

CSE-TR-369-98 Fixed-Choice and Independent-Choice Logics Blass and Gurevich Aug, 98 55
We study extensions of first-order logic with the choice construct (choose x : phi(x)). We prove some results about Hilberts epsilon operator, but in the main part of the paper we consider the case when all choices are independent.

CSE-TR-370-98 Malleable Shared Workspaces to Support Multiple Usage Paradigms Lee and Prakash Aug, 98 10
Several recent systems provide a room-based metaphor to represent shared workspaces that require use of multiple collaborative tools. These systems provide users with a fairly static usage paradigm of room-centered collaboration, requiring users to mold their collaborative activities to the paradigm rather than molding the paradigm to fit the requirements of their collaborative activities. In this paper, we propose a powerful and yet simple event-action based model, augmented with access control and multi-user features, for room-based systems for providing a high degree of malleability so that these systems can be adapted to provide support for a variety of collaborative facilites, such as call centers, mailboxes, shared repositories, and role-based collaboration, including facilities that are not necessarily anticipated by system designers. The model can be used by both system developers as well as by end-users to customize a system to meet the requirements of their group tasks.

CSE-TR-371-98 Specification and Verification of Pipelining in the ARM2 RISC Microprocessor Huggins and Van Campenhout Aug, 98 40
Gurevich Abstract State Machines (ASMs) provide a sound mathematical basis for the specification and verification of systems. An application of the ASM methodology to the verification of a pipelined microprocessor (an ARM2 implementation) is described. Both the sequential execution model and final pipelined model are formalized using ASMs. A series of intermediate models are introduced that gradually expose the complications of pipelining. The first intermediate model is proven equivalent to the sequential model in the absence of structural, control, and data hazards. In the following steps, these simplifying assumptions are lifted one by one, and the original proof is refined to establish the equivalence of each intermediate model with the sequential model, leading ultimately to a full proof of equivalence of the sequential and pipelined models.

CSE-TR-372-98 Assembly as a Noncooperative Game of its Pieces: Analysis of 1D Sphere Assemblies Bozma and Koditschek Aug, 98 33
We propose an event-driven algorithm for simple robot assembly problems based on the noncooperative game theory. We examine rigorously the simplest setting-three bodies with one degree of fredom ands offer extensive simulation for the 2 DOF extension. The intial analysis and the accompanying simulations suggest that this approach may indeed offer an attractive means of building robust event driven assembly systems.

CSE-TR-373-98 Performance Limits of Trace Caches Postiff Tyson and Mudge Sep, 98 15
A growing number of studies have explored the use of trace caches as a mechanism to increase instruction fetch bandwidth. The trace cache is a memory structure that stores statically noncontiguous but dynamically adjacent instructions in contiguous memory locations. When coupled with an aggressive trace or multiple branch predictor, it can fetch multiple basic blocks per cycle using a single-ported cache structure. This paper compares trace cache performance to the theoretical limit of a three-block fetch mechanism equivalent to an idealized 3-ported instruction cache with a perfect alignment network. Several new metrics are defined to formalize analysis of the trace cache. These include fragmentation, duplication, indexability, and efficiency metrics. We show that performance is more limited by branch mispredictions than ability to fetch multiple blocks per cycle. As branch prediction improves, high duplication and the resulting low efficiency are shown to be among the reasons that the trace cache does not reach its upper bound. Based on the shortcomings of the trace cache discovered in this paper, we identify some potential future research areas.

CSE-TR-374-98 Optimal Few-Stage Allocation Procedures Hardwick and Stout Sep, 98 18
This paper gives very general algorithms for the design of optimal experiments involving two Bernoulli populations in which sampling is carried out in stages. It is assumed that the outcomes of the previous stage are available before the allocations for the next stage are decided. At each stage, one must decide how many observations to take and how many to sample from each of the alternative populations. Of particular interest are 2- and 3-stage experiments. To illustrate that the algorithms can be used for experiments of useful sample sizes, they are applied to estimation and optimization problems. Results indicate that, for problems of moderate size, published asymptotic analyses do not always represent the true behavior of the optimal stage sizes, and efficiency may be lost if the analytical results are used instead of the true optimal allocation. Our results also suggest that one might approach large problems by extrapolating optimal solutions for moderate sample sizes; and, that approaches of this sort could give design guidelines that are far more explicit (and hopefully closer to optimal) than those obtained through asymptotic analyses alone. The examples also show the ease with which the computational approach can solve problems that present analytical difficulties. This allows one to use models that more accurately represent important characteristics of the problem. It is also shown that in various cases the base algorithms can be modified to incorporate simplifications. In such settings, significant speedups and space reductions can be obtained, permitting the exact solution of even larger problems.

CSE-TR-375-98 Realizing Services for Guaranteed-QoS Communication on a Microkernel Operating System Mehra Shaikh Abdelzaher Wang and Shin Sep, 98 27
Provision of end-to-end QoS guarantees on communication necessitates appropriate support in the end systems (i.e., hosts) and network routers that form the communication fabric. Typically, the support is in the form of suitable extensions to the communication subsystem and the underlying operating system for specification and maintenance of QoS guarantees. This paper focuses on the architectural and implementation challenges involved in realizing QoS-sensitive host communication subsystems on contemporary microkernel operating systems with limited real-time support. We motivate and describe the components constituting our integrated service architecture that together ensure QoS-sensitive handling of network traffic at both sending and receiving hosts. We separate the policies from mechanisms in each component, demonstrating a communication framework that can implement alternative QoS models by applying appropriate policies. We also report the results of a detailed execution profile of the system to characterize communication costs for the purposes of admission control. An experimental evaluation in a controlled configuration demonstrates the efficacy with which QoS guarantees are maintained, despite limitations imposed by the underlying operating system.

CSE-TR-376-98 Windowed Key Revocation in Public Key Infrastructures McDaniel and Jamin Oct, 98 14
A fundamental problem inhibiting the wide acceptance of a Public Key Infrastructure (PKI) in the Internet is the lack of a mechanism that provides scalable certificate revocation. In this paper, we propose a novel mechanism called Windowed Revocation. In windowed revocation, certificate revocation is announced for short periods in periodic Certificate Revocation Lists (CRLs). Due to the assurances provided by the protocol over which certificates are retrieved, we bound the amount of time that any certificate is cached by users. Thus, we can limit the announcement of revocation only to the time in which the certificate may be cached; not until its expiration. Because the time in which certificate are announced is short, CRLs are similarly small. By limiting the size of CRLs, we are able to integrate other mechanisms that increase the scalability of the PKI. One such mechanism is the use of ``pushed' CRLs using multicast. We include a proof of the correctness of our approach.

CSE-TR-377-98 A Program for Sequential Allocation of Three Bernoulli Populations Hardwick Oehmke and Stout Oct, 98 14
We describe a program for optimizing and analyzing sequential allocation problems involving three Bernoulli populations. Previous researchers had considered this problem computationally intractable, and we know of no prior exact optimizations for such problems, even for very small sample sizes. Despite this, our program is currently able to solve problems of size 200 or more by using a parallel computer, and problems of size 100 on a workstation. We describe the program and the techniques used to enable it to scale to large sample sizes. As an illustrative example, the program is used to create an adaptive sampling procedure that is the optimal solution to a 3-arm bandit problem. The bandit procedure is then compared to two other allocation procedures along various frequentist and Bayesian metrics. We also indicate extensions of the program that enable it to solve a variety of related problems. Keywords: multi-arm bandit, parallel computing, dynamic programming, adaptive allocation, sequential sampling, clinical trial, load balancing, high-performance computing, recursive equations, design of experiments, statistics

CSE-TR-378-98 Modeling Dual-Task Performance Improvement: Casting Executive Process Knowledge Acquisition as Strategy Refinement Chong Oct, 98 152
People demonstrate a remarkable ability to perform complex, multiple-task activities in spite of the limitations of our sensory, perceptual, cognitive, and motor systems. A prominent theory that addresses how multiple-tasks activities are performed is that of the executive process. Some of the functions of the executive process include enforcing task priorities and arbitrating access to limited resources. It has been shown that a time-sharing skill (or executive-process knowledge) is acquired during training on dual-task combinations. This dissertation presents the development of a computational, task-independent framework for modeling the acquisition of the knowledge acquired during training on dual-task combinations-— executive process knowledge. On a selected dual-task combination—-a continuous tracking task and a discrete two-choice reaction time task-—this framework, when given the declarative and procedural representation of the novice task, has produced an expert model whose performance is a good match to empirical reaction time and tracking error data for the task combination. There are three main contributions of this work. First is the development of EPIC-Soar, a symbolic hybrid architecture that possesses a psychologically-motivated learning mechanism and psychologically-plausible perception and motor systems. Second is the identification and classification of executive process knowledge and the taxonomies that result from this analysis. Third, is an acquisition framework which consists of: a novel data structure for representing task strategies; a task-independent procedure for resolving simultaneous access for motor resources and learning new knowledge that avoids such collisions in the future; a second task-independent learning procedure which refines the strategy data structure and creates new procedural knowledge for performing the task; and a collection of guidelines that regulate how and when promotions are applied.

CSE-TR-379-98 Ensuring Reasoning Consistency in Hierarchical Architectures Wray Nov, 98 149
Agents often dynamically decompose a task into a hierarchy of subtasks. Hierachical task decomposition reduces the cost of knowledge design in comparison to non-hierarchical knowledge because knowledge is simplified and can be shared across multiple tasks. However, hierarchical decomposition does have limitations. In particular, hierarchical decomposition can lead to inconsistency in agent processing, resulting potentially in irrational behavior. Further, an agent must generate the decomposition hierarchy before reacting to an external stimulus. Thus, decomposition can also reduce agent responsiveness. This thesis describes ways in which the limitations of hierarchical decomposition can be circumvented while maintaining inexpensiveknowledge design and efficient agent processing. We introduce Goal-oriented Heuristic Hierarchical Consistency (GOHHC), which eliminates across-level inconsistency. GOHHC computes logical dependencies in asserted knowledge between goals rather than individual assertions, thus avoiding the computational expense of maintaining dependencies for all assertions. Although this goal-oriented heuristic ensures consistency, it does sometime lead to unnecessary repetition in reasoning and delay in task execution. We show empirically that these drawbacks are inconsequential in execution domains. Thus, GOHHC provides an efficient guarantee of processing consistency in hierarchical architectures. Ensuring consistency also provides an architectural framework for unproblematic knowledge compilation in dynamic domains. Knowledge compilation can be used to cache hierarchical reasoning and thus avoid the delay in reaction necessitated by decomposition. An empirical investigation of compilation in hierarchical agents shows that compilation can lead to improvement in both overall performance and responsiveness, while maintaining the low design cost of hierarchical knowledge.

CSE-TR-380-98 Code Compression for DSP Lefurgy and Mudge Nov, 98 5
Previous works have proposed adding compression techniques to a variety of architectural styles to reduce instruction memory requirements. It is not immediately clear how these results apply to DSP architectures. DSP instructions are longer and have potentially greater variation which can decrease compression ratio. Our results demonstrate that DSP programs do provide sufficient repetition for compression algorithms. We propose a compression method and apply it to SHARC, a popular DSP architecture. Even using a very simple compression algorithm, it is possible to halve the size of the instruction memory requirements. Keywords: Compression, Code Density, Code Space Optimization, DSP, Embedded Systems

CSE-TR-381-98 Evaluating the Overheads of Source-Directed Quality-of-Service Routing Shaikh Rexford and Shin Dec, 98 25
Quality-of-service (QoS) routing satisfies application performance requirements and optimizes network resource usage by selecting paths based on connection traffic parameters and link load information. However, effective path-selection schemes require the distribution of link-state information, which can impose a significant burden on the bandwidth and processing resources in the network. We investigate the fundamental trade-off between network overheads and the quality of routing decisions in the context of the source-directed link-state routing protocols proposed for future IP and ATM networks. In contrast to previous work that compares different routing algorithms under specific network configurations, we construct a detailed model of QoS routing that parameterizes the path-selection algorithm, link-cost function, and link-state update policy. Through extensive simulation experiments with several representative network topologies and traffic patterns, we uncover the effects of stale link-state information and random fluctuations in traffic load on the routing and signalling overheads. We then investigate how the inaccuracy of link-state information interacts with the size and connectivity of the underlying topology. Finally, we show that by tuning the coarseness of the link-cost metric to the inaccuracy of underlying link-state information we can reduce the computational complexity of the path-selection algorithm without significantly degrading performance. The paper concludes by summarizing our key results as a list of guidelines for designing efficient quality-of-service routing policies in large backbone networks.

CSE-TR-382-98 Experimental Study of Internet Stability and Wide-Area Backbone Failures Labovitz Ahuja and Jahanian Dec, 98 22
In this paper, we describe an experimental study of Internet stability and the origins of failure in Internet protocol backbones. The stability of end-to-end Internet paths is dependent both on the underlying telecommunication switching system, as well as the higher level software and hardware components specific to the Internets packet-switched forwarding and routing architecture. Although a number of earlier studies have examined failures in the public telecommunication system, little attention has been given to the characterization of Internet stability. Our paper analyzes Internet failures from three different perspectives. We first examine several recent major Internet failures and their probable origins. These empirical observations illustrate the complexity of the Internet and show that unlike commercial transaction systems, the interactions of the underlying components of the Internet are poorly understood. Next, our examination focuses on the stability of paths between Internet Service Providers. Our analysis is based on the experimental instrumentation of key portions of the Internet infrastructure. Specifically, we logged all of the routing control traffic at five of the largest U.S. Internet exchange points over a three year period. This study of network reachability information found unexpectedly high levels of path fluctuation and an aggregate low mean time between failures for individual Internet paths. These results point to a high level of instability in the global Internet backbone. While our study of the Internet backbone identifies major trends in the level of path instability between different service providers, these results do not characterize failures inside the network of service provider. The final portion of our paper focuses on a case study of the network failures observed in a large regional Internet backbone. This examination of the internal stability of a network includes twelve months of operational failure logs and a review of the internal routing communication data collected between regional backbone routers. We characterize the type and frequency of failures in twenty categories, and describe the failure properties of the regional backbone as a whole.

Technical Reports Page