CSE Technical Reports Sorted by Technical Report Number

TR Number Title Authors Date Pages

CSE-TR-419-00 ASH-3: A Distributed File System Over Distributed Shared Memory Janjoom Raasch Shih Jan, 00 9
The use of distributed file systems (DFS) has been popularized by systems such as AFS and NFS. They allow users to share system resources from any location in the network. They also increase the reliability and availability of these resources in that underlying changes are transparent to the user. Concurrently, much research has been undergoing in the area of distributed-shared memory (DSM). The concept of DSM is similar to that of DFS, in that it can allow many users to share a common base of resources. In DSM, however, the resource is not disk storage, but rather memory. Like DFS, DSM allows all users to have the same view of the resource that is transparent when moving between machines. Because these two systems are so similar, we combine the power of a DFS with the ease of development available from a DSM system. We quantify the performance lost by using the DSM layer and will also demonstrate the gains in ease of development.

CSE-TR-420-00 Generalized Symmetries in Boolean Functions Kravets and Sakallah Feb, 00 66
In this report we take a fresh look at the notion of symmetries in Boolean functions. In particular, we show that the classical characterization of symmetries based on invariance under variable swaps is a special case of a more general invariance based on unrestricted variable permutations. We propose a generalization of classical symmetry that allows for the simultaneous swap of groups of variables and show that it captures more of a functionís invariant permutations without undue computational requirements. We apply the new symmetry definition to analyze a large set of benchmark circuits and provide extensive data showing the existence of substantial symmetries in those circuits. Specific case studies of several of these benchmarks reveal additional insights about their functional structure and how it might be related to their circuit structure.

CSE-TR-421-00 The MIRV SimpleScalar/PISA Compiler Postiff Greene Lefurgy Helder and Mudge Apr, 00 23
We introduce a new experimental C compiler in this report. The compiler, called MIRV, is designed to enable research that explores the interaction between the compiler and microarchitecture. This introductory paper makes comparisons between MIRV and GCC. We notice trends between the compilers and optimization levels across SPECint1995 and SPEC2000. Finally, we provide a set of SimpleScalar/PISA binaries to the research com-munity. As we improve the compiler, we encourage architecture researchers to use these optimized binaries as reference programs for architecture research.

CSE-TR-422-00 The Construction of Attention Functions for Phase Regulation Klavins and Koditschek Apr, 00 9
In this report we describe a method for building an attention function on the two dimensional torus. Such a function is used as an analytic switch by an underactuated robotic system to change its attention from one controlled cyclic process to another. A point in the torus represents two phase variables which each describe the progress of a cyclic system around a circle, such as that of a bouncing ball or a hopping leg. The attention function is designed so that under a certain flow on the torus, attention is paid to the phase which will next become zero.

CSE-TR-423-00 An Automatic Verification Tool for UML Compton Gurevich Huggins and Shen May, 00 49
The Unified Modeling language is becoming more and more popular in the software development. However because of its ambiguisity in its semantic model, few verification tool has been built. Abstract State Machines have been successfully applied in giving semantics for programming language like C. In this report, we try to use the Abstract State Macines to give a semantics model for UML and then use ASM Model Checker to design a verification tool for UML. Last we give a toy example to show how the verification works.

CSE-TR-424-00 A Prefetch Taxonomy Srinivasan Davidson and Tyson Apr, 00 23
The difference in processor and main memory cycle time has necessitated the use of aggressive prefetching techniques to reduce or hide main memory access latency. However, prefetching can significantly increase memory bandwidth and unsuccessful prefetches may even pollute the primary cache. Although the current metrics, coverage and accuracy, do provide an impression of the overall quality of a prefetching algorithm, they are simplistic and do not unambiguously and precisely explain the performance of a prefetching algorithm. In this paper, we introduce a new and complete taxonomy for classifying prefetches, accounting for the difference in traffic and misses generated by each prefetch. Our classification of individual prefetches is very useful in identifying the specific strengths and weaknesses of an existing prefetch implementation and we demonstrate it by applying our taxonomy to two implementable prefetching techniques. Based on the histogram of prefetches by category we developed a static filter to reduce/eliminate polluting prefetches of any given prefetching technique.

CSE-TR-425-00 SpliCS-Split Latency Cache System Srinivasan Charney Davidson and Tyson Apr, 00 19
Memory access latencies are much larger than processor cycle times, and this gap has been increasing over time. Cache performance is critical to bridging this gap. Since caches cannot be both large and fast, cache design involves hierarchies with trade-offs between access times and miss ratios. This paper introduces a novel primary cache design, called the Split Latency Cache System (SpliCS). SpliCS employs two data stores: a small, fast cache (A) and a larger, but slower cache (B), inclusion is maintained and both caches are accessed in parallel. SpliCS improves the effectiveness of the cache hierarchy by allowing very fast access to some lines while still allowing a large primary cache to be used. Our results using an out-of-order processor model show that relative to a similarly configured traditional cache, SpliCS achieves a 12 to 13% improvement in CPI on commercial applications and 14 to 18% improvement in CPI on some benchmarks from the SPEC suite.

CSE-TR-426-00 Antigone: Implementing Policy in Secure Group Communication McDaniel and Prakash May, 00 33
Significant strides have been made in achieving strong semantics and security guarantees within group communication and multicast systems. However, the scope of available security policies in these systems is often limited. In contrast, the applications that require the services provided by these systems can differ significantly in their security policy needs. Often application designers have to either make significant compromises in using a given group communication system or build their own customized solutions, an error-prone task. This paper presents Antigone, a framework that provides a suite of mechanisms from which flexible application security policies may be implemented. With Antigone, developers may choose a policy that best addresses their security and performance requirements of an application requiring group communication. We describe the Antigones mechanisms, consisting of a set of micro-protocols, and show how different security policies can be implemented using those mechanisms. We also present a performance study illustrating the security/performance tradeoffs that can be made using Antigone. Through an example conferencing application, we demonstrate the use of the Antigone applications programming interface and consider the use of policy in several distinct session environments.

CSE-TR-427-00 QGuard: Protecting Internet Servers from Overload Jamjoon Reumann and Shin May, 00 16
Current operating systems are not well-equipped to handle sudden load surges that arecommonly experienced by Internet servers. This means that service providers and customers may not be able to count on servers being available once their content becomes very popular. Recent Denial-of-Service attacks on major e-commerce sites have capitalized on this weakness. Remedies that were proposed to improve server behavior under overload require substantial changes to the operating system or applications, which is unacceptable to businesses that only want to use the tried and true. This paper presents QGuard, a much less radical solution to providing differential QoS, protection from overload and some DoS attacks. QGuard is an adaptive mechanism that exploits rate controls for inbound traffic in order to fend off overload and provide QoS differentiation between competing traffic classes. Our Linux-2.2.14-based QGuard prototype provides freely configurable QoS differentiation (preferred customer treatment and service differentiation) and effectively counteracts SYN and ICMP-flood attacks. Since QGuard is a purely network-centric mechanism, it does not require any changes to server applications and can be implemented as a simple add-on module for any OS. Our measurements indicate no performance degradation on lightly loaded servers and only a small reduction of aggregated server throughput (less than 2%) under overload. Well-behaved "preferred customers" remain virtually unaffected by server overload.

CSE-TR-428-00 Lightweight Failure Detection in Secure Group Communication McDaniel and Prakash Jun, 00 13
The secure and efficient detection of process failures is an essential requirement of many distributed systems. In this paper, we present the design and analysis of a mechanism used for the detection of member failures in secure groups. Based on one-time passwords, our solution does not obviate the need for periodic statements from group members, but significantly reduces the cost of their generation and validation. A study comparing the costs of traditional mechanisms with our proposed approach is presented. Results of the study indicate the average case performance of the proposed scheme is 1/10th of traditional failure detection in trusted groups, and negligible in the untrusted groups. A discussion of security and performance tradeoffs made through mechanism policy is provided.

CSE-TR-429-00 Banana Tree Protocol an End-host Multicast Protocol Helder and Jamin Jul, 00 10
Multicast is a network technology that allows a host to efficiently send data to a group of hosts. IP multicast is one example of multicast that relies on Internet routers to forward multicast packets to multicast group members. End-host multicast is another approach where the hosts in the multicast group form a virtual network and route multicast data through the graph themselves without router cooperation. This paper describes Banana Tree Protocol (BTP), an end-host multicast protocol we designed and implemented. We have simulated BTP along with other multicast protocols and theoretically optimal virtual networks. We find BTP performs well in optimal conditions, but performs poorly in more realistic conditions. We analyze this behavior and find BTP to be too limited in its allowed graph transformations. We conclude that an end-host multicast protocol must allow nodes to make a wide range of graph transformations in order to effectively optimize the graph.

CSE-TR-430-00 Library Design for Symmetric Decomposition Kravets and Sakallah Jul, 00 25
Using a very general symbolic decomposition template for logic synthesis we show how to pre-compute libraries for the decomposition patterns implied by the function structure. When coupled with decomposition the outfitted libraries are intended to produce improved synthesis quality. We illustrate the pre-computation process for functions that are symmetric in some inputs. For these functions we derive a set of fan-in-bounded cell libraries that guarantee a reduction in the width of the circuit being synthesized with each successive decomposition step.

CSE-TR-431-00 Fugue: Time Scales of Adaptation in Mobile Video Corner Noble Wasserman Aug, 00 14
Providing interactive video on hand-held, mobile devices is extremely difficult. These devices are subject to processor, memory, and power constraints, and communicate over wireless links of rapidly varying quality. Furthermore, the size of encoded video is difficult to predict, complicating the encoding task. We present Fugue, a system that copes with these challenges through a division along time scales of adaptation. Fugue is structured as three separate controllers: transmission, video, and preference. This decomposition provides adaptation along different time scales: per-packet, per-frame, and per-video. The controllers are provided at modest time and space costs compared to the cost of video encoding. We present simulations confirming the efficacy of our transmission controller, and compare our video controller to several alternatives. We find that, in situations amenable to adaptive compression, our scheme provides video quality equal to or better than the alternatives at a comparable or substantially lower computational cost. We also find that distortion, the metric commonly used to compare mobile video, under-values the contribution smooth motion makes to perceived video quality.

CSE-TR-432-00 SANE: Stable Agile Network Estimation Kim and Noble Aug, 00 11
Distributed systems are becoming increasingly dependent on network estimation, the ability to determine performance along one or more network paths. Producing quality estimates is challenging because network observations are noisy; this is particularly true in wide-area or mobile settings. Current systems depend on simple exponentially weighted moving average filters. These filters are either able to detect true changes quickly or to mask transient changes, but cannot do both. In this paper, we present four filters designed to react quickly to persistent changes while tolerating transient ones. These filters are evaluated in a variety of networking scenarios through three metrics: agility, stability and accuracy. While no single filter dominates, one based on techniques from statistical process control shows promise relative to the others.

CSE-TR-433-00 Inet: Internet Topology Generator Jin Chen and Jamin Sep, 00 21
Network research often involves the evaluation of new application designs, system architectures, and protocol implementations. Due to the immense scale of the Internet, deploying an Internet-wide system for the purpose of experimental study is nearly impossible. Instead, researchers evaluate their designs using generated random network topologies. In this report, we present a topology generator that is based on Autonomous System (AS) connectivity in the Internet. We compare the networks generated by our generator with other types of random networks and show that it generates topologies that best approximate the actual Internet AS topology.

CSE-TR-434-00 The Need for Large Register Files in Integer Codes Postiff Greene and Mudge Oct, 00 28
Register allocation is an important optimization for high performance microprocessors but there is no consensus in the architecture or compiler communities as to the best num-ber of registers to provide in an instruction set architecture. This paper discusses reasons why this situation has occurred and shows from a compiler perspective that, compared to the conventional 32-register file, 64 or more registers enables performance improvements from 5% to 20%. This is demonstrated with existing advanced compiler optimizations on the SPECint95 and SPEC2000 benchmarks. This work also documents that the optimiza-tions eliminate cache hit operations, converting common-case cache hits to faster register accesses. Finally, this work provides additional measurements for the proper number of registers in a high-performance instruction set and shows that most programs can easily use 100 to 200 registers when multiple active functions are considered for simultaneous allocation to the register file.

CSE-TR-435-00 ICP Registration using Invariant Features Sharp Lee and Wehe Oct, 00 25
This paper investigates the use of Euclidean invariant features in a generalization of iterative closest point registration of range images. Pointwise correspondences are chosen as the closest point with respect to a weighted linear combination of positional and feature distances. It is shown that under ideal noise-free conditions, correspondences formed using this distance function are correct more often than correspondences formed using the positional distance alone. In addition, monotonic convergence to at least a local minimum is shown to hold for this method. When noise is present, a method that automatically sets the optimal relative contribution of features and positions is described. This method trades off error in feature values due to noise against error in positions due to misalignment. Experimental results show that using invariant features decreases the probability of being trapped in a local minimum, and is most effective for difficult registration problems where the scene is very small compared to the model.

CSE-TR-436-00 SimSect Hybrid Dynamical Simulation Environment Saranli Oct, 00 53
This report is the users manual of SimSect, a flexible environment for the simulation of hybrid dynamical systems. Hybrid dynamical systems are systems where the continuous dynamics undergo discrete changes upon detection of predefined state based events. The simulation of such systems involve accurate detection of events and handling of the transition between different continuous systems. This report also includes the details of two example hybrid dynamical systems: A spring loaded inverted pendulum (SLIP) runner and a compliant hexapod. The former example illustrates the basic elements of programming models with SimSect, while the latter implements a much more complicated model, addressing many common issues arising when defining complex dynamical models with SimSect.

CSE-TR-437-00 Algebra Based Optimization Strategies for Decentralized Mining V. Jenson and N. Soparkar Oct, 00 24
In previous work, we demonstrated the importance of mining of decentralized data (i.e., normalized tables, possibly residing in several repositories, with several schemas, separate administration etc) in contrast to current technology which applies to centrally stored data. Our decentralized approach to mining data computes partial results at the separate tables, and merges them to obtain the final one -- thereby avoiding the expensive materialization of joins. We provided approaches to mining multiple tables located in a data warehouse, and indicated how these may be further extended. Our decentralized algorithms for finding frequent itemsets (used in association rules and classification algorithms) across multiple tables were analyzed for their performance characteristics, and these were validated empirically as well. In this paper, we detail our approach as extended to general database schema designs, including physically distributed tables, and we describe our techniques which are similar to standard query optimization and processing. We present an algebra (more as an explanatory tool than for rigor) over expressions that denote the mined information (frequent itemsets, in our case) to describe how our basic decentralized mining approaches apply to generic database schema designs. To complement our algebra, we provide useful equivalences among the expressions, and this allows expression re-writing to represent different alternatives possible during the mining. Each equivalent expression indicates several processing strategies for decentralized and/or distributed designs. Using cost formulae available from our previous work, we present the choice among the processing strategies as an optimization problem as regards the efficiency in execution. We discuss the parameters needed for estimating the cost of different executions and finally, we describe heuristics to reduce the search within the space of possible execution plans.

CSE-TR-438-00 Ismene: Provisioning and Policy Reconciliation in Secure Group Communication McDaniel and Prakash Dec, 00 21
Group communication systems increasingly provide security services. However, in practice, the use of such systems is complicated by the divergent requirements and abilities of group members. In this paper, we define a policy language called Ismene that directs the provisioning of security-related resources at member sites. The communication service is defined through a reconciliation of a group policy and members local policies into a security configuration. Group authorization and access control appropriate for the operating conditions and session configuration are also defined within the policy. The use of Ismene policies to define security is illustrated through an extended example of a group application built on the prototype Ismene framework.

Technical Reports Page