CSE Technical Reports Sorted by Technical Report Number
|The use of distributed file systems (DFS) has been popularized by systems such as AFS and NFS. They allow
users to share system resources from any location in the
network. They also increase the reliability and availability
of these resources in that underlying changes are transparent
to the user. Concurrently, much research has been under way
in the area of distributed shared memory (DSM). The concept
of DSM is similar to that of DFS, in that it can allow many
users to share a common base of resources. In DSM, however,
the resource is not disk storage, but rather memory.
Like DFS, DSM allows all users to have the same view of the
resource that is transparent when moving between machines.
Because these two systems are so similar, we combine the
power of a DFS with the ease of development available from
a DSM system. We quantify the performance lost by using
the DSM layer and also demonstrate the gains in ease of development.
|In this report we take a fresh look at the notion of symmetries in Boolean functions. In particular, we show that the classical characterization of
symmetries based on invariance under variable swaps is a special case of
a more general invariance based on unrestricted variable permutations. We
propose a generalization of classical symmetry that allows for the
simultaneous swap of groups of variables and show that it captures more
of a function's invariant permutations without undue computational
requirements. We apply the new symmetry definition to analyze a large set
of benchmark circuits and provide extensive data showing the existence of
substantial symmetries in those circuits. Specific case studies of several
of these benchmarks reveal additional insights about their functional
structure and how it might be related to their circuit structure.
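The classical characterization referred to above, invariance under swapping a pair of variables, can be checked exhaustively for small functions. A minimal Python sketch (the `maj` example and function names are ours for illustration, not from the report):

```python
from itertools import product

def invariant_under_swap(f, n, i, j):
    """Classical symmetry test: is Boolean function f (on n variables)
    unchanged when variables i and j are swapped in every assignment?"""
    for bits in product([0, 1], repeat=n):
        swapped = list(bits)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        if f(*bits) != f(*swapped):
            return False
    return True

# Majority of three inputs is symmetric in every variable pair.
maj = lambda a, b, c: (a & b) | (a & c) | (b & c)
print(invariant_under_swap(maj, 3, 0, 1))  # True
```

The generalized symmetry the report proposes extends this from single-pair swaps to simultaneous swaps of groups of variables; the exhaustive check above scales only to small n.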
|We introduce a new experimental C compiler in this report. The compiler, called MIRV, is designed to enable research that explores the interaction between the compiler and
microarchitecture. This introductory paper makes comparisons between MIRV and GCC.
We observe trends between the compilers and optimization levels across SPECint1995 and
SPEC2000. Finally, we provide a set of SimpleScalar/PISA binaries to the research community.
As we improve the compiler, we encourage architecture researchers to use these
optimized binaries as reference programs for architecture research.
|In this report we describe a method for building an attention function on the two-dimensional torus. Such a function is used as an analytic switch
by an underactuated robotic system to change its attention from one controlled cyclic process to
another. A point in the torus represents two phase variables which each describe the progress of a
cyclic system around a circle, such as that of a bouncing ball or a hopping leg. The attention
function is designed so that under a certain flow on the torus, attention is paid to the phase which
will next become zero.
|The Unified Modeling Language (UML) is becoming increasingly popular in software development. However, because of the ambiguity of its semantic model,
few verification tools have been built for it. Abstract State Machines (ASMs) have been
successfully applied to give semantics to programming languages such as C.
In this report, we use Abstract State Machines to give a semantic model
for UML and then use an ASM model checker to design a verification tool for UML.
Finally, we give a toy example to show how the verification works.
|The difference in processor and main memory cycle time has necessitated the use of aggressive prefetching techniques to
reduce or hide main memory access latency. However, prefetching
can significantly increase memory bandwidth demand, and unsuccessful
prefetches may even pollute the primary cache. Although the
current metrics, coverage and accuracy, do provide
an impression of the overall quality of a prefetching algorithm,
they are simplistic and do not unambiguously and
precisely explain the performance of a prefetching algorithm. In
this paper, we introduce a new and complete taxonomy for
classifying prefetches, accounting for the difference in traffic
and misses generated by each prefetch. Our classification of
individual prefetches is very useful in identifying the specific
strengths and weaknesses of an existing prefetch implementation
and we demonstrate it by applying our taxonomy to two
implementable prefetching techniques. Based on the histogram of
prefetches by category, we developed a static filter to
reduce or eliminate the polluting prefetches of any given prefetching
technique.
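For reference, the two conventional metrics the paper critiques are typically computed as follows. These are the common textbook definitions; the paper's exact formulations, and the finer-grained taxonomy it proposes, may differ:

```python
def prefetch_metrics(useful, useless, misses_without_prefetch):
    """Conventional prefetch quality metrics.
    coverage: fraction of baseline cache misses eliminated by prefetching.
    accuracy: fraction of issued prefetches that were actually used.
    Neither accounts for the extra traffic each prefetch generates,
    which is the gap the paper's taxonomy addresses."""
    issued = useful + useless
    coverage = useful / misses_without_prefetch
    accuracy = useful / issued
    return coverage, accuracy

cov, acc = prefetch_metrics(useful=600, useless=400, misses_without_prefetch=1000)
print(cov, acc)  # 0.6 0.6
```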
|Memory access latencies are much larger than processor cycle times, and this gap has been increasing over time. Cache performance
is critical to bridging this gap. Since caches cannot be both large
and fast, cache design involves hierarchies with trade-offs between access times
and miss ratios. This paper introduces a novel primary cache design,
called the Split Latency Cache System (SpliCS). SpliCS employs two data stores:
a small, fast cache (A) and a larger but slower cache (B); inclusion is
maintained, and both caches are accessed in parallel.
SpliCS improves the effectiveness of the cache hierarchy by allowing
very fast access to some lines while still allowing a large primary cache
to be used. Our results using an out-of-order processor model show that relative
to a similarly configured
traditional cache, SpliCS achieves a 12 to 13% improvement in CPI on
commercial applications and 14 to 18% improvement in CPI on some benchmarks
from the SPEC suite.
|Significant strides have been made in achieving strong semantics and security guarantees within group communication and multicast systems.
However, the scope of available security policies in these systems is
often limited. In contrast, the applications that require the
services provided by these systems can differ significantly in their
security policy needs. Often application designers have to either
make significant compromises in using a given group communication
system or build their own customized solutions, an error-prone task.
This paper presents Antigone, a framework that provides a suite of
mechanisms from which flexible application security policies may be
implemented. With Antigone, developers may choose a policy that best
addresses the security and performance requirements of an
application requiring group communication. We describe Antigone's
mechanisms, consisting of a set of micro-protocols, and show how
different security policies can be implemented using those mechanisms.
We also present a performance study illustrating the
security/performance tradeoffs that can be made using Antigone.
Through an example conferencing application, we demonstrate the use of
the Antigone applications programming interface and consider the use
of policy in several distinct session environments.
|Current operating systems are not well-equipped to handle sudden load surges that are commonly experienced by Internet servers. This means that service providers
may not be able to count on servers being available once their content
becomes very popular. Recent Denial-of-Service attacks on major e-commerce
sites have capitalized on this weakness.
Remedies that were proposed to improve server behavior under overload require
substantial changes to the operating system or applications, which is
unacceptable to businesses that only want to use the tried and true. This
paper presents QGuard, a much less radical solution to providing
differential QoS, protection from overload and some DoS attacks.
QGuard is an adaptive mechanism that exploits rate controls for inbound
traffic in order to fend off overload and provide QoS differentiation between
competing traffic classes.
Our Linux-2.2.14-based QGuard prototype provides
freely configurable QoS differentiation (preferred customer treatment
and service differentiation)
and effectively counteracts SYN and ICMP-flood attacks.
Since QGuard is a purely network-centric mechanism, it does not require any
changes to server applications and can be implemented as a simple add-on
module for any OS. Our measurements indicate no
performance degradation on lightly loaded servers and only a small reduction
of aggregated server throughput (less than 2%) under overload.
Well-behaved "preferred customers" remain virtually unaffected by server overload.
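The inbound rate controls the abstract describes can be pictured as per-traffic-class token buckets. A minimal user-level sketch, illustrative only: QGuard's actual mechanism is implemented inside the Linux 2.2 network stack, and these names are ours:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter, sketching the kind of inbound rate
    control applied per traffic class to fend off overload."""
    def __init__(self, rate, burst):
        self.rate = rate          # tokens (packets) replenished per second
        self.burst = burst        # bucket capacity
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False              # drop: this class exceeded its rate

# A burst of 5 packets arriving at once against a bucket of capacity 3:
tb = TokenBucket(rate=10, burst=3)
t0 = time.monotonic()
print([tb.allow(t0) for _ in range(5)])  # [True, True, True, False, False]
```

Lowering the rate of a misbehaving class (e.g. SYN packets from unknown sources) while leaving preferred classes untouched yields the QoS differentiation described above.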
|The secure and efficient detection of process failures is an essential requirement of many distributed systems. In this paper, we present
the design and analysis of a mechanism used for the detection of
member failures in secure groups. Based on one-time passwords, our
solution does not obviate the need for periodic statements from group
members, but significantly reduces the cost of their generation and
validation. A study comparing the costs of traditional mechanisms
with our proposed approach is presented. Results of the study
indicate that the average-case cost of the proposed scheme is one tenth
that of traditional failure detection in trusted groups, and negligible in
untrusted groups. A discussion of security and performance
tradeoffs made through mechanism policy is provided.
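One common way to build cheap periodic liveness statements from one-time passwords is a Lamport-style hash chain; whether the report uses exactly this construction is an assumption of this sketch:

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def make_chain(seed: bytes, n: int):
    """A member precomputes the chain seed, h(seed), ..., h^n(seed).
    Publishing h^n(seed) as a trusted anchor and then releasing the
    chain in reverse yields n heartbeats, each verifiable with a
    single hash instead of an expensive signature."""
    chain = [seed]
    for _ in range(n):
        chain.append(h(chain[-1]))
    return chain

def verify(heartbeat: bytes, anchor: bytes) -> bool:
    # One hash suffices: a valid heartbeat is the anchor's preimage.
    return h(heartbeat) == anchor

chain = make_chain(b"secret", 3)   # [s, h(s), h^2(s), h^3(s)]
anchor = chain[-1]                 # published, trusted value
assert verify(chain[2], anchor)    # h(h^2(s)) == h^3(s)
assert not verify(chain[1], anchor)
```

This matches the abstract's claim: periodic statements are still required, but generating and validating them becomes cheap.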
| Multicast is a network technology that allows a host to efficiently send data to a group of hosts. IP multicast is one example of
multicast that relies on Internet routers to forward multicast
packets to multicast group members. End-host multicast is
another approach where the hosts in the multicast group form a
virtual network and route multicast data through the graph
themselves without router cooperation.
This paper describes Banana Tree Protocol (BTP), an end-host
multicast protocol we designed and implemented. We have simulated
BTP along with other multicast protocols and theoretically optimal
virtual networks. We find that BTP performs well under ideal conditions
but poorly under more realistic conditions. We analyze this
behavior and find BTP to be too limited in its allowed graph
transformations. We conclude that an end-host multicast protocol
must allow nodes to make a wide range of graph transformations in
order to effectively optimize the graph.
|Using a very general symbolic decomposition template for logic synthesis we show how to pre-compute libraries for the decomposition
patterns implied by the function structure. When coupled with
decomposition, the resulting libraries are intended to produce improved
synthesis quality. We illustrate the pre-computation process for
functions that are symmetric in some inputs. For these functions we
derive a set of fan-in-bounded cell libraries that guarantee a
reduction in the width of the circuit being synthesized with each
successive decomposition step.
|Providing interactive video on hand-held, mobile devices is extremely difficult. These devices are subject to processor, memory, and power
constraints, and communicate over wireless links of rapidly varying
quality. Furthermore, the size of encoded video is difficult to
predict, complicating the encoding task. We present Fugue, a system
that copes with these challenges through a division along time scales
of adaptation. Fugue is structured as three separate controllers:
transmission, video, and preference. This decomposition provides
adaptation along different time scales: per-packet, per-frame, and
per-video. The controllers are provided at modest time and space
costs compared to the cost of video encoding.
We present simulations confirming the efficacy of our transmission
controller, and compare our video controller to several alternatives.
We find that, in situations amenable to adaptive compression, our
scheme provides video quality equal to or better than the alternatives
at a comparable or substantially lower computational cost. We also
find that distortion, the metric commonly used to compare mobile
video, undervalues the contribution that smooth motion makes to perceived
quality.
|Distributed systems are becoming increasingly dependent on network estimation, the ability to determine performance along one or more
network paths. Producing quality estimates is challenging because
network observations are noisy; this is particularly true in wide-area
or mobile settings. Current systems depend on simple exponentially
weighted moving average filters. These filters are either able to
detect true changes quickly or to mask transient changes, but cannot
do both. In this paper, we present four filters designed to react
quickly to persistent changes while tolerating transient ones. These
filters are evaluated in a variety of networking scenarios through
three metrics: agility, stability and accuracy. While no single
filter dominates, one based on techniques from statistical process
control shows promise relative to the others.
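The baseline the paper improves on, the exponentially weighted moving average, is easy to state. In the sketch below, alpha = 1/8 is the classic TCP RTT smoothing choice, not necessarily the value used by the systems evaluated:

```python
def ewma(samples, alpha=0.125):
    """Exponentially weighted moving average filter.
    Small alpha masks transient spikes but reacts slowly to persistent
    shifts; large alpha does the opposite -- the tension between
    agility and stability that motivates the paper's new filters."""
    est = samples[0]
    out = [est]
    for s in samples[1:]:
        est = (1 - alpha) * est + alpha * s
        out.append(est)
    return out

# A persistent shift from 100 ms to 200 ms is tracked only gradually:
print(ewma([100, 100, 200, 200, 200])[-1])  # 133.0078125, still far from 200
```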
|Network research often involves the evaluation of new application designs, system architectures, and protocol implementations. Due to the immense
scale of the Internet, deploying an Internet-wide system for the purpose of
experimental study is nearly impossible. Instead, researchers evaluate
their designs using generated random network topologies. In this report,
we present a topology generator that is based on Autonomous System (AS)
connectivity in the Internet. We compare the networks generated by our
generator with other types of random networks and show that it
generates topologies that best approximate the actual Internet AS topology.
|Register allocation is an important optimization for high-performance microprocessors, but there is no consensus in the architecture or compiler communities as to the best number
of registers to provide in an instruction set architecture. This paper discusses reasons
why this situation has occurred and shows, from a compiler perspective, that compared to
the conventional 32-register file, 64 or more registers enable performance improvements
from 5% to 20%. This is demonstrated with existing advanced compiler optimizations on
the SPECint95 and SPEC2000 benchmarks. This work also documents that the optimizations
eliminate cache hit operations, converting common-case cache hits to faster register
accesses. Finally, this work provides additional measurements for the proper number of
registers in a high-performance instruction set and shows that most programs can easily
use 100 to 200 registers when multiple active functions are considered for simultaneous
allocation to the register file.
|This paper investigates the use of Euclidean invariant features in a generalization of iterative closest point registration
of range images.
Pointwise correspondences are chosen as the closest point
with respect to a weighted linear combination of
positional and feature distances.
It is shown that under ideal noise-free conditions,
correspondences formed using
this distance function are correct
more often than correspondences formed using the
positional distance alone.
In addition, monotonic convergence to at least
a local minimum is shown to hold for this method.
When noise is present, a method that automatically sets the
optimal relative contribution of features and positions is described.
This method trades off error in feature values due to noise against
error in positions due to misalignment.
Experimental results show that using invariant features
decreases the probability of being trapped in a local minimum,
and is most effective for difficult registration
problems where the scene is very small compared to the model.
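The correspondence rule described above, closest point under a weighted combination of positional and feature distance, can be sketched as follows. The point layout and the form of the weighting are simplifications for illustration, not the paper's exact formulation:

```python
import math

def combined_distance(p, q, w):
    """Distance used to pick correspondences: Euclidean positional
    distance combined with a feature distance, with w trading off the
    feature term against position. Each point is (x, y, z, features)."""
    pos = math.dist(p[:3], q[:3])
    feat = math.dist(p[3], q[3])
    return math.sqrt(pos**2 + w * feat**2)

def closest_point(p, model, w=1.0):
    return min(model, key=lambda q: combined_distance(p, q, w))

scene_pt = (0.0, 0.0, 0.0, [0.9])        # position plus an invariant feature
model = [(0.1, 0.0, 0.0, [0.1]),         # nearby in space, wrong feature
         (0.3, 0.0, 0.0, [0.9])]         # farther away, matching feature
print(closest_point(scene_pt, model, w=1.0))
```

With w = 0 this reduces to plain closest-point ICP and selects the first model point; with w = 1 the matching feature wins, illustrating how features can steer correspondences past positionally closer but incorrect matches.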
|This report is the user's manual for SimSect, a flexible environment for the simulation of hybrid dynamical systems. Hybrid dynamical systems are
systems whose continuous dynamics undergo discrete changes upon
detection of predefined state-based events. The simulation of such systems
involves accurate detection of events and handling of the transitions between
different continuous systems.
This report also includes the details of two example hybrid dynamical
systems: A spring loaded inverted pendulum (SLIP) runner and a compliant
hexapod. The former example illustrates the basic elements of programming
models with SimSect, while the latter implements a much more complicated
model, addressing many common issues arising when defining complex dynamical
models with SimSect.
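The core simulation loop, integrating until a state-based event function crosses zero and then locating the crossing accurately, can be sketched as follows. Explicit Euler and bisection are deliberate simplifications here; they are not SimSect's actual integrator or API:

```python
def simulate(f, event, x, t, dt, t_end):
    """Integrate x' = f(t, x) until event(x) crosses zero from above,
    then bisect within the offending step to locate the event time."""
    while t < t_end:
        x_new = x + dt * f(t, x)          # one explicit-Euler step
        if event(x) > 0 >= event(x_new):  # sign change: event in (t, t+dt)
            lo, hi = 0.0, dt              # bisect on the step fraction
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                if event(x + mid * f(t, x)) > 0:
                    lo = mid
                else:
                    hi = mid
            return t + mid, x + mid * f(t, x)   # event time and state
        x, t = x_new, t + dt
    return t, x

# Height h falling at constant unit speed; event = touchdown (h = 0).
# (A single linear state keeps the sketch exact despite Euler stepping.)
t_hit, _ = simulate(lambda t, x: -1.0, lambda x: x, 1.0, 0.0, 0.3, 5.0)
print(round(t_hit, 6))  # 1.0
```

A real hybrid simulation would follow the located event with a transition map (e.g. the touchdown-to-stance switch of the SLIP runner) before resuming integration with the next continuous system.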
|In previous work, we demonstrated the importance of mining decentralized data (i.e., normalized tables,
possibly residing in several repositories, with several schemas,
separate administration, etc.)
in contrast to current technology which
applies to centrally stored data.
Our decentralized approach to mining data
computes partial results at the separate
tables, and merges them to obtain the final
one -- thereby avoiding the expensive
materialization of joins. We provided
approaches to mining multiple tables located
in a data warehouse, and indicated how these
may be further extended. Our decentralized
algorithms for finding frequent itemsets
(used in association rules and classification
algorithms) across multiple tables were analyzed
for their performance characteristics, and these
were validated empirically as well.
In this paper, we detail our approach as
extended to general database schema designs,
including physically distributed tables, and
we describe our techniques which are
similar to standard query
optimization and processing. We present an
algebra (more as an explanatory tool than for rigor)
over expressions that denote the
mined information (frequent itemsets, in our case)
to describe how our basic
decentralized mining approaches apply to generic
database schema designs. To complement our
algebra, we provide useful equivalences
among the expressions,
and this allows expression rewriting to
represent the different alternatives possible during
the mining process. Each equivalent expression
indicates several processing strategies for
decentralized and/or distributed designs.
Using cost formulae available from our
previous work, we present the choice among the
processing strategies as an optimization
problem with respect to execution
efficiency. We discuss the parameters
needed for estimating the cost of
different executions and finally,
we describe heuristics to
reduce the search within the space of
possible execution plans.
|Group communication systems increasingly provide security services. However, in practice, the use of such systems is complicated by the
divergent requirements and abilities of group members. In this paper,
we define a policy language called Ismene that directs the
provisioning of security-related resources at member sites. The
communication service is defined through a reconciliation of a group
policy and members local policies into a security configuration.
Group authorization and access control appropriate for the operating
conditions and session configuration are also defined within the
policy. The use of Ismene policies to define security is illustrated
through an extended example of a group application built on the
prototype Ismene framework.