CSE Technical Reports Sorted by Technical Report Number
| TR Number |
Title |
Authors |
Date |
Pages |
| Real-world and simulated real-world domains, such as flying anddriving, commonly have the characteristics of continuous-valued (CV)
environments. These environments are frequently complex and difficult
to control, requiring a great deal of specific, detailed
knowledge. Symbolic computation can use many different types of
knowledge to constrain the control function. This thesis investigates
learning the control function using observed data in a symbolic
architecture. SPLICE (Symbolic Performance & Learning In
Continuous-Valued Environments) is a Soar agent that performs symbolic
control function approximation. SPLICE uses a three-level framework to
first classify its sensory information into symbolic regions, then map
the set of regions to a local model, then use the local model to
determine an action. Learning causes the models to gradually become
more specific and more accurate. SPLICE agent is evaluated against
other adaptive control algorithms in simple domains using four
experimental methodologies. We also instantiate SPLICE in a complex
environment, simulated flight, and evaluate its performance. The final
goal is not only to create an effective controller for complex
continuous environments, but also to understand more clearly the
ramifications of symbolic learning in continuous domains.
|
| In order to meet the demands of wider issue processors, fetch mechanisms will need to fetch multiple basic blocks per cycle. The trace cache supplies several basic blocks each cycle by
storing logically contiguous instructions in physically contiguous storage. When a particular
basic block is requested,
the trace cache can potentially respond with the requested block along with several blocks
that followed it when the block was last encountered.
In this technical report, we examine some critical features of a trace cache mechanism
designed for a 16-wide issue processor and evaluate their effects on performance. We examine
features such as cache associativity, storage partitioning, branch predictor design, instruction cache
design, and fill unit design. We compare the performance of our trace cache
mechanism with that of the design presented by Rotenberg et al and
show a 23% improvement in performance. In our final analysis, we compare our trace cache mechanism with
an aggressive single basic block fetch mechanism and show that the trace cache mechanism attains a 24%
improvement in performance.
|
| This is only a draft of a portion of the ASM guide, but there are no plans to change this part in any substantial way, only to augment it.
|
| Pipeline stalls due to branches represent one of the most significant impediments to realizing the performance potential of deeply pipelined
superscalar processors. Two well-known mechanisms have been proposed
to reduce the branch penalty, speculative execution in conjunction with
branch prediction and predicated execution.
This dissertation proposes branch classification, coupled with improvements
in conditional branch prediction, indirect branch prediction, and predicted
execution, to reduce the branch execution penalty.
Branch classification allows an individual branch instruction to be
associated with the branch predictor best suited to predict its direction.
Using this approach, a hybrid branch predictor is constructed which achieves
a higher prediction accuracy than any branch predictor previously reported
in the literature.
This dissertation also proposes a new prediction mechanism for predicting
indirect jump targets. For the perl and gcc benchmarks, this mechanism
reduces the indirect jump misprediction rate by 93.4% and 63.3% and the
overall execution time on an 8-wide issue out-of-order execution machine
by 14% and 5%.
Finally, this dissertation proposes a new method for combining the
performance benefits of predicated execution and speculative execution.
This approach significantly reduces the branch execution penalty suffered
by wide-issue processors.
|
| Turing machines define polynomial time (PTime) on strings but cannot deal with structures like graphs directly, and there is no known, easily
computable string encoding of isomorphism classes of structures. Is there
a computation model whose machines do not distinguish between isomorphic
structures and compute exactly PTime properties? This question can be
recast as follows: Does there exists a logic that captures polynomial time
(without presuming the presence of a linear order)? Earlier, one of us
conjectured the negative answer.The problem motivated a quest for stronger
and stronger PTime logics. All these logics avoid arbitrary choice. Here
we attempt to capture the choiceless fragment of PTime. Our computation
model is a version of abstract state machines (formerly called evolving
algebras). The idea is to replace arbitrary choice with parallel execution.
The resulting logic is more expressive than other PTime logics in the
literature. A more difficult theorem shows that the logic does not capture
all PTime.
|
| The distribution of object colors can be effectively utilized for recognition and indexing. Difficulties arise in the recognition of
object color distributions when there are variations in illumination
color, changes in object pose with respect to illumination direction, and
specular reflections. However, most of the recent approaches to color-based
recognition focus mainly on illumination color invariance. We propose
an approach that identifies object color distributions influenced by:
(1) illumination pose, (2) illumination color and
(3) specularity. We suggest the use of chromaticity distributions to
achieve illumination pose invariance. To characterize changes in
chromaticity distribution due to illumination color, an eigenspace
analysis is employed. A set of chromaticity histograms of each object
is generated for a range of lighting colors based on linear models
of illumination and reflectance, and the histograms are represented using
a small number of eigen basis vectors constructed from principal
components analysis. Represented in an eigenspace, chromaticity
signatures for each object in the model database form a manifold
encompassing a range of lighting colors. Recognition and illumination
color estimation are performed by projecting the chromaticity
signature of a test object into the eigenspace and examining its
proximity to database manifolds. Since specular reflections may
displace a projection away from its corresponding manifold, a
model-based specularity detection/rejection algorithm, called
chromaticity differencing, is developed to reduce this discrepancy.
|
| Powerful interoperability-enabling solutions for software application integration must allow applications to evolve and data requirements to
change, while minimizing such changes on other integrated
applications. The transparent schema evolution system (TSE), that we
developed at the University of Michigan, accomplishes evolution
through schema versioning, where a schema version is a dynamic
object-oriented view. When the TSE system receives a view schema
change request, the system computes a new view schema that reflects
the desired change instead of modifying the old view schema in place,
therefore preserving existing view schemas for old applications. This
generation of a potentially large number of schema versions over time
results in an excessive build-up of classes and underlying object
instances, not all being necessarily still in use. Since our view
system provides us with materialized views, a degradation of system
performance due to update progagation is expected in a situation where
we have no-longer utilized view schemas in the system. Also, a larger
number of view schemas will result in storage overhead costs. In this
paper, we address this problem using consistent schema removal
techniques. First, we characterize four potential problems of schema
consistency that could be caused by removal of a single virtual class;
and then outline our solution approach for each of these problems.
Second, we demonstrate that view schema removal is sensitive to the
order in which individual classes are processed. Our solution to this
problem is based on a formal model, called the dependency model, of
capturing all dependencies between classes as logic clauses and of
manipulating them to make decisions on class deletions and
nondeletions while guaranteeing the consistency of the schema. Third,
based on this formal model, we have developed and proven consistent a
dependency graph (DG) representation and an associated set of rules
for DG generation, reduction, and transformation. A first preliminary
version of the latter has been successfully implemented in our Schema
Version Removal (SVR) tool. Fourth, we present a cost model for
evaluating alternative removal patterns on DG. Lastly, we report our
preliminary experimental studies that demonstrate the impact of our
schema version removal on the performance of the TSE system.
|
| There have been very few performance studies of hardware-managed translation look-aside buffers (TLBs). The major reason is the lack of
efficient and accurate analysis tools. Newer operating systems,
applications, and the popularity of the client-server model of computation
place a greater burden than their predecessors on memory system components
such as TLBs. Thus it is becoming more important to measure the
performance of memory systems under such workloads. In this work, we
implemented a trap-driven simulator in an operating system to emulate a
variety of TLBs. Using this tool, we were able to evaluate the performance
of a range of TLBs under these newer workloads. The results show that in
order to improve the TLB performance, we should carefully map pages into
the TLB, append process identifiers to avoid flushing the TLB contents
frequently, or reserve part of the TLB for a particular server process.
|
| We propose a method for compressing programs in embedded processors where instruction memory size dominates cost. A post-compilation analyzer examines a program and replaces com
mon sequences of instructions with a single instruction codeword. A microprocessor executes the
compressed instruction sequences by fetching codewords from the instruction memory, expanding
them back to the original sequence of instructions in the decode stage, and issuing them to the
execution stages. We apply our technique to the PowerPC instruction set and achieve 30% to 50%
reduction in size for SPEC CINT95 programs.
|
| Most of the recent color recognition/indexing approaches concentrate on establishing invariance to illumination color to improve the
utility of color recognition. However, other effects caused by
illumination pose and specularity on three-dimensional object surfaces
have not received notable attention. We present a chromaticity
recognition method that discounts the effects of illumination pose,
illumination color and specularity. It utilizes a chromaticity space
based on log-ratio of sensor responses for illumination pose and color
invariance. A model-based specularity detection/rejection
algorithm can be used to improve the chromaticity recognition and illumination
estimation for objects including specular reflections.
|
| In this paper, we explore the constraints of a new problem: that of coordinating network troubleshooting among peer administrative domains
or Internet Service Providers, and untrusted observers. Allowing
untrusted observers permits any entity to report problems, whether it
is a Network Operations Center (NOC), end-user, or application.
Our goals here are to define the inter-domain coordination problem
clearly, and to develop an architecture which allows observers to
report problems and receive timely feedback, regardless of their own
locations and identities. By automating this process, we also relieve
human bottlenecks at help desks and NOCs whenever possible.
We begin by presenting a troubleshooting methodology for coordinating
problem diagnosis. We then describe GDT, a distributed protocol which
realizes this methodology. We show through simulation that GDT performs
well as the number of observers and problems grows, and continues to
function robustly amidst heavy packet loss.
|
| This dissertation introduces the notion of interactive delayed-sharing of a session on a computer-supported workspace to allow reuse of such session at a later time. A data artifact, referred to as a session object, encapsulates the session. A session object is composed of heterogeneous multimedia streams that represent temporally-ordered input sequences to applets in the workspace. Playback of a session object recreates the underlying workspace spanned by these applets through the streaming and re-execution of these input sequences in their respective applets. The contributions of this dissertation are as follows.
First, this dissertation pioneers the re-execution approach to the record and replay of sessions for supporting computer-supported collaborative work. In doing so, it introduces a novel paradigm for flexible support of asynchronous collaboration that allow users to collaborate by annotating, editing, and refining these delayed-shared workspaces. This dissertation explores these collaborative features particularly focusing on the record, representation, and playback of these workspaces.
Second, this dissertation introduces the notion of workspaces as authored, transportable, and temporally-aware artifacts and investigates flexible mechanisms for the delivery of temporal-awareness to workspaces composed of heterogeneous applications through the decoupling of tool and media services. This dissertation develops a formal specification through temporal relationships in these workspaces.
Finally, this dissertation describes mechanisms for the integration of re-executable content streams together with continuous multimedia streams. It describes a framework for media-independent integration of these heterogeneous streams and their temporal constraints. The dissertation introduces the use of statistical process controls to guide the relaxation and/or constraint of scheduling and synchronization mechanisms so as to flexibly support a range of heterogeneous media streams and their temporal relationships.
|
| This work presents a hybrid branch predictor scheme that uses a limited form of dual path execution along with dynamic branch prediction
to improve execution time across a broad set of applications.
The ability to execute down both paths of a conditional branch enables
branch penalty to be minimized; however, relying exclusively on dual
path execution is infeasible due to the a cascading effect because
instruction fetch rates far exceed the capability of the pipeline to
retire a single branch before others must be processed. By using
confidence information, available in the dynamic branch prediction state
tables, a limited form of dual path execution becomes feasible.
The reduces the burden on the branch predictor by allowing predictions
of low confidence to be avoided.
In this study we present a new approach to gather branch prediction
confidence with little or no overhead, and use this confidence mechanism
to determine whether dual path execution or branch prediction should be
used.Comparing this hybrid predictor model to the dynamic branch predictor
shows a dramatic decrease in misprediction rate (from 30 to 60 %).
This translates to an reduction in runtime of 10 - 15%.
These results imply that dual path execution, which often is thought to be
a resource consuming method, may be a worthy approach if restricted with a
predicting set.
|
| This paper examines the use of adaptable priority marking for providing soft bandwidth guarantees to individual connections or connection
groups over the Internet. The proposed scheme does not require resource
reservation for individual connections and can be supported with minimal
changes to the network infrastructure. The scheme uses modest support
from the network in the form of priority handling for appropriately
marked packets and relies on intelligent transmission control mechanisms
at the edges of the network to achieve the desired throughput levels.
The paper describes the control mechanisms and evaluates their
behavior in various network environments. The mechanisms described have
the flexibility to adapt to different network environments, thereby
making them suitable for deployment in an evolving Internet.
|
| As the gap between processor and memory speeds increases, cache performa nce becomes more critical to overall system performance. Behavioral cache simul
ation is typically used early in the design cycle of new processor/cache configu
rations to determine the performance of proposed cache configurations on target
workloads. However, behavioral cache simulation does not account for the latenc
y seen by each memory access. The Latency-Effects (LE) cache model presented in
this paper accounts this nominal latency as well as the additional latencies du
e to trailing-edge effects, bus width considerations, port conflicts, and the nu
mber of outstanding accesses that a cache allows before it blocks. We also exte
nd the LE cache model to handle the latencies seen for moving data among multipl
e caches. mlcache, a new, easily configurable and extensible tool, has b
een built based on the extended LE model. We show the use of mlcache in
estimating the performance of traditional and novel cache configurations, includ
ing odd/even, 2-level, Assist, Victim, and NTS caches. We also show how the LE
cache timing model provides more useful, realistic performance estimates than ot
her possible behavioral-level cache timing models.
|
| The congestion control mechanisms used in TCP have been the focus of numerous studies and have undergone a number of enhancements. However,
even with these enhancements, TCP connections still experience alarmingly
high loss rates, especially during times of congestion. To alleviate this
problem, the IETF is considering active queue management mechanisms, such
as RED, for deployment in the network. In this paper, we first show that
even with RED, loss rates can only be reduced marginally in times of
congestion. We then show that the solution to this problem lies
in decoupling congestion notification from packet loss through the
use of explicit congestion notification (ECN) and in understanding
the traffic generated by an aggregation of a large number of
sources. Given this, we then propose and experiment with more
adaptive active queue management algorithms (Adaptive RED) and more
conservative end-host mechanisms (SubTCP) which can significantly reduce
loss rates across congested links. When used together, these mechanisms
can effectively eliminate packet loss while maintaining high link
utilizations under the most difficult scenarios. |
| Quality-of-service (QoS) routing can satisfy application performance requirements and optimize network resource usage by selecting paths
based on connection traffic parameters and link load information.
However, effective path-selection schemes require the distribution of
link-state information, which can cause a significant burden on the
bandwidth and processing resources in the network. We investigate the
fundamental tension between network overheads and the quality of
routing decisions in the context of source-directed QoS routing
algorithms. In contrast to previous performance studies that compare
different routing algorithms under specific network configurations, we
characterize how the performance and overheads of QoS routing relate
to the link-state update policies, as a function of the underlying
traffic load, network topology, and link-cost metrics. We explore the
interplay between stale link-state information and random fluctuations
in traffic load through a broad set of simulation experiments on a
parameterized model of QoS routing. These results suggest ways to
tune the frequency of link-state update messages and the link-cost
functions to strike a careful balance between high accuracy and low
complexity. |
| Collaborative, multi-user applications require group multicast services with ordering guarantees for maintaining consistency of replicated shared state among collaborating processes. In traditional group multicast services (such as Isis), the groups shared state is maintained only by the clients and the multicast service facilitates
atomic state transfer from existing clients to new members. In this
paper, we argue that, in order to support collaborative applications in
Internet-type environments, a copy of the groups state should also be
maintained by the multicast service. We show that by maintaining a
copy of groups state, the multicast service can provide consistently
fast group-join and state transfer times when both slow and fast
clients are present or when clients are unreliable --- a crucial
requirement in collaborative, multi-user applications where users may
dynamically join and leave a collaborative session and expect
predictable join times and interactive response time even in the
presence of slow or unreliable clients. We show that the overheads
incurred by a multicast service in managing each groups shared state
can be made minimal and that the multicast service does not have to be
aware of the semantics of the groups state. We present the design of
such a multicast service, present performance results, and discuss how
it meets the various needs of collaborative applications. |
| The proliferation of computer networks in the last decade and the ubiquity of the World Wide Web have led to increased interest in the
development of computer-supported cooperative work (CSCW)
systems. Collaborative, multi-user applications require group
multicast services that provide ordering guarantees for maintaining
consistency of replicated shared context as well as provide a high degree of
interactivity, even under varying load on the multicast
servers. While the most common view of the quality of service (QoS) in
a distributed system is in terms of the guarantee of the network
connection parameters (bandwidth, end-to-end delay), in this paper we
investigate QoS from the perspective of the various requirements
placed on group communication servers with limited resources by
multiple and diverse groups of collaborative users. We show that in
the absence of QoS considerations in the design of a group
communication service, some groups or individual users can be severely
affected by bursty traffic or increase in the size of other groups. We
present the design of a best-effort QoS-based adaptive group
communication service for supporting reliable data communication in
CSCW systems. Our QoS considerations address both groups requirements and
individual users requirements. We present performance results showing
the effectiveness of the approach and discuss some of the open issues
for future work. |
| We propose an efficient flow-control scheme for ATM
ABR multicast services.
We develop a second-order rate-control
algorithm to deal with the variation in RM (Resource Management) cell
feedback delay resulting from
dynamic ``drift' of the bottleneck location within a multicast tree.
The proposed scheme
makes the rate process converge to the
available bandwidth of the multicast-connections most congested link.
It also confines the buffer occupancy to a target regime
bounded by a given (finite) buffer capacity at the bottleneck node.
Using fluid approximation, we model the proposed scheme and
study the system dynamics
under the most stressful traffic conditions.
We derive expressions for queue buildups
and average throughputs in both transient and equilibrium states.
We identify the system control factors that govern the system dynamics
and develop an optimal control
condition which guarantees
monotonic convergence of the system state to the target regime from an arbitraryinitial value. |
| We propose a flow-control scheme that combines the merits of credit- and rate-based flow-control schemes by applying {it direct/}
control over {it both} bandwidth and buffer resources.
The goal of the proposed scheme is to design an optimal
rate-control policy for a given finite buffer capacity by %that
maximizing average throughput and bounding end-to-end delay.
By applying the second-order rate control, the proposed scheme
not only makes the rate process converge to the neighborhood
of link bandwidth, but also confines the queue-length
fluctuation to a regime bounded by buffer capacity.
Using hop-by-hop credit flow control, the proposed scheme
ensures lossless transmission with a finite buffer capacity
while maintaining high network utilization.
The source dynamically adjusts its transmission rate
and rate-control parameters according to both the network congestion
status and whether or not the system states are
in the target operating regime.
Using the fluid approximation method, we model the proposed
flow-control scheme and study
the system dynamic behavior for ABR (Available Bit Rate) service
under the most stressful traffic condition.
We derive the expressions for queue build-ups
and average throughput in the equilibrium states
as functions of rate-control parameters, feedback delay,
link bandwidth, and buffer capacity. Based on the analytical
results, we identify the optimal equilibrium state and propose
the second-order rate control algorithm for transient-state
flow control. We derive a sufficient
condition which guarantees the second-order rate control to
drive the system from any transient state to an optimal equilibrium
state. The worst-case performance bounds, derived as the closed-form
functions of flow-control parameters, provide an analytical
means of evaluating the performance of the second-order rate
control in terms of convergence speed and buffer utilization.
The analytical results %in both transient and equilibrium states
for both single and multiple connection
scenarios
have shown the proposed scheme to be stable and efficient
in that the source rate and
bottleneck queue length rapidly converge to a small neighborhood
of the designated operating point.
Also, presented are examples showing that the proposed scheme
outperforms the other existing schemes.
|
| A mathematical model of the Java programming language is constructed. |
Technical Reports Page
|
|