CSE Technical Reports Sorted by Technical Report Number
|This paper provides a theoretical framework upon which to build a system for chordal analysis of music. We establish that, in the worst case, the partitioning and labeling of harmonies in tonal music is O(2^N), where N is number of notes in a piece of music. We show that, when segments of the music can be analyzed locally, and a segment scoring metric can be found that is both additive and transitive, the problem becomes O(N^2). Under these constraints, the problem is cast as finding the best path through a single-source directed acyclic graph, which can be solved in O(E + V) time. This translates to a worst-case time complexity of O(N^2) for our problem. We then show that the results of the O(N^2) search can be closely approximated through the use of a heuristic that allows O(N) time search. The results of the heuristic search are then compared to exhaustive graph search and the results of analyses by existing systems by Winograd (Winograd 1968), Maxwell (Maxwell 1992), and Temperley and Sleator (Temperley and Sleator 1999).
|Although x86 processors have been around for a long time and are the most ubiquitous processors in the world, the amount of academic research regarding details of their performance has been minimal. Here, we will introduce x86 simulation environment, which we call Trace Analysis for X86 Interpretation, or TAXI, and use it to discuss the differences between current x86 processors and other processors, and present some performance results of eight Win32 applications. By utilizing TAXI, we can attribute performance bottlenecks to those components of the microarchitecture that cause the most performance degradation. We look at 8 aspects of front-end that can contribute to performance loss; then based on this information, we introduce an improvement that yields 17% speedup in overall execution time.
|MASE (Micro Architectural Simulation Environment) is a novel infrastructure that provides a flexible and capable environment to model modern microarchitectures. Many popular simulators, such as SimpleScalar, are predominately trace-based where the performance simulator is driven by a trace of instructions read from a file or generated on-the-fly by a functional simulator. Trace-driven simulators are well-suited for oracle studies and provide a clean division between performance modeling and functional emulation. A major problem with this approach, however, is that it does not accurately model timing dependent computations, an increasing trend in microarchitecture designs such as those found in multiprocessor systems. MASE implements a micro-functional performance model that combines timing and functional components into a single core. In addition, MASE incorporates a trace-driven functional component used to implement oracle studies and check the results of instructions as they commit. The check feature reduces the burden of correctness on the micro-functional core and also serves as a powerful debugging aid. MASE also implements a callback scheduling interface to support resources with non-deterministic latencies such as those found in highly concurrent memory systems. MASE was built on top of the current version of SimpleScalar. Analyses show that the performance statistics are comparable without a significant increase in simulation time.
|Minimizing power, increasing performance, and delivering effective memory bandwidth are todays primary microprocessor design goals for the
embedded, high-end and multimedia workstation markets. In this
dissertation, I will discuss three major data cache architecture design
optimization techniques, each of which exploits the data memory reference
characteristics of the applications written in high-level languages.
Through a better understanding of the memory reference behavior, we
can design a system that executes at higher performance, while consuming
less energy, and delivering more effective memory bandwidth.
The first part of this dissertation presents an in-depth characterization
of data memory references, including analysis of semantic region accesses
and behavior of data stores. This analysis leads to a new organization of
the data cache hierarchy called Region-based Cachelets. Region-based
Cachelets are capable of improving memory performance of
embedded applications while significantly reducing dynamic energy
consumption, resulting in a 50% to 70% improvement in energy-delay
product efficiency using this approach.
Following this, I will discuss a new cache-like structure, the Stack
Value File (or SVF), which boosts performance of general purpose
applications by routing stack data references to a separate storage
structure optimized for the unique characteristics of the stack
reference substream. By utilizing a custom structure for stack
references, we are able to increase memory level parallelism, reduce
memory latency, and reduce off-chip memory activity. The performance
can be improved by 24% by implementing an 8KB SVF for a processor
with a dual-ported L1 cache.
Finally, I will address memory bandwidth issues by proposing a new write
policy called Eager Writeback which can effectively improve overall
system performance by shifting the writings of dirty cache lines from
on-demand to times when the memory bus is less congested. It lessens
the criticality of on-demand misses and improves performance
by 6% to 16% for the 3D graphics geometry pipeline.
|We introduce a statistical model of music that allows for the retrieval of music based upon an audio input. This model uses frequency counts of state transitions to index pieces of music. Several methods of comparing these index values to choose an appropriate match are examined. We explained how this model can serve as the basis for a query-by-humming system. The model is also shown to be robust to errors of several kinds.
| The increasing popularity of SAT and BDD techniques in verification and
synthesis encourages the search for additional speed-ups. Since typical
SAT and BDD algorithms are exponential in the worst-case, the structure of
real-world instances is a natural source of improvements. While SAT and BDD
techniques are often presented as mutually exclusive alternatives, our work
points out that both can be improved via the use of the same structural
properties of instances. Our proposed methods are based on efficient
problem partitioning and can be easily applied as pre-processing with
arbitrary SAT solvers and BDD packages without any source code
Finding a better variable ordering is a well recognized problem for both
SAT solvers and BDD packages. Currently, all leading edge variable
ordering algorithms are dynamic, in the sense that they are invoked many
times in the course of the "host" algorithm that solves SAT or manipulates
BDDs. Examples include the DLCS ordering for SAT solvers and
variable-sifting during BDD manipulations. In this work we propose
a universal variable ordering MINCE (MIN Cut Etc.) that pre-processes
a given Boolean formula in CNF. MINCE is completely independent from target
algorithms and outperforms both DLCS for SAT and variable sifting for BDDs.
We argue that MINCE tends to capture structural properties of Boolean
functions arising from real-world applications. Our contribution is
validated on the ISCAS circuits and the DIMACS benchmarks. Empirically,
our technique often outperforms existing techniques by a factor of
two or more in most cases. Our results motivate search for stronger
dynamic ordering heuristics and combined static/dynamic techniques.
Technical Reports Page