CSE Technical Reports Sorted by Technical Report Number

TR Number

Title

Authors

Date

Pages

CSE-TR-439-01

The Chordal Analysis of Tonal Music

Pardo and Birmingham

Mar, 01

This paper provides a theoretical framework upon which to build a system for chordal analysis of music. We establish that, in the worst case, the partitioning and labeling of harmonies in tonal music is O(2^N), where N is number of notes in a piece of music. We show that, when segments of the music can be analyzed locally, and a segment scoring metric can be found that is both additive and transitive, the problem becomes O(N^2). Under these constraints, the problem is cast as finding the best path through a single-source directed acyclic graph, which can be solved in O(E + V) time. This translates to a worst-case time complexity of O(N^2) for our problem. We then show that the results of the O(N^2) search can be closely approximated through the use of a heuristic that allows O(N) time search. The results of the heuristic search are then compared to exhaustive graph search and the results of analyses by existing systems by Winograd (Winograd 1968), Maxwell (Maxwell 1992), and Temperley and Sleator (Temperley and Sleator 1999).

CSE-TR-440-01

The Effects of the x86 ISA on the Front End: Where have all the cycles gone?

Vlaovic and Davidson

Apr, 01

Although x86 processors have been around for a long time and are the most ubiquitous processors in the world, the amount of academic research regarding details of their performance has been minimal. Here, we will introduce x86 simulation environment, which we call Trace Analysis for X86 Interpretation, or TAXI, and use it to discuss the differences between current x86 processors and other processors, and present some performance results of eight Win32 applications. By utilizing TAXI, we can attribute performance bottlenecks to those components of the microarchitecture that cause the most performance degradation. We look at 8 aspects of front-end that can contribute to performance loss; then based on this information, we introduce an improvement that yields 17% speedup in overall execution time.

CSE-TR-441-01

Optimization and Evaluation Strategies for Decentralized Database Mining

Jensen and Soparkar

Jun, 01

CSE-TR-442-01

MASE: A Novel Infrastructure for Detailed Microarchitectural Modeling

Larson Chatterjee and Austin

Jul, 01

MASE (Micro Architectural Simulation Environment) is a novel infrastructure that provides a flexible and capable environment to model modern microarchitectures. Many popular simulators, such as SimpleScalar, are predominately trace-based where the performance simulator is driven by a trace of instructions read from a file or generated on-the-fly by a functional simulator. Trace-driven simulators are well-suited for oracle studies and provide a clean division between performance modeling and functional emulation. A major problem with this approach, however, is that it does not accurately model timing dependent computations, an increasing trend in microarchitecture designs such as those found in multiprocessor systems. MASE implements a micro-functional performance model that combines timing and functional components into a single core. In addition, MASE incorporates a trace-driven functional component used to implement oracle studies and check the results of instructions as they commit. The check feature reduces the burden of correctness on the micro-functional core and also serves as a powerful debugging aid. MASE also implements a callback scheduling interface to support resources with non-deterministic latencies such as those found in highly concurrent memory systems. MASE was built on top of the current version of SimpleScalar. Analyses show that the performance statistics are comparable without a significant increase in simulation time.

CSE-TR-443-01

Improving Energy and Performance of Data Cache Architectures by Exploiting Memory Reference Characteristics

Lee

Aug, 01

110

Minimizing power, increasing performance, and delivering effective memory bandwidth are todays primary microprocessor design goals for the embedded, high-end and multimedia workstation markets. In this dissertation, I will discuss three major data cache architecture design optimization techniques, each of which exploits the data memory reference characteristics of the applications written in high-level languages. Through a better understanding of the memory reference behavior, we can design a system that executes at higher performance, while consuming less energy, and delivering more effective memory bandwidth. The first part of this dissertation presents an in-depth characterization of data memory references, including analysis of semantic region accesses and behavior of data stores. This analysis leads to a new organization of the data cache hierarchy called Region-based Cachelets. Region-based Cachelets are capable of improving memory performance of embedded applications while significantly reducing dynamic energy consumption, resulting in a 50% to 70% improvement in energy-delay product efficiency using this approach. Following this, I will discuss a new cache-like structure, the Stack Value File (or SVF), which boosts performance of general purpose applications by routing stack data references to a separate storage structure optimized for the unique characteristics of the stack reference substream. By utilizing a custom structure for stack references, we are able to increase memory level parallelism, reduce memory latency, and reduce off-chip memory activity. The performance can be improved by 24% by implementing an 8KB SVF for a processor with a dual-ported L1 cache. Finally, I will address memory bandwidth issues by proposing a new write policy called Eager Writeback which can effectively improve overall system performance by shifting the writings of dirty cache lines from on-demand to times when the memory bus is less congested. It lessens the criticality of on-demand misses and improves performance by 6% to 16% for the 3D graphics geometry pipeline.

CSE-TR-444-01

Statistical Analysis in Music Information Retrieval

Birmingham and Rand

Oct, 01

We introduce a statistical model of music that allows for the retrieval of music based upon an audio input. This model uses frequency counts of state transitions to index pieces of music. Several methods of comparing these index values to choose an appropriate match are examined. We explained how this model can serve as the basis for a query-by-humming system. The model is also shown to be robust to errors of several kinds.

CSE-TR-445-01

Faster SAT and Smaller BDDs via Common Function Structure

Aloul Markov and Sakallah

Dec, 01

The increasing popularity of SAT and BDD techniques in verification and synthesis encourages the search for additional speed-ups. Since typical SAT and BDD algorithms are exponential in the worst-case, the structure of real-world instances is a natural source of improvements. While SAT and BDD techniques are often presented as mutually exclusive alternatives, our work points out that both can be improved via the use of the same structural properties of instances. Our proposed methods are based on efficient problem partitioning and can be easily applied as pre-processing with arbitrary SAT solvers and BDD packages without any source code modifications. Finding a better variable ordering is a well recognized problem for both SAT solvers and BDD packages. Currently, all leading edge variable ordering algorithms are dynamic, in the sense that they are invoked many times in the course of the "host" algorithm that solves SAT or manipulates BDDs. Examples include the DLCS ordering for SAT solvers and variable-sifting during BDD manipulations. In this work we propose a universal variable ordering MINCE (MIN Cut Etc.) that pre-processes a given Boolean formula in CNF. MINCE is completely independent from target algorithms and outperforms both DLCS for SAT and variable sifting for BDDs. We argue that MINCE tends to capture structural properties of Boolean functions arising from real-world applications. Our contribution is validated on the ISCAS circuits and the DIMACS benchmarks. Empirically, our technique often outperforms existing techniques by a factor of two or more in most cases. Our results motivate search for stronger dynamic ordering heuristics and combined static/dynamic techniques.

Technical Reports Page