Parallel Performance Project Research Papers

The research being performed by the Parallel Performance Project is subdivided into the following categories:

Performance Evaluation

Origin 2000 Design Enhancements for Communication Intensive Applications
Gheith A. Abandah and Edward S. Davidson
Proceedings of the Int'l Conference on Parallel Architecture and Compilation Techniques (PACT'98), pp 30-39, Oct 1998.
Abstract
Reducing Communication Cost in Scalable Shared Memory Systems
Gheith A. Abandah
Ph.D. Thesis (Technical Report CSE-TR-362-98), University of Michigan, 1998.
Abstract
Effects of Architectural and Technological Advances on the HP/Convex Exemplar's Memory and Communication Performance
Gheith A. Abandah and Edward S. Davidson
Proceedings of the 25th International Symposium on Computer Architecture (ISCA'98), pp 318-329, June 1998.
Abstract
A Comparative Study of Cache-Coherent Nonuniform Memory Access Systems
Gheith A. Abandah and Edward S. Davidson
High Performance Computing Systems and Applications, Kluwer Academic Publishers,
In the 12th Ann. Int'l Symp. on High Performance Computing Systems and Applications (HPCS'98), pp 267-282, May 1998.
Abstract
Configuration Independent Analysis for Characterizing Shared-Memory Applications
Gheith A. Abandah and Edward S. Davidson
Proceedings of the 12th International Parallel Processing Symposium (IPPS'98), pp 485-491, March 1998.
Abstract
Configuration Independent Analysis for Characterizing Shared-Memory Applications
Gheith A. Abandah and Edward S. Davidson
Detailed Technical Report CSE-TR-357-98, University of Michigan, Jan 1998.
Abstract
Characterizing Shared Memory and Communication Performance: A Case Study of the Convex SPP1000
Gheith A. Abandah and Edward S. Davidson
The IEEE Trans. on Parallel and Distributed Systems, Vol. 9, No. 2, pp 206-216, Feb 1998.
Abstract
Characterizing Shared-Memory Applications: A Case Study of the NAS Parallel Benchmarks
Gheith A. Abandah
Technical Report HPL-97-24, HP Laboratories, Jan 1997.
Abstract
Tools for Characterizing Distributed Shared Memory Applications
Gheith A. Abandah
Technical Report HPL-96-157, HP Laboratories, Dec 1996.
Abstract
Characterizing Shared Memory and Communication Performance: A Case Study of the Convex SPP-1000
Gheith A. Abandah and Edward S. Davidson
Technical Report CSE-TR-277-96, University of Michigan, Jan 1996.
Abstract
Modeling the Communication Performance of the IBM SP2
Gheith A. Abandah and Edward S. Davidson
Proceedings of the 10th International Parallel Processing Symposium (IPPS'96), pp 249-257, April 1996.
Abstract
Performance Evaluation and Improvement of Parallel Applications on High Performance Architectures
Eric L. Boyd
Ph.D. Thesis, University of Michigan.
Modeling the Communication and Computation Performance of the IBM SP2
Gheith A. Abandah and Edward S. Davidson
Technical Report CSE-TR-258-95, University of Michigan, May 95.
Abstract
Modeling Computation and Communication Performance of Parallel Scientific Applications: A Case Study of the IBM SP2
Eric L. Boyd, Gheith A. Abandah, Hsien-Hsin Lee, and Edward S. Davidson
Technical Report CSE-TR-236-95, University of Michigan, May 95.
Abstract
A Hierarchical Approach to Modeling and Improving the Performance of Scientific Applications on the KSR1
Eric L. Boyd, Waqar Azeem, Hsien-Hsin Lee, Tien-Pao Shih, Shih-Hao Hung, and Edward S. Davidson
Proceedings of the International Conference on Parallel Processing, August 94.
Abstract
Communication in the KSR1 MPP: Performance Evaluation Using Synthetic Workload Experiments
Eric L. Boyd and Edward S. Davidson
Proceedings of the International Conference on Supercomputing, pp 166-175.
Abstract
Evaluating the Communication Performance of MPPs Using Synthetic Sparse Matrix Multiplication Workloads
Eric L. Boyd, John-David Wellman, Santosh G. Abraham, and Edward S. Davidson
Proceedings of the International Conference on Supercomputing, November 93.
Abstract
Approaching a Machine-Application Bound in Delivered Performance on Scientific Code
William Mangione-Smith, Tien-Pao Shih, Santosh G. Abraham, and Edward S. Davidson
Special Issue of IEEE Proceedings on Computer Performance Analysis, August 93.
Abstract
Modeling and Approaching the Deliverable Performance Capability of the KSR1 Processor
Waqar Azeem
Technical Report CSE-TR-164-93, University of Michigan, June 93.
Abstract
Hierarchical Performance Modeling with MACS: A Case Study of the Convex C-240
Eric L. Boyd and Edward S. Davidson
Proceedings of the 20th International Symposium on Computer Architecture, pp 203-212, May 93.
Abstract
KSR1 Multiprocessor: Analysis of Latency Hiding Techniques in a Sparse Solver
Daniel Windheiser, Eric L. Boyd, Eric Hao, Santosh G. Abraham, and Edward S. Davidson
Proceedings of the 7th International Parallel Processing Symposium, pp 454-461, April 93.
Abstract
Analysis of Memory Latency Factors and their Impact on KSR1 MPP Performance
Bassam Kahhaleh
Technical Report CSE-TR-159-93, University of Michigan, April 93.
Abstract
Performance Bound and Buffer Space Requirements for Concurrent Processors
William H. Mangione-Smith
Ph.D. Thesis (Technical Report CSE-TR-129-92), University of Michigan, 92.
A Performance Comparison of the IBM RS/6000 and the Astronautics ZS-1
William H. Mangione-Smith, Santosh G. Abraham, and Edward S. Davidson
IEEE Computer, Vol 24(1), pp 39-46, January 91.
Architectural vs. Delivered Performance of the IBM RS/6000 and the Astronautics ZS-1
William H. Mangione-Smith, Santosh G. Abraham, and Edward S. Davidson
Proc. Twenty-Fourth Hawaii International Conference on System Sciences, pp 397-408, January 91.

Domain Decomposition & Synchronization

Profile Driven Weighted Decomposition
Karen A. Tomko and Edward S. Davidson
Proceedings of the 1996 ACM International Conference on Supercomputing, May 1996, Philadelphia.
Abstract
Domain Decomposition, Irregular Applications, and Parallel Computers
Karen A. Tomko
Ph.D. Thesis, University of Michigan, 1995.
Abstract
Impact of Load Imbalance on the Design of Software Barriers
A. E. Eichenberger and S. G. Abraham
Proceedings of the 1995 International Conference on Parallel Processing, Vol II, pp 63-72, August 95.
Abstract
Modeling Load Imbalance and Fuzzy Barriers for Scalable Shared-Memory Multiprocessors
A. E. Eichenberger and S. G. Abraham
Proceedings of the 28th Hawaii International Conference on System Sciences, pp 262-271, January 95.
Abstract

Cache Optimization & Data Layout

A Prefetch Taxonomy
Viji Srinivasan, Edward S. Davidson, and Gary S. Tyson

Branch History Guided Instruction Prefetching
Viji Srinivasan, Edward S. Davidson, Gary S. Tyson, Mark J. Charney, and Thomas R. Puzak
Proceedings of the 7th International Conference on High Performance Computer Architecture (HPCA), January 2001, pp. 291-300.

Recovering Single Cycle of Primary Caches
Viji Srinivasan, Edward S. Davidson, and Gary S. Tyson

Active Management of Data Caches by Exploiting Reuse Information
Edward S. Tam, Jude A. Rivers, Vijayalakshmi Srinivasan, Gary S. Tyson, and Edward S. Davidson
IEEE Transactions on Computers, Vol. 48, No. 11, pp. 1244-1259, November 1999.
Abstract
Improving Cache Performance Via Active Management
Edward S. Tam
Ph.D. Thesis, University of Michigan, June 1999.
Abstract
Evaluating the Performance of Active Cache Management Schemes
Edward S. Tam, Jude A. Rivers, Vijayalakshmi Srinivasan, Gary S. Tyson, and Edward S. Davidson
Proceedings of the 1998 International Conference on Computer Design, October 1998.
Abstract
Utilizing Reuse Information in Data Cache Management
Jude A. Rivers, Edward S. Tam, Gary S. Tyson, Edward S. Davidson, and Matt Farrens
Proceedings of the 12th ACM International Conference on Supercomputing, July, 1998
Abstract
mlcache: A Flexible Multi-Lateral Cache Simulator
Edward S. Tam, Jude A. Rivers, Gary S. Tyson, and Edward S. Davidson
Proceedings of the 6th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS '98), July, 1998
Abstract
mlcache: A Flexible Multi-Lateral Cache Simulator
Edward S. Tam, Jude A. Rivers, Gary S. Tyson, and Edward S. Davidson
Technical Report CSE-TR-363-98, University of Michigan, May, 1998
Abstract
Improving Performance of an L1 Cache With an Associated Buffer
Vijayalakshmi Srinivasan and Edward S. Davidson
Technical Report CSE-TR-361-98, University of Michigan, March, 1998
Abstract
On High-Bandwidth Data Cache Design for Multi-Issue Processors
Jude A. Rivers, Gary S. Tyson, Todd M. Austin, and Edward S. Davidson
Proceedings of the 30th IEEE/ACM International Symposium on Microarchitecture, December 1997
Abstract
Flexible Timing Simulation of Multiple-Cache Configurations
Edward S. Tam, Jude A. Rivers, and Edward S. Davidson
Technical Report CSE-TR-348-97, University of Michigan, November, 1997
Abstract
On Effective Data Supply for Multi-Issue Processors
Jude A. Rivers, Edward S. Tam and Edward S. Davidson
Proceedings of the 1997 International Conference on Computer Design, October 1997.
Abstract
Performance Issues in Integrating Temporality-Based Caching with Prefetching
Jude A. Rivers and Edward S. Davidson
Proceedings of the IFIP WG7.3 Int'l Conf. on Performance Theory, Measurement and Evaluation of Computer and Communication Systems (Performance'96), October 1996. Also appears as Performance Evaluation Vol 27&28 (1996) pp 1-19.
Abstract
Early Design Cycle Timing Simulation of Caches
Edward S. Tam and Edward S. Davidson
Technical Report CSE-TR-317-96, University of Michigan, November, 1996.
Abstract
Reducing Conflicts in Direct-Mapped Caches with a Temporality-Based Design
Jude A. Rivers and Edward S. Davidson
Proceedings of the 1996 International Conference on Parallel Processing, Vol I, pp 151-162, August 1996.
Abstract
Goal-Directed Performance Tuning for Scientific Applications
Tien-Pao Shih
Ph.D. Thesis, University of Michigan, May 96.
Abstract
Grouping Array Layouts to Reduce Communication and Improve Locality of Parallel Programs
Tien-Pao Shih and Edward S. Davidson
Proceedings of the 1994 International Conference on Parallel and Distributed Systems, pp 558-566, December 94.
Abstract
Data and Program Restructuring of Irregular Applications for Cache-Coherent Multiprocessors
Karen A. Tomko and Santosh G. Abraham
Proceedings of the International Conference on Supercomputing, pp 214-225, July 94.
Abstract
Multi-Configuration Simulation Algorithms for the Evaluation of Computer Architecture Designs
Rabin A. Sugumar
Ph.D. Thesis (Technical Report CSE-TR-173-93), University of Michigan, August 93.
Abstract
Efficient Simulation of Caches under Optimal Replacement with Applications to Miss Characterization
Rabin A. Sugumar and Santosh G. Abraham
Proceedings of the ACM SIGMETRICS Conference, pp 24-35, 93
Abstract
Predictability of Load/Store Instruction Latencies
Santosh G. Abraham, Rabin A. Sugumar, Daniel Windheiser, B. R. Rau and Rajiv Gupta
Proceedings of the 26th Annual International Symposium on Microarchitecture, November 93.
Abstract
Efficient Simulation of Multiple Cache Configurations using Binomial Trees
Rabin A. Sugumar and Santosh G. Abraham
Technical Report CSE-TR-111-91, University of Michigan, 91.
Abstract

Instruction Scheduling & Register Allocation

Efficient Formulation for Optimal Modulo Schedulers
Alexandre E. Eichenberger and Edward S. Davidson
Proceedings of the Conference on Programming Language Design and Implementation, June 97.
Abstract
Modulo Scheduling, Machine Representations, and Register-Sensitive Algorithms
Alexandre E. Eichenberger
Ph.D. Thesis, University of Michigan, Dec. 96.
Abstract, postscript, compressed(gz), compressed(Z), 2-per-page-compressed(gz).
A Reduced Multipipeline Machine Description that Preserves Scheduling Constraints
Alexandre E. Eichenberger and Edward S. Davidson
Proceedings of the Conference on Programming Language Design and Implementation, May 96.
Abstract
Minimizing Register Requirements of a Modulo Schedule via Optimum Stage Scheduling
A. E. Eichenberger, E. S. Davidson, and S. G. Abraham
International Journal of Parallel Programming, Vol 24 (2), pp 103-132, April, 1996
Abstract
Stage Scheduling: A Technique to Reduce the Register Requirements of a Modulo Schedule
A. E. Eichenberger and E. S. Davidson
Proceedings of the 28th Annual International Symposium on Microarchitecture, pp 338-349, November 95.
Abstract
Register Allocation for Predicated Code
A. E. Eichenberger and E. S. Davidson
Proceedings of the 28th Annual International Symposium on Microarchitecture, pp 180-191, November 95.
Abstract
Optimal Dual-Issue Instruction Scheduling With Spills for Binary Expression Trees
W. Meleis and E. S. Davidson
Technical Report CSE-TR-261-95, University of Michigan, 95.
Abstract
A Reduced Multipipeline Machine Description that Preserves Scheduling Constraints
A. E. Eichenberger and E. S. Davidson
Technical Report CSE-TR-266-95, University of Michigan, October 95.
Abstract
Optimum Modulo Schedules for Minimum Register Requirements
A. E. Eichenberger, E. S. Davidson, and S. G. Abraham
Proceedings of the 1995 International Conference on Supercomputing, pp 31-40, July 95.
Abstract
Minimum Register Requirements for a Modulo Schedule
A. E. Eichenberger, S. G. Abraham, and E. S. Davidson
Proceedings of the 27th Annual International Symposium on Microarchitecture, pp 75-84, November 94.
Abstract
Optimal Local Register Allocation for a Multiple-Issue Machine
Waleed M. Meleis and Edward S. Davidson
Proceedings of the International Conference on Supercomputing, pp 107-116, July 94.
Abstract

Microprocessor Architecture & Pipeline Design

Processor Modeling and Evaluation Techniques for Early Design Stage Performance Comparison
John-David Wellman
Ph.D. Thesis, University of Michigan, Oct. 1996.
Abstract (Thesis is 330 pages, 2.2 Mbytes of data)
The Resource Conflict Methodology for Early-Stage Design Space Exploration of Superscalar RISC Processors
J-D Wellman and E. S. Davidson
Proceedings of the 1995 International Conference on Computer Design, pp 110-115, Oct 2-4, 1995.
Performance Optimization of Pipeline Circuits with Latches and Wave Pipelining
Chuan-Hua Chang
Ph.D. Dissertation, EECS, University of Michigan, March, 1996
Abstract
Maximum Rate Single-Phase Clocking of a Closed Pipeline including Wave Pipelining, Stoppability, and Startability
C.-H. Chang, E. S. Davidson and K. A. Sakallah
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 14, no. 12, pp 1526-1545, December 1995
Abstract
Delay Balancing using Latches
C.-H. Chang and E. S. Davidson
Digest of Papers, Tau'95: 1995 ACM International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, pp 66-73, Nov. 1995
Abstract
Using Constraint Geometry to Determine Maximum Rate Pipeline Clocking
C.-H. Chang, E. S. Davidson and K. A. Sakallah
Digest of Technical Papers, IEEE International Conference on Computer-Aided Design (ICCAD), pp 142-148, Nov. 1992.
Abstract

