Parallel Performance Project Research Paper

Research Paper

A Hierarchical Approach to Modeling and Improving the Performance of Scientific Applications on the KSR1
Eric L. Boyd, Waqar Azeem, Hsien-Hsin Lee, Tien-Pao Shih, Shih-Hao Hung, and Edward S. Davidson
Proceedings of the International Conference on Parallel Processing, August 94.

Abstract

We have developed a hierarchical performance bounding methodology that attempts to explain the performance of loop-dominated scientific applications on particular systems. The Kendall Square Research KSR1 is used as a running example. We model the throughput of key hardware units that are common bottlenecks in concurrent machines. The four units currently used are: memory port, floating-point, instruction issue, and a loop-carried dependence pseudo-unit. We propose a workload characterization, and derive upper bounds on the performance of specific machine-workload pairs. Comparing delivered performance with bounds focuses attention on areas for improvement and indicates how much improvement might be attainable.

We delineate a comprehensive approach to modeling and improving application performance on the KSR1. Application of this approach is being automated for the KSR1 with a series of tools including K-MA and K-MACSTAT (which enable the calculation of the MACS hierarchy of performance bounds), K-Trace (which allows parallel code to be instrumented to produce a memory reference trace), and KPCache (which simulates inter-cache communications based on a memory reference trace).
Back to Publication List, or Parallel Performance Project Home Page