Parallel computer systems offer almost unlimited peak performance at a very high performance/cost ratio. However, delivering more than a tiny fraction of that potential hardware performance remains costly today, requiring labor-intensive tuning of application codes to each specific architecture.
The objective of our work is to develop and implement automated means of assessing the potential performance of applications on targeted supercomputers, high performance workstations and parallel computer systems, identifying the causes of degradations in delivered performance, and restructuring the application codes and their data structures so that they can achieve their full performance potential.
Toward this end, performance models have been developed for a broad range of high performance computer systems of interest, including the Cray vector machines, Convex C2, IBM RS/6000, Digital Alpha, Hewlett-Packard PA-RISC, and Kendall Square KSR1. These models have been used to characterize problematic application codes provided by our industry partners and university colleagues in the Center for Parallel Computation. They yield a hierarchy of bounds on achievable performance that identify specific causes of performance degradation and the sections of the code and data structures where they occur. The performance gaps exposed by the bounds hierarchy can then be closed through the targeted, goal-directed use of appropriate automated restructuring techniques, including domain decomposition, load balancing, relaxed synchronization, loop and data restructuring, locality enhancements, and fine-grain instruction scheduling.
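As a rough illustration of how a hierarchy of bounds can point to causes of degradation, the following minimal sketch assumes that each level is a lower bound on run time (here in cycles per loop iteration) and that each level accounts for one more constraint than the level before it. Under that assumption, the gap between two successive levels is attributed to the constraint introduced at the higher level. The level names, the Bound type, the gaps function, and all numbers are hypothetical and chosen only for illustration; they are not taken from the models described above.

```python
from dataclasses import dataclass


@dataclass
class Bound:
    name: str               # constraint introduced at this level (hypothetical label)
    cycles_per_iter: float  # lower bound on run time once that constraint is included


def gaps(bounds, measured):
    """Return (level name, extra cycles) for each step up the hierarchy.

    `bounds` must be ordered from the most idealized level to the most
    constrained one; the measured run time is appended as the final level.
    """
    levels = bounds + [Bound("measured", measured)]
    return [
        (hi.name, hi.cycles_per_iter - lo.cycles_per_iter)
        for lo, hi in zip(levels, levels[1:])
    ]


# Hypothetical numbers for a single loop nest.
hierarchy = [
    Bound("machine+application", 2.0),  # essential operations on ideal hardware
    Bound("compiler", 3.0),             # adds operations the compiler actually emitted
    Bound("schedule", 3.5),             # adds the instruction schedule actually produced
]
measured = 5.0

result = gaps(hierarchy, measured)
worst = max(result, key=lambda g: g[1])
print(result)
print("largest gap at level:", worst[0])
```

In this sketch the largest gap lies between the last modeled bound and the measured time, which would suggest effects that none of the modeled levels capture (for example, memory hierarchy behavior) as the first target for restructuring.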
Part of our current research effort is devoted to extending and automating the performance bound models and restructuring techniques, and to integrating them into a comprehensive, highly automated methodology for evaluating the performance of alternative architectures and compilers on specific codes. The objective of this comparative work is to identify effective sets of architectural features, together with the compiler features that exploit them, in order to define the high performance, highly concurrent computer systems and software development environments of the future.