Parallel Performance Project Research Paper

KSR1 Multiprocessor: Analysis of Latency Hiding Techniques in a Sparse Solver
Daniel Windheiser, Eric L. Boyd, Eric Hao, Santosh G. Abraham, and Edward S. Davidson
Proceedings of the 7th International Parallel Processing Symposium, pp. 454-461, April 94.

Abstract

The KSR1 (Kendall Square Research) system is a shared-address-space, distributed-cache architecture that scales to 1088 processors connected by a two-level hierarchy of rings. The system has a hardware-coherent COMA (Cache-Only Memory Architecture) organization, in which there is no main memory between the cache and disk layers of the memory hierarchy. On a reference, data automatically migrates to a processor's cache, either from another cache or from disk. Unusually for a commercial system, the KSR1 provides architectural features for hiding communication latency, such as updating, prefetching, and poststoring. In this paper, we present an experimental evaluation of the effect of these features on the performance of a large production application. This work complements previous simulation-based studies of these features.

We describe the KSR1 architecture and give performance measures for the processor, primary cache, secondary cache, and ring interconnect. We then describe the overall program structure of an iterative sparse matrix solver application, the approach used to parallelize it, and the resulting data-sharing patterns between processors. Finally, we present a detailed performance analysis of this application, focusing on the effectiveness of the COMA architecture and of the prefetching and poststoring instructions in reducing and hiding communication overhead.