Parallel Performance Project Research Paper
Research Paper
-
KSR1 Multiprocessor: Analysis of Latency Hiding Techniques in a Sparse
Solver
Daniel Windheiser, Eric L. Boyd, Eric Hao, Santosh G. Abraham, and Edward
S. Davidson
Proceedings of the 7th International Parallel Processing Symposium,
pp. 454-461, April 1994.
Abstract
-
The KSR1 (Kendall Square Research) system is a shared-address-space,
distributed-cache architecture that is scalable to 1088 processors connected by a two-level
hierarchy of rings. This system has a hardware-coherent COMA (Cache-Only Memory
Architecture) organization where there is no main memory between the cache and
disk layers of the memory hierarchy. On a reference, data automatically migrates
to a processor's cache either from another cache or from disk. For a commercial
system, the KSR1 has unique architectural features for hiding communication
latency such as updating, prefetching, and poststoring. In this paper, we
present an experimental evaluation of the effect of these features on the
performance of a large production application. This work complements previous
simulation-based studies of these features.
The KSR1 architecture is described, and performance measures for the processor,
primary cache, secondary cache and the ring interconnect are given. We then
describe the overall program structure of an iterative sparse matrix solver
application, the approach that was used to parallelize this application,
and the consequent data-sharing patterns between processors. Finally, we
present a detailed performance analysis of this application, focusing on the
effectiveness of the COMA architecture and of the prefetching
and poststoring instructions in reducing and hiding communication overhead.