Parallel Performance Project Research Paper

Analysis of Memory Latency Factors and their Impact on KSR1 MPP Performance
Bassam Kahhaleh
Technical Report CSE-TR-159-93, University of Michigan, April 1993.

Abstract

The Kendall Square Research KSR1 MPP system has a shared address space that spans physically distributed memory modules. Memory access time can therefore vary over a wide range, even for the same variable, depending on how that variable is referenced and updated by the various processors. Since the processor stalls during this access time, KSR1 performance depends considerably on the program's locality of reference. The KSR1 provides two novel features to reduce such long memory latencies: prefetch and post-store instructions. This paper analyzes the various memory latency factors that stall the processor during program execution. A suitable model for evaluating these factors is developed for the execution of FORTRAN DO-loops parallelized with the Tile construct using the Slice strategy. The DO-loops used in the benchmark program perform sparse matrix-vector multiply, vector-vector dot product, and vector-vector addition, which are typically executed in an iterative sparse solver. Memory references generated by such loops are analyzed and their memory latencies are experimentally evaluated. Thus, the performance of the KSR1 and its unique memory system is determined. Furthermore, the prefetch and post-store operations are evaluated and their effects on performance and memory latencies are determined. The limited size of the prefetch queue is shown to stall the processor for long periods, which reduces the benefit of prefetch considerably. The post-store operation is evaluated with two placements: immediate and delayed. In both cases, the post-store operation has a high overhead. However, delaying the post-store operation is shown to improve performance considerably.
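
To make the three benchmark loops concrete, the following is a minimal Python sketch, not the report's FORTRAN code. It assumes that the Slice strategy amounts to giving each processor one contiguous block of loop iterations, and it assumes a compressed sparse row (CSR) layout for the matrix; the abstract does not state the storage format. All function names are hypothetical, and the "processors" are simulated sequentially.

```python
def slice_bounds(n, num_procs):
    """Split iterations 0..n-1 into num_procs contiguous slices.

    Assumption: one slice per processor, sizes differing by at most one.
    """
    base, extra = divmod(n, num_procs)
    bounds, start = [], 0
    for p in range(num_procs):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def spmv_slice(row_ptr, col_idx, vals, x, y, start, stop):
    """Sparse matrix-vector multiply y = A*x for rows start..stop-1 (CSR, assumed)."""
    for i in range(start, stop):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x[col_idx[k]] may belong to another processor's slice; on a
            # distributed shared memory this is where remote accesses arise.
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc


def dot_slice(a, b, start, stop):
    """Partial dot product over one slice; partial sums are combined afterwards."""
    return sum(a[i] * b[i] for i in range(start, stop))


def add_slice(a, b, out, start, stop):
    """Vector-vector addition out = a + b over one slice."""
    for i in range(start, stop):
        out[i] = a[i] + b[i]


def run_kernels(row_ptr, col_idx, vals, x, num_procs):
    """Run the three loops slice by slice, simulating the processors in turn."""
    n = len(x)
    y = [0.0] * n
    z = [0.0] * n
    bounds = slice_bounds(n, num_procs)
    for start, stop in bounds:
        spmv_slice(row_ptr, col_idx, vals, x, y, start, stop)
    dot = sum(dot_slice(x, y, start, stop) for start, stop in bounds)
    for start, stop in bounds:
        add_slice(x, y, z, start, stop)
    return y, dot, z


# Example: A = [[2, 0, 1], [0, 3, 0], [4, 0, 5]] in CSR form, x = [1, 2, 3].
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y, dot, z = run_kernels(row_ptr, col_idx, vals, x, num_procs=2)
```

In this sketch the dot product and vector addition touch only elements inside the processor's own slice, while the sparse multiply reads `x` at indices determined by the matrix structure, which is one way the locality of reference discussed above can differ between loops.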