Parallel Performance Project Research Paper
Analysis of Memory Latency Factors and their Impact on KSR1
MPP Performance
Bassam Kahhaleh
Technical Report CSE-TR-159-93, University of Michigan, April 1993
Abstract
The Kendall Square Research KSR1 MPP system has a shared address space that spans
physically distributed memory modules. Thus, memory access time
can vary over a wide range even when accessing the same variable, depending on
how this variable is being referenced and updated by the various processors.
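One way to make this variation concrete is a generic first-order accounting of execution time. This is an assumption for illustration, not necessarily the model developed in this report: a loop's time is its compute time plus the stall time of its memory references, grouped by where each reference is satisfied.

$$
T_{\text{loop}} \approx T_{\text{compute}} + \sum_{k} n_k \, \ell_k
$$

Here $n_k$ is the number of references satisfied at level $k$ (for example, the processor's subcache, its local cache, or a remote cache reached over the ring) and $\ell_k$ is the stall latency of that level. Successive references to the same variable can fall into different $n_k$ terms.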
Since the processor stalls during this access time, the KSR1 performance
depends considerably on the program's locality of reference. The KSR1 provides
two novel features to reduce such long memory latencies: prefetch and post-store
instructions. This paper analyzes the various memory latency factors that stall
the processor during program execution. A suitable model for
evaluating these factors is developed for the execution of FORTRAN DO-loops
parallelized with the Tile construct using the Slice strategy. The DO-loops
used in the benchmark program perform sparse matrix-vector multiply, vector-
vector dot product, and vector-vector addition, which are typically executed
in an iterative sparse solver. Memory references generated by such loops are
analyzed and their memory latencies are experimentally evaluated. From these
measurements, the performance of the KSR1 and its unique memory system is
characterized.
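The three kinds of DO-loops named above can be sketched as follows. This is a minimal Python sketch, not the benchmark's FORTRAN code. It assumes compressed sparse row (CSR) storage for the matrix and approximates the Slice strategy as splitting each loop's iteration space into one contiguous, nearly equal block per processor. The function names (`slice_bounds`, `spmv_rows`, `dot_rows`, `add_rows`, `solver_step`) and the scaling factor `alpha` are hypothetical, and the slices run one after another rather than in parallel.

```python
def slice_bounds(n, nproc, p):
    """Return [lo, hi) for processor p when n iterations are split into
    nproc contiguous, nearly equal blocks (a simplified slice-style tiling)."""
    base, rem = divmod(n, nproc)
    lo = p * base + min(p, rem)
    hi = lo + base + (1 if p < rem else 0)
    return lo, hi


def spmv_rows(row_ptr, col_idx, val, x, y, lo, hi):
    """Sparse matrix-vector multiply for rows lo..hi-1 of a CSR matrix."""
    for i in range(lo, hi):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect read: x[col_idx[k]] may lie in another processor's slice.
            s += val[k] * x[col_idx[k]]
        y[i] = s


def dot_rows(a, b, lo, hi):
    """Partial dot product over one slice; partial sums are combined later."""
    return sum(a[i] * b[i] for i in range(lo, hi))


def add_rows(z, a, alpha, b, lo, hi):
    """Vector-vector addition z[i] = a[i] + alpha * b[i] over one slice."""
    for i in range(lo, hi):
        z[i] = a[i] + alpha * b[i]


def solver_step(row_ptr, col_idx, val, x, nproc, alpha=0.5):
    """One pass of the three kernels, each split into per-processor slices."""
    n = len(x)
    y = [0.0] * n
    z = [0.0] * n
    for p in range(nproc):
        spmv_rows(row_ptr, col_idx, val, x, y, *slice_bounds(n, nproc, p))
    dot = 0.0
    for p in range(nproc):
        lo, hi = slice_bounds(n, nproc, p)
        dot += dot_rows(x, y, lo, hi)  # sequential stand-in for a reduction
        add_rows(z, x, alpha, y, lo, hi)
    return y, dot, z


if __name__ == "__main__":
    # 3x3 matrix [[4,1,0],[1,4,1],[0,1,4]] in CSR form.
    row_ptr = [0, 2, 5, 7]
    col_idx = [0, 1, 0, 1, 2, 1, 2]
    val = [4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0]
    print(solver_step(row_ptr, col_idx, val, [1.0, 2.0, 3.0], nproc=2))
```

For this example the sketch yields y = [6.0, 12.0, 14.0], x·y = 72.0 and z = [4.0, 8.0, 10.0]. Only the matrix-vector loop indexes data outside a processor's own block (through `col_idx`); the dot-product and addition loops touch only their own slice.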
Furthermore, the prefetch and post-store operations are evaluated and their
effects on performance and memory latencies are determined. The limited size
of the prefetch queue is shown to stall the processor for a long period of
time, which reduces the benefit of prefetch considerably. The post-store
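The stall caused by a bounded prefetch queue can be illustrated with a small timing model. This is a hypothetical sketch, not a model of the actual KSR1 hardware. It assumes the processor issues one prefetch every `issue_interval` cycles of useful work, each prefetch completes `latency` cycles after it is issued, and at most `depth` prefetches may be outstanding, so issuing into a full queue stalls the processor until the oldest prefetch completes. The function name and all parameter values are illustrative, not measured KSR1 figures.

```python
from collections import deque


def prefetch_issue_stall(n_prefetches, depth, latency, issue_interval):
    """Cycles the processor stalls while issuing n_prefetches when at most
    `depth` prefetches may be outstanding (hypothetical timing model)."""
    outstanding = deque()  # completion times of in-flight prefetches, oldest first
    t = 0
    stall = 0
    for _ in range(n_prefetches):
        # Retire prefetches that have already completed by time t.
        while outstanding and outstanding[0] <= t:
            outstanding.popleft()
        if len(outstanding) == depth:
            # Queue full: wait for the oldest prefetch to complete.
            wait = outstanding.popleft() - t
            stall += wait
            t += wait
        outstanding.append(t + latency)
        t += issue_interval  # useful work before the next prefetch
    return stall


if __name__ == "__main__":
    for depth in (1, 4, 8):
        print(depth, prefetch_issue_stall(8, depth, latency=100, issue_interval=10))
```

With these illustrative numbers, eight prefetches stall the processor for 630 cycles with a queue depth of 1, 60 cycles with depth 4, and not at all with depth 8: once the queue fills, the processor waits on exactly the latency the prefetches were meant to hide.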
operation is evaluated with two placements: immediate and delayed post-store.
In both cases, the post-store operation has a high overhead. However, delaying
the post-store operation is shown to improve performance considerably.