Parallel Performance Project Research Paper
Research Paper
-
Data and Program Restructuring of Irregular Applications for Cache-Coherent
Multiprocessors
Karen A. Tomko and Santosh G. Abraham
Proceedings of the International Conference on Supercomputing,
pp 214-225, July 94.
Abstract
-
Applications with irregular data structures such as sparse matrices or
finite element meshes account for a large fraction of engineering and
scientific applications. Domain decomposition techniques are commonly
used to partition these applications to reduce interprocessor
communication on message passing parallel systems. Our work
investigates the use of domain decomposition techniques on
cache-coherent parallel systems.
Many good domain decomposition algorithms are now available. We show
that further application improvements are attainable using data and
program restructuring in conjunction with domain decomposition. We
give techniques for data layout to reduce communication, blocking with
subdomains to improve uniprocessor cache behavior, and insertion of
prefetches to hide the latency of interprocessor communication.
This paper details our restructuring techniques and provides
experimental results on the KSR1 multiprocessor for a sparse matrix
application. The experimental results include counts of cache misses
provided by the KSR PMON performance monitoring tool. Our data show
that cache coherency traffic can be reduced by 30\%-60\% using our
data layout scheme and that more than 53\% of the remaining coherency
cache misses can be eliminated using prefetch instructions.
Back to Publication List, or
Parallel Performance Project Home Page