Parallel Performance Project Research Paper

Reducing Conflicts in Direct-Mapped Caches with a Temporality-Based Design
Jude A. Rivers and Edward S. Davidson
Proceedings of the 1996 International Conference on Parallel Processing, Vol. I, pp. 151-162, August 1996.

Abstract

Direct-mapped caches are often plagued by conflict misses because they lack the associativity to store more than one memory block in each set. However, some blocks that have no temporal locality actually cause program execution degradation by displacing blocks that do manifest temporal behavior. In this paper, we present a simple but efficient novel hardware design called the Non-Temporal Streaming (NTS) Cache that supplements the conventional direct-mapped cache with a parallel fully associative buffer. Every cache block loaded into the main cache is monitored for temporal behavior by a hardware detection unit. Cache blocks identified as nontemporal are allocated to the buffer on subsequent requests. Our simulations show that the NTS Cache not only provides a performance improvement over the conventional direct-mapped cache, but can also save on-chip area. For some numerical programs like FFTPDE, APPSP and APPBT from the NAS benchmark suite, an integral NTS Cache of size 9KB (i.e., 8KB direct-mapped cache plus 1KB NT buffer) performs as well as a 16KB conventional direct-mapped cache.
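
The allocation policy described in the abstract can be illustrated with a small toy model. This is a minimal sketch under stated assumptions, not the authors' hardware design: the class name `NTSCacheSketch`, the helper `count_misses`, the block-granularity trace, the LRU replacement in the buffer, and the detection rule (a block counts as nontemporal if it was never re-referenced during its last residency in the main cache) are all hypothetical simplifications chosen for illustration. In particular, a block sent to the buffer is never re-evaluated in this sketch.

```python
from collections import OrderedDict


class NTSCacheSketch:
    """Toy model: a direct-mapped main cache plus a small fully
    associative buffer for blocks flagged as nontemporal (NT).
    All names and policies here are illustrative assumptions."""

    def __init__(self, num_sets, buffer_blocks):
        self.num_sets = num_sets
        self.buffer_blocks = buffer_blocks
        self.main = {}                # set index -> [block, reused_flag]
        self.buffer = OrderedDict()   # resident NT blocks, oldest first (LRU)
        self.nontemporal = set()      # blocks the detector has flagged as NT
        self.hits = 0
        self.misses = 0

    def access(self, block):
        """Reference one block address; return True on a hit."""
        idx = block % self.num_sets
        line = self.main.get(idx)

        # Hit in the main cache: reuse while resident is temporal behavior.
        if line is not None and line[0] == block:
            line[1] = True
            self.hits += 1
            return True

        # Hit in the NT buffer, which is searched alongside the main cache.
        if block in self.buffer:
            self.buffer.move_to_end(block)
            self.hits += 1
            return True

        self.misses += 1
        if block in self.nontemporal and self.buffer_blocks > 0:
            # Previously flagged NT: allocate to the buffer so the
            # block does not displace a temporal block in the main cache.
            if len(self.buffer) >= self.buffer_blocks:
                self.buffer.popitem(last=False)   # evict the LRU entry
            self.buffer[block] = None
        else:
            # Allocate to the main cache; classify the victim on eviction.
            if line is not None:
                victim, reused = line
                if reused:
                    self.nontemporal.discard(victim)
                else:
                    self.nontemporal.add(victim)
            self.main[idx] = [block, False]
        return False


def count_misses(trace, num_sets, buffer_blocks):
    """Run a block-address trace through the sketch and return its misses."""
    cache = NTSCacheSketch(num_sets, buffer_blocks)
    for block in trace:
        cache.access(block)
    return cache.misses


# Blocks 0 and 4 map to the same set when num_sets is 4.
# Block 0 is reused back to back; block 4 is touched once per pass.
trace = [0, 0, 4] * 4
misses_with_buffer = count_misses(trace, num_sets=4, buffer_blocks=1)
misses_without_buffer = count_misses(trace, num_sets=4, buffer_blocks=0)
```

On this 12-reference toy trace the sketch incurs 4 misses with a one-block buffer and 8 misses without one: once block 4 has been evicted without reuse, its later requests go to the buffer and stop displacing block 0. These counts describe only this hypothetical trace and say nothing about the benchmark results reported in the paper.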