Parallel Performance Project Research Paper

Research Paper

Improving Cache Performance Via Active Management
Edward S. Tam
Ph.D. Thesis, University of Michigan, June 1999.

Abstract

This dissertation analyzes a way to improve cache performance via active management of a target cache space. As microprocessor speeds continue to grow faster than memory subsystem speeds, minimizing the average data access time grows in importance. As current data caches are often poorly and inefficiently managed, a good management technique can improve the average data access time. Cache management involves two main processes: block allocation decisions and block replacement decisions. Active block allocation can be performed most efficiently in multilateral caches (several distinct data stores with disjoint contents placed in parallel within L1), where blocks exhibiting particular characteristics can be placed in the appropriate store.

To aid in our evaluation of different active block management schemes, we have developed a multilateral cache simulator, mlcache, which provides a platform whereby different cache schemes can easily be specified, and produces evaluation statistics that can help explain their performance. Using mlcache, we have been able to evaluate the performance of proposed multilateral cache schemes and to derive new, better performing schemes. Our results show that multilateral schemes outperform traditional caches of similar size and often rival the performance of traditional caches nearly twice as large. However, the performance difference between previously-proposed implementable schemes and a multilateral configuration that uses a non-implementable near-optimal replacement policy is large. This disparity is due mainly to the simple prediction strategies presently used in the implementable schemes, along with their limited management of blocks while resident in the L1 cache structure.

We introduce a new multilateral allocation scheme, Allocation By Conflict (ABC), which outperforms all previously proposed reuse-based multilateral configurations and performs comparably to multilateral schemes that have significantly more hardware requirements (particularly Victim, which requires a data path between its A and B caches). The ABC scheme incurs the lowest hardware cost of any of the proposed multilateral schemes, yet it performs the highest and is the most easily implementable. The ABC scheme requires the addition of only a single additional bit per block in cache A and a very simple logic circuit for making the allocation decisions. The ABC scheme's performance advantage also scales well as the sizes of the caches are increased and as the associativity of the A cache is increased.

Back to Publication List, or Parallel Performance Project Home Page