Parallel Performance Project Research Paper

A Comparative Study of Cache-Coherent Nonuniform Memory Access Systems
Gheith A. Abandah and Edward S. Davidson
High Performance Computing Systems and Applications, Kluwer Academic Publishers.
In the 12th Ann. Int'l Symp. on High Performance Computing Systems and Applications (HPCS'98), pp. 267-282, May 1998.

Abstract

We present a comparative study of three important CC-NUMA implementations, Stanford DASH, Convex SPP1000, and SGI Origin 2000, to identify the strengths and weaknesses of current implementations. Although the three systems share many similarities, they differ in ways that translate into large performance differences, e.g., in the number of processors per node, cache configuration, memory consistency model, location of memory in the node, and cache-coherence protocol. In this study, we evaluate the effects of these differences on cache misses, miss time, and local and internode traffic.

We first model the three systems according to their original parameters, and show that they have large performance differences because they use components of different speeds and sizes. We then put the three systems on the same technological level by assigning them components of similar size and speed while preserving their differences in organization and coherence protocol. Although the normalized Origin 2000 has the lowest average remote time, it spends the longest time satisfying its misses because most of them are remote. DASH's Illinois protocol and SPP1000's interconnect cache reduce their remote misses. The SPP1000 has the highest average remote time because its coherence protocol requires more signals to satisfy a miss than either of the other two protocols. DASH achieves lower miss time, and its relaxed memory consistency model hides some of that miss time.
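
The trade-off between average remote time and the share of misses that are remote can be illustrated with a small sketch. It assumes a simple additive model in which total miss time is the number of local misses times the local latency plus the number of remote misses times the remote latency. The function name, the system labels, and every number below are hypothetical and chosen only for illustration; none of them are measurements from the study.

```python
def total_miss_time(local_misses, remote_misses, local_latency, remote_latency):
    """Total time spent satisfying misses under a simple additive model.

    Latencies are in arbitrary time units. This is an assumed model for
    illustration, not the simulation methodology used in the paper.
    """
    return local_misses * local_latency + remote_misses * remote_latency


# Hypothetical system A: lower remote latency, but most misses are remote.
time_a = total_miss_time(local_misses=200, remote_misses=800,
                         local_latency=50, remote_latency=150)

# Hypothetical system B: higher remote latency, but most misses are local.
time_b = total_miss_time(local_misses=700, remote_misses=300,
                         local_latency=50, remote_latency=200)

print(time_a, time_b)  # 130000 95000
```

With these assumed values, system A spends more total time on misses than system B even though each of its remote misses is cheaper, which is the kind of effect the abstract describes for the normalized Origin 2000.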