Position Statement

Kai Li
Princeton

Microprocessor performance has been improving at a rate of 50% per year for the last 15 years. This CPU performance curve has had several significant impacts on the computer industry during this period. Traditional vector supercomputers have been replaced by microprocessor-based multicomputers and multiprocessors. Mainframe computers are using the same microprocessors as workstations and PCs. Traditional minicomputers have been squeezed out of the computer market completely by microprocessor-based mainframes and workstations. The volume of PCs and notebook computers reached 50 million in 1995. The difference between workstations and PCs is diminishing. These changes in the computer marketplace have also shifted the focus of computer architecture and systems research to two goals: maintaining the CPU performance curve and catching up with the CPU performance curve.

How to use available transistors

The foundation of the performance curve has been the consistent advance of VLSI technology. Although the shrinking of features alone improves performance by only about 25% each year, the increase in density has given architects many design opportunities to provide the other 25% per year. In the early 80s, a CPU chip had about 100,000 transistors. The architecture of the CPU was simple, and it could fit on a single chip. Today, a CPU chip has about 10 million transistors. It is common to see 4-issue superscalar architectures with out-of-order instruction execution. In addition to the CPU, part of the memory hierarchy, such as two levels of caches, can be on the same chip. In the next few years, the density of CPU chips is expected to reach about 1 billion transistors. What are we going to do with 1 billion transistors? There will be a lot of interesting work, such as increasing ILP, on-chip multiprocessing, and support for the memory hierarchy and interconnect.

Support new, important applications

Another trend is that the cost of a new VLSI fabrication facility has been increasing dramatically during the last decade. If this trend continues, the number of companies that can afford to produce CPU chips will be reduced even further. This trend calls not only for new, inexpensive VLSI technologies but also for justifications to maintain the CPU performance curve.

In the past few years, computer vendors have been trying hard to maintain the CPU performance curve in order to survive the competition. To maintain the CPU performance curve, there is a need to find demanding applications for the mass market. It may not be adequate simply to depend on major software vendors slowing down their software as they introduce unimportant features.

Web-related, geographically distributed applications and high-quality 3-D graphics will appear in the next generation of applications for the mass market and will require tremendous CPU power. Architectural support for these applications, and how to evaluate architectural research with new generations of applications instead of SPEC benchmarks, will be important.

Efficient Storage Hierarchy

The storage hierarchy between CPU registers and disks continues to be a fruitful research area. While CPU performance has been improving at 50% per year, DRAM performance has improved at a rate of only about 10% per year during the last decade. The cost of a DRAM access is equivalent to over a hundred CPU instructions today, compared with a few instructions a decade ago. Furthermore, disk access time is improving at an even lower rate than DRAM's. To bridge the performance gap in the storage hierarchy, it is important to understand how to design better write buffers, caches, memory systems, disk I/O subsystems, and interconnects.
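The widening gap can be checked with a little compound-growth arithmetic. The sketch below uses the 50% and 10% annual rates quoted above; the 10-year span and the starting cost of 5 instructions per DRAM access are illustrative assumptions (standing in for "a few instructions a decade ago"), and the function names are hypothetical.

```python
# Illustrative arithmetic only. The growth rates come from the text;
# the 10-year span and the starting cost of 5 instructions are assumptions.

def relative_gap(cpu_rate: float, dram_rate: float, years: int) -> float:
    """Factor by which CPU speed outgrows DRAM speed after `years` years."""
    return ((1.0 + cpu_rate) / (1.0 + dram_rate)) ** years


def dram_access_cost(initial_cost: float, cpu_rate: float,
                     dram_rate: float, years: int) -> float:
    """DRAM access cost in CPU instructions after `years` of diverging growth."""
    return initial_cost * relative_gap(cpu_rate, dram_rate, years)


gap = relative_gap(0.50, 0.10, 10)            # CPU 50%/year vs. DRAM 10%/year
cost = dram_access_cost(5.0, 0.50, 0.10, 10)  # assumed 5 instructions at the start

print(f"CPU/DRAM gap after 10 years: about {gap:.0f}x")
print(f"DRAM access cost: about {cost:.0f} CPU instructions")
```

Under these assumptions the gap grows by a factor of roughly 22 in a decade, which turns an access costing a few instructions into one costing on the order of a hundred, in line with the figures above.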

High-Level Integration

The trend in building scalable systems has been moving from low-level integration toward high-level integration during the last few years. While it is becoming commonplace to build supercomputers using off-the-shelf microprocessor chips instead of custom-designed processing elements, the industry is now building large-scale systems using PC and workstation boards (e.g. Convex uses HP workstation boards to build its MPP, and IBM uses its workstation boards to construct the SP2 multicomputers). Several research efforts are investigating how to use entire commodity workstation or PC systems and software to build scalable systems. The approach of using entire systems to construct scalable systems can closely track the exponential improvement in all aspects of the technology base and reduce the cost of a multicomputer by an order of magnitude.

Interconnect technologies, both software and hardware, are the key to constructing parallel architectures with microprocessors and commodity systems. Both latency and bandwidth are important, and researchers in this area have been working on improving both during the last decade. There are two ways to look at their importance. One is the number of instructions per message. While CPU performance has been improving rapidly during the last decade, the number of instructions per message for a particular application is basically constant, so a faster CPU sends messages at a proportionally higher rate. A high-performance parallel system is constructed not only with fast CPUs, but also with interconnects that provide a low ratio of MIPS or Mflops to Mbytes/sec and low latency in CPU cycles.
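The instructions-per-message argument can be made concrete with a small calculation. The sketch below assumes a hypothetical application that sends one fixed-size message every fixed number of instructions; all function names and numeric values are illustrative and do not come from any measured system.

```python
# A minimal sketch under stated assumptions: one message of
# `bytes_per_message` bytes is sent every `instructions_per_message`
# instructions. All numbers below are hypothetical.

def required_bandwidth_mbytes_per_sec(mips: float,
                                      instructions_per_message: float,
                                      bytes_per_message: float) -> float:
    """Interconnect bandwidth needed to keep up with a CPU of the given speed."""
    messages_per_sec = mips * 1e6 / instructions_per_message
    return messages_per_sec * bytes_per_message / 1e6


def mips_per_mbyte_per_sec(mips: float, link_mbytes_per_sec: float) -> float:
    """Ratio of CPU speed to link bandwidth; lower means a better-balanced system."""
    return mips / link_mbytes_per_sec


def latency_in_cycles(latency_microseconds: float, clock_mhz: float) -> float:
    """Message latency expressed in CPU cycles rather than wall-clock time."""
    return latency_microseconds * clock_mhz


# Same application (1000 instructions and 100 bytes per message) on two CPUs.
slow_cpu = required_bandwidth_mbytes_per_sec(10.0, 1000.0, 100.0)
fast_cpu = required_bandwidth_mbytes_per_sec(500.0, 1000.0, 100.0)
print(f"10 MIPS CPU needs {slow_cpu:.1f} Mbytes/sec")
print(f"500 MIPS CPU needs {fast_cpu:.1f} Mbytes/sec")

# A fixed 10-microsecond latency costs more cycles as the clock gets faster.
print(f"{latency_in_cycles(10.0, 200.0):.0f} cycles at 200 MHz")
```

With instructions per message held constant, the required bandwidth scales linearly with CPU speed, and a latency that is fixed in wall-clock time grows when counted in CPU cycles. This is why the interconnect has to improve along with the processor.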

Research on achieving high-level integration is expected to be fruitful in the next few years.


Last modified: Wed June 3 13:21 EST 1996