1 Introduction
Over the last ten years, the rapid pace of technological innovation and restructuring of the computing market has substantially altered many assumptions underlying computer architecture research. In light of these changing assumptions, it is time for the computer architecture research community to re-think which research directions hold the most promise, and which hold the least.
This white paper presents my views on a few of the fundamental problems and opportunities in computer architecture research, and describes the research directions that I perceive to be most promising.
2 Problems and Opportunities
In computer architecture research, problems are often synonymous with opportunities. The most striking example of this dichotomy is Moore's Law, which predicts that transistor densities will continue to increase at the same torrid pace (approximately quadrupling every three years). A consequence of Moore's Law is that over a span as small as five years, "optimal" designs may become non-optimal and infeasible designs may become both feasible and attractive. This rapid rate of technological change is both the greatest problem for computer architecture and the greatest opportunity for computer architects.
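To make the five-year claim concrete, the short sketch below compounds the growth rate quoted above (a fourfold increase every three years). The function name `density_growth` and its parameters are illustrative only; under that assumed rate, five years corresponds to roughly a tenfold increase in density, which is why a design point can shift so quickly.

```python
def density_growth(years, factor=4.0, period_years=3.0):
    """Multiplicative growth in transistor density after `years`.

    Assumes density multiplies by `factor` every `period_years`
    (the "quadrupling every three years" figure quoted in the text).
    """
    return factor ** (years / period_years)


five_year = density_growth(5)    # 4 ** (5/3)
ten_year = density_growth(10)    # 4 ** (10/3)
print(f"5 years: ~{five_year:.1f}x, 10 years: ~{ten_year:.0f}x")
```

The exponent form is used rather than a year-by-year loop so that non-integer spans (such as five years against a three-year period) are handled directly.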
There are many corollaries to Moore's Law, pertaining to different aspects of technological change. One of the most important concerns the relative performance of intra-chip and inter-chip communication. Intra-chip latencies are decreasing, and intra-chip bandwidths are increasing, at much greater rates than their inter-chip counterparts. In conventional processor and memory designs, this trend leads to substantial increases in memory latency relative to instruction execution time, and to corresponding decreases in relative memory bandwidth. New techniques that address this fundamental problem are needed.
Computers are becoming increasingly complex and more highly integrated. This has several important consequences for academic computer architecture research. First, as computers become commodities, decreasing profit margins threaten to stifle diversity in commercial products. This in turn tends to decrease the impact of academic computer architecture research, as existing companies cannot afford to take the risks associated with innovation. Second, the trend towards greater complexity and higher integration raises the barrier to entry for small, innovative computer companies. This again tends to decrease the industrial impact of academic computer architecture research, by making it less feasible to commercialize academic innovations directly. Finally, it is becoming increasingly difficult for academia to develop and evaluate commercial or near-commercial quality systems. This is seen most strikingly in the difficulty and expense of prototyping high-performance uniprocessor and multiprocessor machines. Ten years ago, full-scale, full-speed prototypes were the norm; now they are the exception.
The trend towards high-performance commodity computers has one silver lining, however: it enables parallel computers to be composed very simply, much as TinkerToys are combined to build larger toy structures. Parallel computing has long suffered from two barriers to broad acceptance: expensive hardware and poor programmability. Commodity computers provide an opportunity to address the first of these problems, and help bring parallel computing one step closer to the mainstream.
3 Research Areas with Greatest Potential
The research areas that I believe have the greatest potential are those that address the problems and opportunities discussed in Section 2. Although there are multiple ways to attack these problems, I focus on the approaches that I think have the most promise.
Integrating ILP and parallel processing
Increasing transistor density is more often considered an opportunity than a problem. One such opportunity occurs when the number of transistors per die nears 100 million, making multiple high-performance processors per chip both feasible and attractive. How these processors interact to solve a single problem is the key architectural question. I believe the most promising approach is to integrate support for instruction-level parallelism (ILP) and parallel programming. Conventional fine-grain approaches to instruction-level parallelism, e.g., superscalar, pipelining, and VLIW, are rapidly reaching their limits. Already, alternative techniques such as Multiscalar and simultaneous multithreading (SMT) are attempting to extract coarser-grain parallelism from a single-threaded application. I see these techniques, while interesting in themselves, as simply the first step towards the integration of ILP and more traditional thread-based parallel processing. I expect this area to be fertile ground for research over the next ten years.
Reducing and tolerating memory latencies
Processor clock rates are increasing much more rapidly than interconnect and DRAM speeds, resulting in substantially greater memory latencies, measured in processor cycles, in conventional uniprocessor and multiprocessor designs. This is perhaps the most fundamental problem in conventional computer architecture, and hence deserves continued emphasis. There are sure to be evolutionary advances, such as better prefetching schemes, better cache placement and replacement schemes, and efficient micro-context switching schemes (e.g., SMT), as well as more revolutionary advances, such as processor-in-memory architectures.
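The widening gap can be illustrated with a small sketch that expresses memory latency in processor cycles when processor and DRAM speeds improve at different yearly rates. The function name and all numeric values below are hypothetical and chosen only for illustration; none of them are measurements or figures from this paper.

```python
def latency_in_cycles(years, base_cycles, cpu_rate, dram_rate):
    """Memory latency, in processor cycles, after `years`.

    base_cycles: latency in cycles today (hypothetical starting point)
    cpu_rate:    fractional yearly increase in processor clock rate
    dram_rate:   fractional yearly increase in DRAM access speed

    Each year the cycle time shrinks by a factor of (1 + cpu_rate) and the
    DRAM access time by (1 + dram_rate), so their ratio, the latency in
    cycles, grows by (1 + cpu_rate) / (1 + dram_rate).
    """
    return base_cycles * ((1 + cpu_rate) / (1 + dram_rate)) ** years


# Illustrative assumption only: 20-cycle latency today, processors
# improving 50% per year, DRAM improving 10% per year.
for year in (0, 5, 10):
    cycles = latency_in_cycles(year, base_cycles=20, cpu_rate=0.5, dram_rate=0.1)
    print(f"year {year:2d}: ~{cycles:.0f} cycles")
```

The point of the sketch is the shape of the curve rather than the specific numbers: any sustained difference between the two rates compounds, so latency in cycles grows geometrically even though absolute DRAM access time keeps improving.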
Exploiting parallelism
Parallel computing has long had two barriers to general acceptance: lack of cost-effective hardware and poor programmability. The former arises because the high cost of conventional massively parallel machines exceeds the resources of all but the elite research facilities. However, as high-performance workstations and SMP servers become commodities, they can be used as building blocks to form much larger parallel machines, fundamentally changing the economics of parallel processing.
Exactly what hardware should be built to interconnect these machines, however, depends upon the solution to the parallel programming problem. For traditional supercomputer applications, scientists and engineers are often willing to devote the human resources needed to overcome the poor programmer interfaces of these machines. However, parallel programming must become much simpler before parallel computing enters the mainstream. A necessary step is to define simple, portable programming interfaces that allow parallel programmers to preserve their software investment by making it easy to port applications to new machines. While parallel computing may never become simple, I am optimistic that over the next ten years it can become mainstream.
Novel approaches to evaluation
As computer systems become increasingly complex and expensive to design, I expect a further decrease in the role of conventional hardware prototyping, i.e., constructing a hardware replica of a proposed computer system. Already most architectural ideas are primarily evaluated through simulation, which provides much greater flexibility to evaluate the complete design space. Nonetheless, hardware prototypes often play a critical role in technology transfer, demonstrating to skeptical industry observers that innovations are ready for productization. To make such demonstrations possible, computer architects may need to follow the lead of our brethren in aeronautical engineering, and develop facilities and techniques, analogous to wind tunnels, for evaluating scale models of high-performance artifacts. Advanced simulation systems, e.g., the Wisconsin Wind Tunnel and SimOS simulators, take this approach. However, none of these systems can efficiently simulate the current and emerging generations of dynamically scheduled processors. Hardware-assisted simulation, e.g., RPM, is one promising approach, although current implementations are extremely limited. I believe that continuing computer architecture's successful record of technology transfer will require continued advances in evaluation technology.