Position Statement

David Nagel
CMU

No Title

This is a very exciting time for computer architects. Looking back on the last 30 years of work, we see semiconductor technology enabling the implementation of ideas that were previously beyond technology's grasp. Ideas like multi-level memory hierarchies, pipelining, multiple issue, out-of-order issue, speculative execution, VLIW and a bevy of other architectural innovations are becoming commonplace in even the most basic of processor designs. Architecture is also facing a new set of challenges: what to do with the billions of transistors that will soon fit on a single die? Unlike previous generations, where attention was focused on the size of hardware components (e.g. caches, multiple functional units), today's architect is spending more time focusing on what types of computational cores, interconnect and memory elements can most effectively exploit the abundance of silicon.

Equally exciting and important are the tremendous changes in software and its influence on how architects design processors and computing systems. In contrast to the past, when programs spent most of their execution time running on a stand-alone computer, today's "simple" applications can span multiple processors, co-processors (e.g., DSPs), memory hierarchies, networks, and storage devices. For example, an increasingly common application, surfing the net, involves the client's 1) Web browser application, 2) operating system, 3) disk and 4) network interfaces; the network's 5) LAN and 6) WAN networking hardware; and the server's 7) storage and 8) network interfaces, 9) operating system and 10) Web server software. Improving the performance of these applications is not only a matter of improving the cache miss or branch prediction rate of a client's CPU; it involves carefully architecting the whole system, a system where at each step in the path one will find a processor executing some part of the application.

I believe that these hardware and software trends are generating a wealth of important research problems. Continued work in areas such as memory hierarchies, branch prediction, ILP, and compilers is needed to sustain the 60% per year growth in uniprocessor performance we have been delivering over the last 10 years. The compiler/architecture interface faces new challenges as designers push inter- and intra-task ILP. Parallel processing is also gaining momentum as the commodity CPU market enables large-scale multiprocessor systems while practically guaranteeing the delivery of small-scale desktop multiprocessors, purchased over the phone and delivered overnight to your doorstep.

Application trends such as surfing the web and digital libraries, coupled with the commodity hardware market, pose several other interesting areas of research for architects. The first, which I've nicknamed the Heterogeneous, Distributed, Parallel Processing, Ubiquitous Computing System (HDPPUCS, for lack of a longer acronym :), examines how processors can be built and used to further improve the performance of large-scale, distributed computing systems. At first glance, this may appear to be a problem for other research areas. However, the fact that processors are embedded in almost every part of the computing infrastructure is an opportunity we should not overlook.

For example, networking research is placing processors in every part of the network, from routers and switches to network adaptor cards. Likewise, storage systems integrate processors into both disk drives and subsystems (e.g., RAID). Processors are finding their way into the infrastructure because they are the only way these systems can keep up with the demands of software. Architects need to explore processor designs for these types of software and hardware systems. Architecture research can also examine ways of exploiting the computational wealth of HDPPUCS. For years CPUs have been I/O bound, starving for data to arrive from memory, the network or storage. As processors become increasingly inexpensive, migrating processing to the data, especially specialized functionality that reduces the system-level load, might help improve overall system performance. Here too, the types of processor architectures that can be realistically and cost-effectively deployed remain an open question.

There is also the notion of HPPUCS-on-a-chip (drop the Distributed part of HDPPUCS). With the potential of billions of transistors, proposals for a wide variety of compute engines all on a single chip are emerging. Future chips might include a set of processor cores, several different decode engines, specialized graphics co-processors, encryption/decryption processors, network protocol processors, compression processors, ... and a chunk of FPGA-like fabric that can be configured on-the-fly. The open questions include which types of processing elements we should use and for what types of applications.

As we move to the system-on-a-chip solution, the choice of interfaces that processing elements should provide becomes an important question. Today's interfaces make it relatively difficult for designers to connect computational cores without serious hand-crafting of the system. If research ideas such as Intelligent RAM (IRAM) and reconfigurable computing begin to emerge, it will be important for architects to consider how to interface a diverse set of components.

Even today, processor interfaces significantly hinder software's performance and reliability. Consider a processor's virtual memory (VM) and the protection mechanisms/policies it exports to the operating system. Currently, every commercial processor provides a unique (often radically different) view of VM and protection. This not only makes it difficult for system software (e.g., operating systems) to support multiple architectures, but the overloaded use of VM for protection and the inflexible protection mechanisms also limit software. Operating systems often recast hardware policies, trying to overcome the deficiencies by creating a more flexible set of software-based policies. Unfortunately, these attempts almost always reduce performance below the level that is acceptable to the user.

A more flexible protection mechanism might simplify software systems and/or make them more powerful. Many OS research projects in the 1980s examined how micro-kernel operating systems could provide a more robust and flexible software base. However, the inflexible set of hardware protection mechanisms and policies made the performance cost too great. Responding to this result, OS research has reversed itself and is now avoiding hardware protection boundaries, turning to software for protection (e.g., typesafe languages, software fault isolation (SFI)). Unfortunately, these software protection mechanisms incur a cost that a more carefully designed set of hardware protection mechanisms could eliminate.

Providing a more flexible set of VM mechanisms could also lead to significant improvements in software's performance, reliability and complexity. For example, set-top, embedded and game-based operating systems treat memory as a critical resource, trying to minimize wasted space. The ability to create small page sizes (< 4K) could reduce the amount of memory these systems require -- or reduce the extra software written to avoid the problems created by processor-defined large pages.
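
To make the page-size argument concrete, here is a minimal sketch of the arithmetic. The region sizes are hypothetical values chosen for illustration, not measurements from any real system, and the model counts only the unused tail of the last page of each region; it ignores the larger page tables and extra TLB pressure that smaller pages would bring.

```python
def pages_needed(size_bytes, page_size):
    """Number of whole pages required to hold size_bytes (ceiling division)."""
    return -(-size_bytes // page_size)


def wasted_bytes(region_sizes, page_size):
    """Total internal fragmentation when each region is rounded up to whole pages."""
    return sum(pages_needed(s, page_size) * page_size - s for s in region_sizes)


# Hypothetical memory regions (in bytes) of a small embedded system.
regions = [300, 1500, 5000, 9000]

for page_size in (4096, 1024):
    print(page_size, wasted_bytes(regions, page_size))
```

For these made-up regions, 4K pages leave 12872 bytes unused while 1K pages leave 1608, which is the kind of saving a memory-constrained system would care about.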

Processors also present a unidirectional interface to application programs: applications talk and the processor listens. What if the processor could talk back to the application? Let the application know how things are going. Tell the application its concerns, its problems. Could the application make use of this information? Could the application use its knowledge to help the system work more efficiently?

Many processors already provide some form of feedback using on-chip performance counters. These coarse-grained probes provide a great deal of information about the behavior of an application. However, it is usually difficult to extract the information in a way that would benefit the application. Architectures need to begin exposing more of the behavior of the processor (and memory system) to the application. Processors also need to allow the application more opportunity to control the hardware structures. Compilers need to learn how a program utilizes the memory system, feeding that information back into subsequent compiles. Perhaps applications could help the processor select the best branch-prediction algorithm(s) or cache management strategies. In SMT (Simultaneous Multithreading) machines, it might be possible for threads to reserve portions of the memory hierarchy, allowing two coordinated threads to share information, or for higher-level software to detect when certain threads interact poorly. Throughout the system, there are opportunities for software to advise, learn and modify its behavior to improve performance.
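
As one illustration of software acting on processor feedback, the sketch below has an application try two traversal orders over a matrix, read back a miss count for each, and keep the better one. Everything here is an assumption made for illustration: `count_misses` is a toy direct-mapped cache simulator standing in for a hardware miss counter, and the cache geometry (64 lines of 32 bytes) and 8-byte elements are arbitrary choices, not properties of any real processor.

```python
def count_misses(addresses, num_lines=64, line_size=32):
    """Toy direct-mapped cache; returns the miss count (stand-in for a hardware counter)."""
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        block = addr // line_size
        index = block % num_lines
        if tags[index] != block:
            tags[index] = block  # fill the line on a miss
            misses += 1
    return misses


def row_major(rows, cols, elem=8):
    """Byte addresses touched when walking a row-major matrix row by row."""
    return [(r * cols + c) * elem for r in range(rows) for c in range(cols)]


def col_major(rows, cols, elem=8):
    """Byte addresses touched when walking the same matrix column by column."""
    return [(r * cols + c) * elem for c in range(cols) for r in range(rows)]


def pick_traversal(rows, cols):
    """Measure each candidate order and return the one with the fewest misses."""
    candidates = {"row_major": row_major, "col_major": col_major}
    observed = {name: count_misses(gen(rows, cols)) for name, gen in candidates.items()}
    best = min(observed, key=observed.get)
    return best, observed


best, observed = pick_traversal(64, 64)
print(best, observed)
```

In this toy setup the row-by-row walk misses once per cache line (1024 misses) while the column-by-column walk misses on every access (4096), so the application would settle on the row-major order. A real system would read the counts from the processor rather than a simulator, which is exactly the kind of interface the paragraph above asks for.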

In summary, I believe there are many areas of architecture research where a synergy between software and hardware could dramatically improve system-level performance. Technology is rapidly moving far beyond the current levels of research, opening doors for on-chip and system-level architectural innovation and creating a wide range of architectural problems and possibilities. The trick is to make sure that technology's advance doesn't render research obsolete before it is complete.


Last modified: Wed June 3 13:21 EST 1996