Position Statement

Dirk Grunwald
University of Colorado, Boulder

Clarification

Clearly, there's no single use for computers, and thus no single computer architecture will become dominant. I believe there will be distinct research topics that address three market segments for computer applications.

  1. Embedded systems
  2. General-purpose application-oriented systems (e.g., PCs)
  3. Computationally-intensive systems

The applications are somewhat distinct, and their needs are distinct. Each has ample research opportunities.

Embedded Systems

Embedded systems will continue to focus on lower part counts and reduced power. Systems like the StrongARM are beginning to appear in "set top" boxes, but the majority of embedded processors are much simpler, using, for example, a Motorola 68xx or Intel x86 core with sufficient on-chip memory. Integrating large memories reduces the part count, but beyond a certain point (probably 2-8 Mbytes), integrated systems that combine network controllers (USB?) or display controllers will be more profitable. I don't think there is much university-directed computer architecture research in this area, because it's a very product-oriented market niche.

General Purpose Systems

The issues in this market niche are reduced parts-count, power and performance. I'll discuss power and application-specific performance.

Up to now, computer architects have largely avoided issues in power management. Power is increasingly important, particularly for embedded systems and general-purpose application-oriented systems, but architects need models to understand the ramifications of power use in the systems they design. Reduced power demands present interesting design tradeoffs. For example, does it cost more power to drive multi-way associative caches? Is the benefit of the associative cache worth that power? We have no models for understanding this interaction, and I think that the combination of power assessment and computer architecture will be a fruitful area for the next five years, particularly as we reach smaller line sizes and issues like metal migration become more important than they are today.
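To make the cache question concrete, here is a deliberately crude sketch of the tradeoff. It is not the kind of model called for above. It assumes that every access probes all ways in parallel and that each miss costs a fixed amount of energy, and it ignores leakage, timing and everything else. The function names, access counts, miss counts and energy units are all invented for illustration.

```python
# First-order illustration only. Every number below is hypothetical.

def cache_energy(accesses, misses, ways, e_way, e_miss):
    """Energy if each access probes all `ways` and each miss costs `e_miss`."""
    return accesses * ways * e_way + misses * e_miss

def break_even_miss_energy(accesses, ways_a, misses_a, ways_b, misses_b, e_way):
    """Miss energy at which the more associative cache (b) starts to win."""
    extra_probe_energy = accesses * (ways_b - ways_a) * e_way
    misses_saved = misses_a - misses_b
    return extra_probe_energy / misses_saved

# Assumed workload: 1,000,000 accesses; the 4-way cache misses less often.
direct_mapped = cache_energy(1_000_000, 50_000, ways=1, e_way=1, e_miss=50)
four_way = cache_energy(1_000_000, 30_000, ways=4, e_way=1, e_miss=50)
threshold = break_even_miss_energy(1_000_000, 1, 50_000, 4, 30_000, e_way=1)
```

With these made-up numbers the direct-mapped cache uses less energy overall, and associativity only pays off once a miss costs more than the break-even value. Whether real designs sit above or below that point is exactly the open question.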

The performance of general purpose systems is usually not reflected in application suites such as the SPEC suite. As an example, consider shared libraries. The UNIX world that drives most of the benchmarks we use has only recently started using shared libraries, and few trace collection tools or traces incorporate information about shared libraries. However, in the Win32 world, everything is a shared library. Shared libraries present numerous "opportunities for optimization" because they're usually implemented as indirect branches. You can solve these problems using architecture-only solutions, but it's much better to use a combination of architecture, OS, linker and compiler modifications. For example, the real advantage of shared libraries is not that they save you space, but that they simplify system administration. Perhaps part of the performance workload can be pushed to the linker and O/S, leaving the architecture to implement things that only the architecture can do well (e.g., fine-grain access control or improved TLB handling).
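As a rough illustration of why shared-library calls become indirect branches, the sketch below simulates an import table in software. Each imported symbol gets a slot, and every call jumps through that slot. The slot can be filled lazily on first use or eagerly at load time, which is one place where a linker or OS could take over the work. The names `ImportTable` and `LIBRARY` and the two functions are hypothetical and do not describe any particular loader.

```python
# Toy simulation of calls through an import table. All names are hypothetical.

LIBRARY = {                      # stands in for a shared library's exports
    "square": lambda x: x * x,
    "negate": lambda x: -x,
}

class ImportTable:
    """One slot per imported symbol; every call goes through a slot."""

    def __init__(self, library):
        self.library = library
        self.slots = {}          # symbol name -> resolved target
        self.resolutions = 0     # number of symbol lookups performed

    def _resolve(self, name):
        self.resolutions += 1
        return self.library[name]

    def bind_all(self, names):
        """Eager binding: resolve every symbol once, at load time."""
        for name in names:
            self.slots[name] = self._resolve(name)

    def call(self, name, *args):
        target = self.slots.get(name)
        if target is None:       # lazy binding: resolve on first use only
            target = self.slots[name] = self._resolve(name)
        return target(*args)     # the indirect call through the slot

lazy = ImportTable(LIBRARY)
lazy_results = [lazy.call("square", n) for n in range(4)]

eager = ImportTable(LIBRARY)
eager.bind_all(["square", "negate"])
eager_result = eager.call("negate", 3)
```

In both variants the call site never knows its target statically, so the hardware sees an indirect branch on every library call. Moving resolution to load or link time removes the lookup but not the indirection.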

I think this market segment is going to see increasing innovation from a fusion of O/S, linker and compiler modifications. The big short-term drivers will be painless profiles and runtime feedback, link-time modification of programs and a bevy of lawyers to solve the disputes about whose bug it really is.

High Performance Systems

In addition to the improvements from the entire computing food chain (architecture, linker, OS, compiler), high performance systems can benefit in two ways. First, multi-use systems will benefit most from bandwidth-enhancing, latency-tolerant designs. I think the most successful technique will be hardware assistance for threading, but threading must be cheap enough to let programmers use it. Even simple issues like the cost of TLB misses can affect the performance of threaded applications. The use of threads is expanding in the Win32 and UNIX world, but it's an open question whether those threads really provide any parallelism or whether people just use them to simplify program design. The issues posed by threaded applications and shared libraries will only become more pressing as CORBA or OLE becomes widely adopted. A question for the next 3-5 years is how much and what kind of support is needed for user-level multithreading, and whether compiler-discovered threading can really buy us anything not provided by wide-issue ILP.

For single-thread performance, there are a number of excellent designs (SMT, multiscalar, multipath, multipipeline, etc.). It's not clear if any one specific model will dominate. One idea, similar to the Cydrome, that I have been thinking about (although I have no idea how to design or fab such a thing) is a system with application-specific "instructions" that are actually encoded basic blocks or traces. Rather than simple microcode, each encoded block is a hardware-connected dataflow system that bypasses the use of registers within the block. Part of this technology is beginning to be available as a dynamic FPGA implementation, but I seriously doubt it represents a practical or commercial technology within the next 10-20 years.