Research
My research focuses on the interaction of architecture and
operating systems. I strive to come up with new hardware/software
interfaces that can solve existing problems and potentially open
up new opportunities for both architects and OS designers.
Additionally, as a computer architect, I am cognizant of the fact
that architectures are changed relatively infrequently and are
hampered by a great deal of legacy software. I endeavour to
produce designs that are both novel in their approach and
practical in their utility. Below, I describe the current
progress of two research projects and future work that these
projects lend themselves to. The first project improves the
performance of networked hosts by throwing away traditional
hardware and software interfaces between network interface
controllers and CPUs and replacing them with more efficient ones
that exploit characteristics of network processing. The second
project introduces a novel approach to thinking about register
context, resulting in a new register renaming mechanism that
enables a nearly ideal implementation of register windows and the
possibility for many threads to co-exist in one CPU core. Below, I
also describe a tool that my research group developed that enabled
the research described here to be carried out.
The Simple Integrated Network Interface Controller
The wheel of reincarnation refers to a process in which a computer
architect chooses to design a peripheral device, adds some
intelligence to that peripheral device, decides that the
intelligence should be more general, and realizes that the end
result is a simple peripheral device attached to an additional
general-purpose processor. My research is an instance of going
around the wheel of reincarnation when designing a faster network
interface controller (NIC). Commodity NICs have some intelligence
that allows them to relieve the central processor of some
simple work. The advent of TCP offload engines (TOEs) goes
further around the wheel by adding more intelligence to the
NIC---enough intelligence to run the entire TCP/IP protocol stack.
I argue that NIC design should come full circle on the wheel of
reincarnation by moving the special functionality of the NIC back
to the central processor.
I have demonstrated the importance
of tighter integration between the NIC and CPU. I have shown that
simple integration of a traditional NIC onto the CPU die can
result in bandwidth improvements of more than a factor of two
relative to more conventional designs. Tighter integration alone
provides significant benefits, but also enables a redesign of the
NIC itself to take advantage of the new properties of the
interactions between the NIC and CPU, particularly lower latency.
This leads to a NIC design that is significantly simpler than
current high performance NICs. This design, which I call the
simple integrated NIC or SINIC, moves more of the intelligence of
network processing to the CPU core, allowing system software
programmers significantly more flexibility in the way that system
software uses the NIC. Thus, a suitably redesigned NIC enables
software optimizations not possible with traditional NIC
designs. V-SINIC, an extended version of SINIC, provides virtual
per-packet registers, enabling packet-level parallel processing
while maintaining a FIFO model. V-SINIC also enables deferring the
copy of the packet payload on receive, which I exploit to
implement a zero-copy receive optimization in the Linux 2.6
kernel.
The Virtual Context Architecture
Large numbers of logical registers can improve performance by
allowing fast access to multiple subroutine contexts (register
windows) and multiple thread contexts (multithreading). Support
for both of these together requires a multiplicative number of
registers that quickly becomes prohibitive. The virtual context
architecture (VCA) \cite{oehmke:vca}, a new register-file
architecture that virtualizes logical register contexts, overcomes
this limitation. VCA works by treating the physical registers as a
cache of a much larger memory-mapped logical register
space. Complete contexts, whether activation records or threads,
are no longer required to reside in their entirety in the physical
register file. A VCA implementation of register windows on a
single-threaded machine reduces data cache accesses by 20%,
providing the same performance as a conventional machine while
requiring one fewer cache port. Using VCA to support
multithreading enables a four-thread machine to use half as many
physical registers without a significant performance loss. VCA
naturally extends to support both multithreading and register
windows, providing higher performance with significantly fewer
registers than a conventional machine.
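To make the caching idea concrete, the following is a minimal toy sketch in
Python, not the VCA hardware design. It models a small physical register
file that caches a larger logical register space, spilling to and filling
from a backing store on demand. The class name, the (context, register)
key, and the LRU replacement policy are all assumptions made for
illustration; the actual mechanism is described in the cited paper.

```python
from collections import OrderedDict


class ToyRegisterCache:
    """Toy model: physical registers cache a larger logical register space.

    Hypothetical illustration only. LRU replacement and the
    (context, register) key are assumptions, not the VCA design.
    """

    def __init__(self, num_physical):
        self.num_physical = num_physical
        self.cache = OrderedDict()  # (context, reg) -> value, oldest first
        self.memory = {}            # backing store for spilled registers
        self.spills = 0
        self.fills = 0

    def _make_room(self):
        # Evict the least recently used register if the file is full.
        # For simplicity every eviction is written back (no dirty bits).
        if len(self.cache) >= self.num_physical:
            victim, value = self.cache.popitem(last=False)
            self.memory[victim] = value
            self.spills += 1

    def read(self, context, reg):
        key = (context, reg)
        if key in self.cache:
            self.cache.move_to_end(key)  # hit: refresh LRU position
        else:
            self._make_room()
            self.cache[key] = self.memory.get(key, 0)  # miss: fill
            self.fills += 1
        return self.cache[key]

    def write(self, context, reg, value):
        key = (context, reg)
        if key in self.cache:
            self.cache.move_to_end(key)
        else:
            self._make_room()  # a full overwrite needs no fill
        self.cache[key] = value


# Three contexts (call frames or threads) of two registers each share
# only four physical registers.
rf = ToyRegisterCache(num_physical=4)
for ctx in range(3):
    for reg in range(2):
        rf.write(ctx, reg, 100 + 10 * ctx + reg)

oldest = rf.read(0, 0)  # context 0 was spilled, so this read fills
newest = rf.read(2, 1)  # context 2 is still resident, so this read hits
```

In this sketch no context has to be resident in its entirety: only the
registers actually touched occupy physical entries, which is the property
that lets register windows and multiple threads share one register file.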
The M5 Simulator
The research described above has all been done using the M5
simulator. M5 is a modular
platform for computer architecture research, encompassing
system-level architecture as well as processor microarchitecture.
It is intended for use by researchers in academia or industry
looking for a free, open-source, full-system simulation
environment for processor, system, or platform architecture
studies.
Because the primary focus of the M5 development team has been
simulation of network-oriented server workloads, M5 incorporates
several features not commonly found in other simulators, including:
- Full-system simulation
- Detailed timing of I/O device accesses and DMA operations
- Accurate, deterministic simulation of multiple networked systems
- Flexible, script-driven configuration to simplify specification of
complex multi-system configurations
- A variety of included network workloads, and
- Support for storing results from multiple simulations in a unified
database for automated reporting and graph generation.
M5 also integrates a number of other desirable features, including
pervasive object orientation, multiple interchangeable CPU models,
an event-driven memory system model, and multiprocessor
capability.
The M5 simulator is largely written in C++ and uses Python
extensively for configuration. The code is freely distributable
under a BSD-style license and does not depend on any commercial or
restricted-license software.