EECS 470: Computer Architecture
Homework, project, and Verilog handouts can be found under homework/projects.
Lecture notes
- Lecture 1: Class intro, basics, and pipelining review
- Lecture 2: Pipelining, a review of hazards and some other stuff
- Lecture 3: Control hazards, power, ILP, and dynamic execution
- Lecture 4: Tomasulo's algorithm
- Lectures 5 & 6: Tomasulo's algorithm, branch prediction
- [GSI Lecture]: P3 Pipeline Intro
- Lecture 7: Review branch prediction, add the RoB
- Lecture 8: R10K, ILP, Project, start of P6
- Lecture 9: P6 scheme, Tclock
- Caches
Discussion notes
Papers
Note that many of these links go to the ACM or IEEE digital libraries, which require subscriptions to access. You will need to access them from a umich.edu IP address to take advantage of the University's subscription. You will be required to read the McFarling paper and probably one other.
- Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching by E. Rotenberg, S. Bennett, and J. E. Smith. Proceedings of the 29th Annual International Symposium on Microarchitecture, November 1996.
First paper on trace caches.
- Combining Branch Predictors, S. McFarling, WRL Technical Note TN-36, June 1993.
Proposes the gshare branch predictor, covers a few others. See also the
paper by Yeh and Patt (below).
- Checkpoint processing and recovery: Towards scalable large instruction window processors by H. Akkary, R. Rajwar, and S. T. Srinivasan. In MICRO 36, December 2003.
Reordering without the reorder buffer.
- Implementation of precise interrupts in pipelined processors by J. E. Smith and A. R. Pleszkun. Proceedings of the 12th Annual International Symposium on Computer Architecture, June 1985, pp. 36-44.
The original paper on reorder buffers and their alternatives.
- The MIPS R10000 superscalar microprocessor by K. C. Yeager, IEEE Micro, April 1996.
One of the first out-of-order microprocessors. Uses a merged physical register file (unlike the P6).
- The Alpha 21264 microprocessor by R. E. Kessler, IEEE Micro, Mar/Apr 1999.
Another out-of-order microprocessor that also uses a merged physical register file. The 21264 was easily the fastest processor available when it came out. The "dual cluster" design that uses two copies of the register file to reduce the complexity and latency of the bypass network is particularly interesting. This paper also has a substantial discussion of the 21264 tournament branch predictor that's also described in the textbook.
- Alternative Implementations of Two-Level Adaptive Branch Prediction by T.-Y. Yeh and Y. N. Patt. Proceedings of the 19th Annual International Symposium on Computer Architecture, June 1992, pp. 124-134.
The classic reference on two-level branch prediction.
- Understanding the detailed Architecture of AMD's 64 bit Core by Hans de Vries, www.chip-architect.com.
Some guy's attempt to reverse engineer the Opteron based on published documents, patents, and some speculation. The most detailed Opteron description I know of, but not 100% accurate.
- The Microarchitecture of the Pentium® 4 Processor by Glenn Hinton et al. Intel Technology Journal, Vol. 5 Issue 1 (February 2001).
Description of the Pentium 4 microarchitecture by the chief designers. Includes some comparisons with P6 and some justification of the deep pipeline/high frequency design goal.
On-line stuff
Files from discussion examples