Homework, Project, and Verilog handouts can be found under homework/projects.
Lecture recordings
- Lecture 1: Class intro, basics, and pipelining review (pptx) (pdf)
- Lecture 2: A bit on performance and ISAs. Mostly on pipelining and a review of hazards and related stuff. (pptx) (pdf)
- Lecture 3: Control hazards, power, ILP and Dynamic execution. (pptx) (pdf)
- Lecture 4: Tomasulo's algorithm (pptx) (pdf)
- Lecture 5: Tomasulo's algorithm continued; Branch prediction (pptx) (pdf)
- Lecture 6: Branch prediction, more on Tomasulo's (pptx) (pdf)
- Lecture 7: A bit more on address prediction; misprediction recovery; ILP (pptx) (pdf)
- Lecture 8: P6, Project, start of R10K (pptx) (pdf)
- Lecture 9: R10K scheme, Tclock (pptx) (pdf)
- Lecture 10: Tclk, Finish up out-of-order. (pptx) (pdf)
- Lectures 11, 12, and 13: Caches (pptx) (pdf)
- Lecture 14: Memory speculation (pptx) (pdf)
- Lecture 15: Power! (pptx) (pdf)
- Lecture 16: Start on multi-processor (pptx) (pdf)
- Lecture 17: Multi-processor (pptx) (pdf)
- Lectures 18 and 19: Static optimizations (pptx) (pdf)
- Lecture 20: SMT and similar things. (pptx) (pdf)
- Colwell talk link
- Lecture 21: ISA design (pptx) (pdf)
- Lecture 23: Instruction Scheduling (pptx) (pdf)
Note that many of these links go to the ACM or IEEE digital libraries, which require subscriptions to access. You will need to access them from a umich.edu IP address to take advantage of the University's subscription. You will be required to read the McFarling paper and probably one other.
- Combining Branch Predictors, S. McFarling, WRL Technical Note TN-36, June 1993.
Proposes the gshare branch predictor, covers a few others. See also the paper by Yeh and Patt (below).
- Prophet/Critic predictor by Falcon et al., ISCA 2004.
A nice, modern branch predictor.
- Checkpoint processing and recovery: Towards scalable large instruction window processors. By H. Akkary, R. Rajwar, and S. T. Srinivasan. In MICRO 36, December 2003.
Reordering without the reorder buffer.
- Implementation of precise interrupts in pipelined processors by J. E. Smith and A. R. Pleszkun. Proceedings of the 12th Annual International Symposium on Computer Architecture, June 1985, pp. 36-44.
The original paper on reorder buffers and their alternatives.
- The Mips R10000 superscalar microprocessor by K. C. Yeager, IEEE Micro, April 1996.
One of the first out-of-order microprocessors. Uses a merged physical register file (unlike the P6).
- The Alpha 21264 microprocessor by R. E. Kessler, IEEE Micro, Mar/Apr 1999.
Another out-of-order microprocessor that also uses a merged physical register file. The 21264 was easily the fastest processor available when it came out. The "dual cluster" design, which uses two copies of the register file to reduce the complexity and latency of the bypass network, is particularly interesting. This paper also has a substantial discussion of the 21264 tournament branch predictor, which is also described in the textbook.
- Power: A First-Class Architectural Design Constraint by T. Mudge, IEEE Computer, 2001.
First major paper to argue that power is going to be a major constraint on computer performance. Worth reading.