EECS 573

EECS 573: Microarchitecture (Winter 2012)

Instructor: Todd Austin, EECS 2221, austin@umich.edu

Office Hours: Monday 9-10:30, Wednesday 3:30-5:00, or by appt.

Class Web Page: http://www.eecs.umich.edu/~taustin/eecs573.html (Visit often!)

Course Synopsis: Graduate-level introduction to the foundations of efficient microprocessor design. Problems involving instruction supply, data supply, and instruction processing. Compile-time vs. run-time tradeoffs. Aggressive branch prediction. Wide-issue processors, in-order vs. out-of-order execution, instruction retirement. Case studies are taken from current microprocessors.

Text: None. We will be reading papers available from the Web; they are listed below.

Course Schedule (tentative):

Week	Topic	Readings	Events
1	Introduction
2	Fetch Optimization I	Papers 1 & 2 (+ 1)	Receive project details
3	Fetch Optimization II, Scheduling I	Papers 3 & 4 (+ 2, 3)
4	Scheduling II	Papers 5 & 6 (+ 5)
5	Scheduling III	Papers 7 & 8 (+ 6)	Project proposals due
6	Circuit-Sensitive Design I	Papers 9 & 10
7	Circuit-Sensitive Design II	Papers 11 & 12
8	Spring Break		No class
9	Power-Sensitive Design I	Papers 13 & 14 (+ 7)
10	Power-Sensitive Design II	Papers 15 & 16 (+ 9, 10)
11	Power-Sensitive Design III	Papers 17 & 18 (+ 11)
12	Tools and Techniques	Papers 19, 20, 21 & 22
13	Exam Review, Exam		Exam April 3 (tentative)
14	Application Specific Architectures	Papers 23, 24 & 25
15	Project Presentations		Project reports due, presentations

Project: There will be one project, beginning in week 2. Students may work individually or in pairs; pairs will, of course, be expected to produce more results. Students will conduct a research project that extends a microprocessor simulation system that I make available (SimpleScalar). Other projects are also possible with prior approval. Students will produce a research report and present their findings in the final week of class. More details will follow...

The SimpleScalar sources and class benchmarks are available here:

http://www.eecs.umich.edu/~taustin/eecs573_public/simplesim-3.0b.tar.gz
http://www.eecs.umich.edu/~taustin/eecs573_public/instruct-progs.tar.gz

Grading:

Class Participation: 15%
Class Presentation: 15%
Exam: 30%
Project: 40%

Reading List:

We will be reading the following papers and discussing them in the week specified in the table above. Please read each paper before the beginning of the class in which it is discussed.

1. “The Cascaded Predictor: Economical and Adaptive Branch Target Prediction”, Karel Driesen, Urs Hölzle, in MICRO-31.
2. “A Block-based Trace Cache”, B. Black, B. Rychlik, J.P. Shen, in ISCA-26.
3. “Selective Eager Execution on the PolyPath Architecture”, Artur Klauser, Abhijit Paithankar, and Dirk Grunwald, in ISCA-98.
4. “Select-Free Instruction Scheduling Logic”, Mary D. Brown, Jared Stark, Yale N. Patt, in MICRO-34.
5. “A high-speed dynamic instruction scheduling scheme for superscalars”, Masahiro Goshima, Kengo Nishino, Yasuhiko Nakashima, Shin-ichiro Mori, Toshiaki Kitamura, and Shinji Tomita, in MICRO-34.
6. “Non-Stalling Counterflow Architecture”, Michael F. Miller, Kenneth J. Janik, and Shih-Lien Lu, in HPCA-4.
7. “Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous MultiThreading Processor”, Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, and Rebecca L. Stamm, in ISCA-96.
8. “Memory Dependence Speculation Trade-offs in Centralized, Continuous-Window Superscalar Processors”, Andreas Moshovos (Northwestern University), Gurindar S. Sohi (Computer Sciences, University of Wisconsin-Madison), in HPCA-6.
Slides on this subject are available here.
9. “Complexity-Effective Superscalar Processors”, Subbarao Palacharla (UW-Madison), Norman P. Jouppi, J. E. Smith, in ISCA-97.
10. “Improving Cache Performance with Balanced Tag and Data Paths”, Jih-Kwon Peir (University of Florida), Windsor W. Hsu (University of California at Berkeley), Honesty Young and Shauchi Ong, in ASPLOS-VII.

Additional papers covered during this class:
“A Scalable Front-End Architecture for Fast Instruction Delivery”, Reinman, Austin, Calder.
“Selective Cache Ways: On-Demand Cache Resource Allocation”, Albonesi et al.
“The Difference-Bit Cache”, Juan, Lang, Navarro.
11. “Performance Improvement with Circuit-Level Speculation”, T. Liu, S. Lu (Intel), in MICRO-33.
12. “Dynamic IPC/Clock Rate Optimization”, David H. Albonesi, in ISCA-98.
13. “Power: A First Class Design Constraint”, Trevor Mudge, IEEE Computer, 2001.
14. “Wattch: A Framework for Architectural-Level Power Analysis and Optimizations”, David Brooks, Vivek Tiwari, Margaret Martonosi, in ISCA-2000.
15. “Very Low Power Pipelines using Significance Compression”, R. Canal, A. Gonzalez, J. E. Smith (University of Wisconsin - Madison), in MICRO-33.
16. “Energy Efficient Instruction Dispatch Buffer Design for Superscalar Processors”, Gurhan Kucuk, Kanad Ghose, Dmitry Ponomarev and Peter Kogge, in ISLPED-01.
17. “Dynamic Zero Compression for Cache Energy Reduction”, L. Villa, M. Zhang, K. Asanovic, in MICRO-33.
18. “Automatic Performance-Setting for Dynamic Voltage Scaling”, Krisztián Flautner, Steve Reinhardt, and Trevor Mudge, in MOBICOM-7, 2001.
19. “Reducing state loss for effective trace sampling of superscalar processors”, T. M. Conte, M. A. Hirsch, and K. N. Menezes, in ICCD-1996.
20. “Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications”, Tim Sherwood, Erez Perelman and Brad Calder, in PACT-2001.
21. “Fast Out-Of-Order Processor Simulation Using Memoization”, E. Schnarr and J. Larus, in ASPLOS-VIII.
22. “Liberty Tutorial”, David August et al., at MICRO-34.
23. “Automatic architecture synthesis of VLIW and EPIC processors”, V. K. Shail Aditya, B. Ramakrishna Rau, in ISSS-1999.
24. “Lx: A Technology Platform for Customizable VLIW Embedded Processing”, Paolo Faraboschi, Geoffrey Brown, Joseph A. Fisher, Giuseppe Desoli, Fred (Mark Owen) Homewood, in ISCA-2000.
25. “XTENSA: A Configurable and Extensible Processor”, Ricardo E. Gonzalez, IEEE MICRO, 2000.

Presentation Schedule:

Paper Number	Presenter
6	Allen Cheng
7	Ken McInnis
9	Seokwoo Lee
10	Avinash
11	Jason Clemons
12	Joey Oravec
15	Curt Gomulinski
16	Matt Stamplis
17	Dan Burke
18	Jeremy Burns
21	David Oehmke
23	Mike Chu
24	Joel VanLaven

Supporting Readings:

The following papers will be discussed in class, but not presented in detail.

1. "Critical Issues Regarding the Trace Cache Fetch Mechanism", Sanjay Jeram Patel, Daniel Holmes Friendly, Yale N. Patt, University of Michigan CSE-TR-335-97.

2. “On Pipelining Dynamic Instruction Scheduling Logic”, J. Stark (Intel), M. D. Brown, Y. N. Patt (University of Texas at Austin), in MICRO-33.

3. “Out-Of-Order Execution May Not Be Cost-Effective on Processors Featuring Simultaneous Multithreading”, Sebastien Hily (Intel Microcomputer Research), Andre Seznec (Universite de Rennes), in HPCA-5.

4. “A Study of Slipstream Processors”, Z. Purser, K. Sundaramoorthy, E. Rotenberg, in MICRO-33.

5. “Slipstream Processors: Improving both Performance and Fault Tolerance”, K. Sundaramoorthy, Z. Purser, and E. Rotenberg, In ASPLOS-2000.

6. “Transient Fault Detection via Simultaneous Multithreading”, S. K. Reinhardt and S. S. Mukherjee, In ISCA-2000.

7. “A Comparison of Two Architectural Power Models”, Soraya Ghiasi and Dirk Grunwald, Power Aware Computer Systems Workshop, November 2000.

8. “Pipeline Gating: Speculation Control For Energy Reduction”, Srilatha Manne, Artur Klauser and Dirk Grunwald, in ISCA-98.

9. “Dynamically Exploiting Narrow Width Operands to Improve Processor Power and Performance”, David Brooks, Margaret Martonosi, Princeton University, in HPCA-5.

10. “Memory Hierarchy Reconfiguration For Energy And Performance In General-Purpose Processor Architectures”, R. Balasubramonian, D. Albonesi, A. Buyuktosunoglu, S. Dwarkadas, in MICRO-33.

11. “A Static Power Model for Architects”, J. A. Butts, G. Sohi, In MICRO-33.

12. "Modeling Superscalar Processors via Statistical Simulation", Sebastien Nussbaum and James Smith, PACT-01.

13. “High-level synthesis of nonprogrammable hardware accelerators”, R. Schreiber, S. Aditya (Gupta), B.R. Rau, V. Kathail, S. Mahlke, S. Abraham and G. Snider, In ASAP-2000.

14. “Reducing Code Size with Run-Time Code Decompression”, Charles Lefurgy, Eva Piccininni, Trevor Mudge, University of Michigan, in HPCA-6.

15. “The Use of Multithreading for Exception Handling”, Craig B Zilles, Gurindar S Sohi (University of Wisconsin, Madison), Joel S Emer (Compaq Computer Corporation), in MICRO-32.

16. “Putting the Fill Unit to Work: Dynamic Optimizations for Trace Cache Microprocessors”, D. Friendly, S. Patel, Y. Patt (Univ. of Michigan), in MICRO-31.

17. “Selective Cache Ways: On-Demand Cache Resource Allocation”, David H Albonesi (University of Rochester), in MICRO-32.

18. “Dynamic Cluster Assignment Mechanisms”, Ramon Canal, Joan Manuel Parcerisa, Antonio Gonzalez, Universitat Politecnica de Catalunya – Barcelona, in HPCA-6.

19. “Memory Forwarding: Enabling Aggressive Layout Optimizations by Guaranteeing the Safety of Data Relocation”, C. Luk (University of Toronto), T. Mowry (CMU), in ISCA-26.

20. “Understanding the Backward Slices of Performance Degrading Instructions”, Craig Zilles, Gurindar S. Sohi, in ISCA-2000.

21. “Confidence Estimation for Speculation Control”, Dirk Grunwald, Artur Klauser, Srilatha Manne and Andrew Pleszkun, in ISCA-98.

22. “Effective Jump Pointer Prefetching for Linked Data Structures”, A. Roth, G. Sohi, in ISCA-26.

23. “Prefetching using Markov Predictors”, Doug Joseph and Dirk Grunwald, in ISCA-97.

24. “Modified LRU Policies for Improving Second-level Cache Behavior”, Wayne A. Wong, Jean-Loup A. Baer, University of Washington, in HPCA-6.

25. “Simultaneous Subordinate Microthreading (SSMT)”, R. Chappell, J. Stark, S. Kim, Y. Patt, in ISCA-26.

26. “Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures”, Vikas Agarwal, M.S. Hrishikesh, Stephen Keckler, Doug Burger, in ISCA-2000.