Online Quizzes

The purpose of the online quizzes is to ensure that you have read and understood the papers in advance of class.
The quiz questions are not intended to be difficult or tricky; anyone who has read the papers should know the answers or be able to find them easily. However, the questions are designed so that you cannot easily find the answers within five minutes if you have not read the papers in advance. Read the papers before attempting the quizzes.
PDFs of the readings are or will be available in Canvas, and the reading list here will be updated as the course progresses.
Unit 1: Parallel Computing Models
L1: Introduction
L2: Message Passing & Shared Memory
M. D. Hill, S. Adve, L. Ceze, M. J. Irwin, D. Kaeli, M. Martonosi, J. Torrellas, T. F. Wenisch, D. Wood, K. Yelick - 21st Century Computer Architecture, CCC Whitepaper, 2012
David Wood and Mark Hill, Cost-Effective Parallel Computing, IEEE Computer, 1995
L3: Data-level Parallelism
Christina Delimitrou and Christos Kozyrakis. Amdahl's law for tail latency. Commun. ACM 61, July 2018
H Kim, R Vuduc, S Baghsorkhi, J Choi, Wen-mei Hwu, Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU), Ch. 1
L4: GPUs
Tor M. Aamodt, Wilson Wai Lun Fung, Timothy G. Rogers, General-Purpose Graphics Processor Architectures, Ch. 3.1-3.3, 4.1-4.3
V. Narasiman, M. Shebanow, C. J. Lee, R. Miftakhutdinov, O. Mutlu, and Y. N. Patt, Improving GPU performance via large warps and two-level warp scheduling, MICRO 2011.
Unit 2: Synchronization
L5, L6: Synchronization
Michael Scott, Shared-Memory Synchronization, Synthesis Lectures on Computer Architecture (Ch. 1, 4.0-4.3.3, 5.0-5.2.5)
Alain Kagi, Doug Burger, and Jim Goodman. Efficient Synchronization: Let Them Eat QOLB, Proc. 24th International Symposium on Computer Architecture (ISCA 24), June, 1997
L7: Transactional Memory
Michael Scott, Shared-Memory Synchronization, Synthesis Lectures on Computer Architecture (Ch. 9.0-9.2.3)
Ravi Rajwar and James R. Goodman. Speculative lock elision: enabling highly concurrent multithreaded execution. In Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, Dec. 2001.
L8: Lock-free Synchronization
Michael Scott, Shared-Memory Synchronization, Synthesis Lectures on Computer Architecture (Ch. 8-8.3)
M. Herlihy, Wait-Free Synchronization, ACM Trans. Program. Lang. Syst. 13(1): 124-149 (1991)
Unit 3: Coherence and Consistency
L9: Snooping Cache Coherence
Daniel J. Sorin, Mark D. Hill, and David A. Wood, A Primer on Memory Consistency and Cache Coherence (Ch. 6 & 7)
L10: Snoop-based Multiprocessors
Nikos Hardavellas, Michael Ferdman, Babak Falsafi, and Anastasia Ailamaki. Reactive NUCA: near-optimal block placement and replication in distributed caches. ISCA 2009
L11: Directory-based Coherence
Chaiken et al., Directory-Based Cache Coherence Protocols for Large-Scale Multiprocessors, IEEE Computer, 49-58, June 1990.
Daniel J. Sorin, Mark D. Hill, and David A. Wood, A Primer on Memory Consistency and Cache Coherence, Ch. 8
L12: Coherence Optimization & COMA
A. Gupta et al. "Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes". ICPP 1990.
Fredrik Dahlgren and Josep Torrellas. Cache-only memory architectures. Computer 6 (1999): 72-79.
L15: Memory Consistency II
Daniel J. Sorin, Mark D. Hill, and David A. Wood, A Primer on Memory Consistency and Cache Coherence, Ch. 3-4
L16: Release Consistency and Programming Language MCMs
K. Gharachorloo, D. Lenoski, J. Laudon, P. B. Gibbons, A. Gupta, and J. L. Hennessy, Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors, ISCA 1990
L17: Speculative Consistency, HSA, and Spandex
H.J. Boehm, S. Adve, Foundations of the C++ Concurrency Model, PLDI 2008
K. Gharachorloo et al. "Two Techniques to Enhance the performance of Memory Consistency Models". ICPP 1991.
Unit 4: Interconnection Networks
L18: Interconnects: Intro
D. Lustig, M. Pellauer, M. Martonosi, PipeCheck: Specifying and Verifying Microarchitectural Enforcement of Memory Consistency Models, MICRO 2014
C. Trippel, Y. A. Manerkar, D. Lustig, M. Pellauer, M. Martonosi, TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA, ASPLOS 2017
L19: Interconnects: Topology
On-Chip Networks, Synthesis Lecture, Jerger, Krishna, and Peh, Ch. 3
Kim, Dally, & Abts. Flattened Butterfly: A Cost-Efficient Topology for High-Radix Networks. ISCA 2007.
L20: Interconnects: Routing
Scott & Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus, Hot Interconnects 1996.
On-Chip Networks, Synthesis Lecture, Jerger, Krishna, and Peh, Ch. 4
L21: Interconnects: Flow Control
On-Chip Networks, Synthesis Lecture, Jerger, Krishna, and Peh, Ch. 5
L22: Interconnects: Router uArch
On-Chip Networks, Synthesis Lecture, Jerger, Krishna, and Peh, Ch. 6
Kim, Dally, Towles, & Gupta. Microarchitecture of a High-Radix Router. ISCA 2005.