Online Quizzes

The purpose of the online quizzes is to ensure that you have read and understood the papers in advance of class.
The quiz questions are not intended to be difficult or tricky; anyone who has read the paper should know the answers or be able to find them easily. However, the questions are designed so that you cannot easily find the answers within five minutes if you have not read the papers in advance. Hence, read the papers before attempting the quizzes.
PDFs of the readings are available in Canvas.
Unit 1: Parallel Computing Models
L1: Introduction
L2: Message Passing & Shared Memory
(1)   M. D. Hill, S. Adve, L. Ceze, M. J. Irwin, D. Kaeli, M. Martonosi, J. Torrellas, T. F. Wenisch, D. Wood, K. Yelick - 21st Century Computer Architecture, CCC Whitepaper, 2012
(2)   David Wood and Mark Hill, Cost-Effective Parallel Computing, IEEE Computer, 1995
L3: Data-level Parallelism
(3)   Christina Delimitrou and Christos Kozyrakis. Amdahl's law for tail latency. Commun. ACM 61, July 2018
(4)   H Kim, R Vuduc, S Baghsorkhi, J Choi, Wen-mei Hwu, Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU), Ch. 1
L4: GPUs
(5)   Tor M. Aamodt, Wilson Wai Lun Fung, Timothy G. Rogers, General-Purpose Graphics Processor Architectures, Ch. 3.1-3.3, 4.1-4.3
(6)   V. Narasiman, M. Shebanow, C. J. Lee, R. Miftakhutdinov, O. Mutlu, and Y. N. Patt, Improving GPU performance via large warps and two-level warp scheduling, MICRO 2011.
Unit 2: Synchronization
L5: Synchronization
(1)   Michael Scott, Shared-Memory Synchronization, Synthesis Lectures on Computer Architecture (Ch. 1, 4.0-4.3.3, 5.0-5.2.5)
(2)   Alain Kagi, Doug Burger, and Jim Goodman. Efficient Synchronization: Let Them Eat QOLB, Proc. 24th International Symposium on Computer Architecture (ISCA 24), June, 1997
L6: Lock-free Synchronization
(3)   Michael Scott, Shared-Memory Synchronization, Synthesis Lectures on Computer Architecture (Ch. 8-8.3)
(4)   M. Herlihy, Wait-Free Synchronization, ACM Trans. Program. Lang. Syst. 13(1): 124-149 (1991)
L7: Transactional Memory
(5)   Michael Scott, Shared-Memory Synchronization, Synthesis Lectures on Computer Architecture (Ch. 9.0-9.2.3)
(6)   Ravi Rajwar and James R. Goodman. Speculative lock elision: enabling highly concurrent multithreaded execution. In Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, Dec. 2001.
Unit 3: Coherence and Consistency
L8: Snooping Cache Coherence
(1)   Daniel J. Sorin, Mark D. Hill, and David A. Wood, A Primer on Memory Consistency and Cache Coherence (Ch. 6 & 7)
L9: Snoop-based Multiprocessors
(2)   Nikos Hardavellas, Michael Ferdman, Babak Falsafi, and Anastasia Ailamaki. Reactive NUCA: near-optimal block placement and replication in distributed caches. ISCA 2009
L10: Directory-based Coherence
(3)   Chaiken et al., Directory-Based Cache Coherence Protocols for Large-Scale Multiprocessors, IEEE Computer, 19-58, June 1990.
(4)   Daniel J. Sorin, Mark D. Hill, and David A. Wood, A Primer on Memory Consistency and Cache Coherence, Ch. 8
L11: Coherence Optimization & COMA
(5)   A. Gupta et al. "Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes". ICPP 1990.
(6)   Fredrik Dahlgren and Josep Torrellas. Cache-only memory architectures. Computer 6 (1999): 72-79.
L12: Memory Consistency
(7)   Daniel J. Sorin, Mark D. Hill, and David A. Wood, A Primer on Memory Consistency and Cache Coherence, Ch. 3-4
L13: Relaxed Memory Consistency
(8)   A. Singh, Satish Narayanasamy, Daniel Marino, Todd Millstein, Madanlal Musuvathi. A Safety-First Approach to Memory Models. IEEE Micro, Top Picks from the 2012 Computer Architecture Conferences
(9)   K. Gharachorloo, D. Lenoski, J. Laudon, P. B. Gibbons, A. Gupta, and J. L. Hennessy, Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors, ISCA 1990
L14: Speculative Consistency
(10)   K. Gharachorloo et al. "Two Techniques to Enhance the performance of Memory Consistency Models". ICPP 1991.
(11)   C. Blundell, M. M. K. Martin, T.F. Wenisch, InvisiFence: Performance-transparent Memory Ordering in Conventional Multiprocessors, ISCA 2009
L15: Speculative Consistency
(12)   B. Boehm, S. Adve, Foundations of the C++ Concurrency Model, PLDI 2008
L17: DeNovo
(13)   B. Choi et al., DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism, PACT 2011
(14)   B. Hechtman, D. Sorin, Exploring memory consistency for massively-threaded throughput-oriented processors, ISCA 2013
Unit 4: Interconnection Networks
L16: Interconnects: Intro
(1)   D. Hower et al., Heterogeneous-race-free memory models, ASPLOS 2014
(2)   Mukherjee et al. The Alpha 21364 Network Architecture, Hot Interconnects 2001.
L17: Interconnects: Topology
(3)   On-Chip Networks, Synthesis Lecture, Jerger and Peh, Ch. 3
(4)   Kim, Dally, & Abts. Flattened Butterfly: A Cost-Efficient Topology for High-Radix Networks. ISCA 2007.
L18: Interconnects: Routing
(5)   Scott & Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus, Hot Interconnects 1996.
(6)   On-Chip Networks, Synthesis Lecture, Jerger and Peh, Ch. 4
L19: Interconnects: Flow Control
(7)   On-Chip Networks, Synthesis Lecture, Jerger and Peh, Ch. 5
L20: Interconnects: Router uArch
(8)   On-Chip Networks, Synthesis Lecture, Jerger and Peh, Ch. 6
(9)   Kim, Dally, Towles, & Gupta. Microarchitecture of a High-Radix Router. ISCA 2005.
Unit 5: Modern & Unconventional Multiprocessors
L22: Multithreading
(1)   D. Tullsen et al. "Simultaneous multithreading: Maximizing On-Chip Parallelism". ISCA 1995.
(2)   Gurindar S. Sohi, Scott E. Breach, and T. N. Vijaykumar. Multiscalar processors. In Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA 1995).
L5: Applications
(3)   P. Ranganathan, K. Gharachorloo, S. V. Adve, and L. A. Barroso, Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors, ASPLOS 1998
(4)   M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee, D. Jevdjic, C. Kaynak, A. Popescu, A. Ailamaki, B. Falsafi, Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware, ASPLOS 2012