Schedule

  • Date
    Day
    Topic
    Mandatory Reading
    Optional Reading
    Presenter
  • 08/29/2023
    Tuesday
    Introduction and Data Models
    N/A
    What Goes Around Comes Around
    Lin Ma
  • 08/31/2023
    Thursday
    RDBMS Architecture
    Anatomy of a Database System (Sections 1, 3, 4, 5)
    An Overview of Query Optimization in Relational Systems by Chaudhuri
    Lin Ma
  • 09/05/2023
    Tuesday
    Consistency and Isolation Levels
    Serializable Snapshot Isolation in PostgreSQL
    Generalized Isolation Level Definitions
    Harvin Mumick
  • 09/07/2023
    Thursday
    Cloud OLTP
    Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
    Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes
    Yifan Zhu
  • 09/12/2023
    Tuesday
    Indexing
    Automatically Indexing Millions of Databases in Microsoft Azure SQL Database
    Automated Selection of Materialized Views and Indexes for SQL Databases
    Sumanth Umesh
  • 09/14/2023
    Thursday
    Configuration Tuning
    Automatic Database Management System Tuning Through Large-scale Machine Learning
    AI Meets Database: AI4DB and DB4AI
    Tianji Cong
  • 09/19/2023
    Tuesday
    Columnar Databases
    C-Store: A Column-oriented DBMS
    Column-Stores vs. Row-Stores: How Different Are They Really?
    Vedant Iyer
  • 09/21/2023
    Thursday
    Vectorized Query Processing
    MonetDB/X100: Hyper-Pipelining Query Execution & Vectorwise: Beyond Column Stores
    Mason Nelson
  • 09/26/2023
    Tuesday
    Query Compilation
    Efficiently Compiling Efficient Query Plans for Modern Hardware
    Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask
    Yanjun Chen
  • 09/28/2023
    Thursday
    Column-oriented Storage Format
    Dremel: Interactive Analysis of Web-Scale Datasets
    Apache Parquet & The striping and assembly algorithms from the Dremel paper
    Rongzhi Zhang
  • 10/03/2023
    Tuesday
    Cloud Analytics
    Dremel: A Decade of Interactive SQL Analysis at Web Scale
    Angana Borah
  • 10/05/2023
    Thursday
    Cloud Analytics (Cont.)
    Building An Elastic Query Engine on Disaggregated Storage
    The Snowflake Elastic Data Warehouse
    Matt Martin
  • 10/10/2023
    Tuesday
    Big Data Processing
    Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
    MapReduce: Simplified Data Processing on Large Clusters
    Zhenning Yang
  • 10/12/2023
    Thursday
    Lakehouse
    Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores
    Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics
    Jie Liu
  • 10/17/2023
    Tuesday
    Fall Break
  • 10/19/2023
    Thursday
    Distributed SQL Engine
    Presto: A Decade of SQL Analytics at Meta
    Zhuocheng Sun
  • 10/24/2023
    Tuesday
    Midterm
  • 10/26/2023
    Thursday
    Reusable Execution Engine
    Velox: Meta’s Unified Execution Engine
    Ben Miller
  • 10/31/2023
    Tuesday
    Embedded Analytics
    Data Management for Data Science - Towards Embedded Analytics
    DuckDB: an Embeddable Analytical Database
    Samika Gupta
  • 11/02/2023
    Thursday
    No Class
    Work on Mid-semester Project Presentation
  • 11/07/2023
    Tuesday
    Guest Lecture: Innovations in Amazon Redshift
    Amazon Redshift Re-Invented
    Sudipto Das, Senior Principal Engineer, AWS
  • 11/09/2023
    Thursday
    Mid-semester Project Presentations
  • 11/14/2023
    Tuesday
    Wide-column Stores
    Bigtable: A Distributed Storage System for Structured Data
    Cassandra - A Decentralized Structured Storage System
    Zesheng Yu
  • 11/16/2023
    Thursday
    Key-value Stores
    Dynamo: Amazon's Highly Available Key-value Store
    Eventual Consistency Today: Limitations, Extensions, and Beyond
    Houming Chen
  • 11/21/2023
    Tuesday
    Stream Processing
    Apache Flink™: Stream and Batch Processing in a Single Engine
    Discretized Streams: Fault-Tolerant Streaming Computation at Scale
    Xueshen Liu
  • 11/23/2023
    Thursday
    Thanksgiving
  • 11/28/2023
    Tuesday
    Stream Processing (Cont.)
    Differential dataflow
    Naiad: a timely dataflow system
    Zhixiang Teoh
  • 11/30/2023
    Thursday
    Vector Databases
    Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs & What is a Vector Database
    Billion-scale similarity search with GPUs
    Ruey-Tzer Hsu
  • 12/05/2023
    Tuesday
    Guest Lecture: Google BigQuery: from Google Cloud to a multi-cloud lakehouse
    No mandatory reading
    Justin Levandoski, Director of Engineering, Google