-
08/29/2023
Tuesday
Introduction and Data Models
What Goes Around Comes Around
Lin Ma
-
08/31/2023
Thursday
RDBMS Architecture
Anatomy of a Database System (Sections 1, 3, 4, 5)
An Overview of Query Optimization in Relational Systems by Chaudhuri
Lin Ma
-
09/05/2023
Tuesday
Consistency and Isolation Levels
Serializable Snapshot Isolation in PostgreSQL
Generalized Isolation Level Definitions
Harvin Mumick
-
09/07/2023
Thursday
Cloud OLTP
Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes
Yifan Zhu
-
09/12/2023
Tuesday
Indexing
Automatically Indexing Millions of Databases in Microsoft Azure SQL Database
Automated Selection of Materialized Views and Indexes for SQL Databases
Sumanth Umesh
-
09/14/2023
Thursday
Configuration Tuning
Automatic Database Management System Tuning Through Large-scale Machine Learning
AI Meets Database: AI4DB and DB4AI
Tianji Cong
-
09/19/2023
Tuesday
Columnar Databases
C-Store: A Column-oriented DBMS
Column-Stores vs. Row-Stores: How Different Are They Really?
Vedant Iyer
-
09/21/2023
Thursday
Vectorized Query Processing
MonetDB/X100: Hyper-Pipelining Query Execution & Vectorwise: Beyond Column Stores
Mason Nelson
-
09/26/2023
Tuesday
Query Compilation
Efficiently Compiling Efficient Query Plans for Modern Hardware
Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask
Yanjun Chen
-
09/28/2023
Thursday
Column-oriented Storage Format
Dremel: Interactive Analysis of Web-Scale Datasets
Apache Parquet & The striping and assembly algorithms from the Dremel paper
Rongzhi Zhang
-
10/03/2023
Tuesday
Cloud Analytics
Dremel: A Decade of Interactive SQL Analysis at Web Scale
Angana Borah
-
10/05/2023
Thursday
Cloud Analytics (Cont.)
Building An Elastic Query Engine on Disaggregated Storage
The Snowflake Elastic Data Warehouse
Matt Martin
-
10/10/2023
Tuesday
Big Data Processing
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
MapReduce: Simplified Data Processing on Large Clusters
Zhenning Yang
-
10/12/2023
Thursday
Lakehouse
Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores
Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics
Jie Liu
-
10/17/2023
Tuesday
Fall Break
-
10/19/2023
Thursday
Distributed SQL Engine
Presto: A Decade of SQL Analytics at Meta
Zhuocheng Sun
-
10/24/2023
Tuesday
Midterm
-
10/26/2023
Thursday
Reusable Execution Engine
Velox: Meta’s Unified Execution Engine
Ben Miller
-
10/31/2023
Tuesday
Embedded Analytics
Data Management for Data Science: Towards Embedded Analytics
DuckDB: an Embeddable Analytical Database
Samika Gupta
-
11/02/2023
Thursday
No Class
Work on Mid-semester Project Presentation
-
11/07/2023
Tuesday
Guest Lecture: Innovations in Amazon Redshift
Amazon Redshift Re-Invented
Sudipto Das, Senior Principal Engineer, AWS
-
11/09/2023
Thursday
Mid-semester Project Presentations
-
11/14/2023
Tuesday
Wide-column Stores
Bigtable: A Distributed Storage System for Structured Data
Cassandra - A Decentralized Structured Storage System
Zesheng Yu
-
11/16/2023
Thursday
Key-value Stores
Dynamo: Amazon's Highly Available Key-value Store
Eventual Consistency Today: Limitations, Extensions, and Beyond
Houming Chen
-
11/21/2023
Tuesday
Stream Processing
Apache Flink™: Stream and Batch Processing in a Single Engine
Discretized Streams: Fault-Tolerant Streaming Computation at Scale
Xueshen Liu
-
11/23/2023
Thursday
Thanksgiving
-
11/28/2023
Tuesday
Stream Processing (Cont.)
Naiad: A Timely Dataflow System
Zhixiang Teoh
-
11/30/2023
Thursday
Vector Databases
Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs & What is a Vector Database
Billion-scale similarity search with GPUs
Ruey-Tzer Hsu
-
12/05/2023
Tuesday
Guest Lecture: Google BigQuery: from Google Cloud to a multi-cloud lakehouse
Justin Levandoski, Director of Engineering, Google