Course Schedule

The following course schedule is tentative and subject to change.

Topic	Deadlines & Materials
Week 1: January 25 – 31
Introduction to Big Data Lecture 1 Recording Big Data Case Studies and Datasets Lecture 2 Recording A Quick Tour of Go Lecture 3 Recording	Google Flu Trends Traps in Big Data Analysis Netflix Recommendations How Facebook Labels You Oxide LLM Usage Guide Go resources Go Official Documentation A Tour of Go Learn Go With Tests Go builtins Go YouTube Channel Go by Example GoLand IDE SSHing to Orion
Week 2: February 1 – 7
Unit 1: Scaling and Storage Slightly More Advanced Go Lecture 4 Recording Scaling Out Lecture 5 Recording Research Papers and HDFS Presentation Lecture 6 Recording	Protocol Buffers Chat Example JVM Serialization Performance Paper 0: HDFS Research Papers Research Paper Library Research Paper Guidelines Keshav: How to Read a Paper
Week 3: February 8 – 14
HDFS, Warehouse-Scale Computing Lecture 7 Recording Network Design Lecture 8 Recording 1 Recording 2	Lab 1 Due ⋅ February 9 Spurious Correlations Warehouse-scale Computing The Datacenter as a Computer Back-of-Envelope Calculations Elements of Scale The Friendship that made Google Underwater Datacenter Submarine Cable Map PingFS PiFS Scalable Server Design The C10K Problem (1999) High-Performance Server Design
Week 4: February 15 – 21
Fault Tolerance and Consensus Lecture 9 Recording	Quiz 1 ⋅ February 18 Lab 2 Due ⋅ February 16 Lab 3 Due ⋅ February 16 Reed-Solomon Error Correction Raft Animation Jepsen
Week 5: February 22 – 28
Finishing Consensus Recording Dynamo + DHT Intro Recording Distributed Hash Tables Lecture 10 Recording	Lab 4 Due ⋅ February 25 Paper 1: Dynamo ⋅ February 25 Port Assignments
Week 6: March 1 – 7
Finishing DHTs Recording Aurora, Project 1 Design Recording Data Models Lecture 11 Recording	Paper 2: Aurora ⋅ March 4 Databases & Data Models
Week 7: March 8 – 14
Finishing Data Models Recording	Quiz 2 ⋅ March 11 Lab 5 Due ⋅ March 13
Week 8: March 15 – 21
Spring Break!
Week 9: March 22 – 28
Unit 2: Distributed & Parallel Computation Introduction to Distributed Computation Lecture 12 Recording MapReduce Recording Hadoop Setup Recording	Paper 3: MapReduce ⋅ March 25 Brief MapReduce Overview 1 & 2 Hadoop Tips Hadoop Setup Guide
Week 10: March 29 – April 4
Hadoop MapReduce Lecture 13 Recording Dremel Recording MapReduce Job Design	Project 1 Due ⋅ March 30 Paper 4: Dremel ⋅ April 1 MapReduce Word Count Reddit Politics Job
Week 11: April 5 – 11
Monday: P1 Grading Spark Presentation Recording Bloom Filters Lecture 14 Recording (first half in recording above)	Lab 6 Due ⋅ April 10 Paper 6: Spark ⋅ April 8 Bloom Filters Demo Calculator Uses at Akamai (paper) Kirsch-Mitzenmacher Optimization On the False Positive Rate (2020) Cuckoo Filter Xor Filter
Week 12: April 12 – 18
Unit 3: Streaming Algorithms and Applications Cluster Orchestration Lecture 15 Recording Big Data Sampling Techniques Lecture 16 Recording 1 Recording 2	Quiz 3 ⋅ April 15 Lab 7 Due ⋅ April 17 Lab 8 Due ⋅ April 13 Paper 7: Dataflow ⋅ April 13 Scaling Big Data Mining Infrastructure Borg Mesos Sampling Reservoir Sampling Paper Reservoir Sampling @ Wikipedia SampleJob.java Politics Job (includes Reservoir Sampling Example)
Week 13: April 19 – 25
Automatic Facticles, Frequent Items Sketch Lecture 17 Recording Storm, Count-Min Sketch Lecture 18 Recording Cardinality Estimation Lecture 19 Recording	Project 2 Due ⋅ April 20 Paper 5: Automatic Factiles ⋅ April 20 Paper 8: Storm ⋅ April 22 Data Sketches Count-Min Sketch Count-Min With Demo Apache DataSketches Running Variance/Std. Dev Count-Min Sketch Count-Min With Demo RunningStatisticsND.java HyperLogLog Demo HyperLogLog Video
Week 14: April 26 – May 2
Spatiotemporal Data Lecture 20 Recording Flink Recording Working with Spark Lecture 21 Recording	Paper 9: Flink ⋅ April 29 A Tale of Three Spark APIs (Video) Spark Setup Guide
Week 15: May 3 – 9
TensorFlow Recording Spark Streaming Recording Friday: Project Grading	Quiz 4 ⋅ May 6 Lab 9 Due ⋅ May 8 Lab 10 Due ⋅ May 8 Paper 10: TensorFlow ⋅ May 4 Spark Resources Reddit Comment Analysis Tweet Analysis Spark Streaming Spark Streaming Programming Guide Spark DataFrames/Datasets Spark ML Example
Week 16: May 10 – 16
Spark & ML Recording
Week 17: May 17 – 23
Final Quiz: Monday, May 18 ⋅ 10:00 am – 10:30 am
