Course Schedule

The following course schedule is tentative and subject to change.

Topic	Deadlines & Materials
Week 1: January 22 – 28
Topic:
- Introduction to Big Data
- Lecture 1 Recording (See Canvas for password)
- A Whirlwind Tour of Go
- Lecture 2 Recording
Deadlines & Materials:
- Lab 0 Due ⋅ January 26
- Google Flu Trends
- Traps in Big Data Analysis
- Netflix Recommendations
- How Facebook Labels You
- Go resources
- Go Official Documentation
- A Tour of Go
- Learn Go With Tests
- Go builtins
- Go Youtube Channel
- Go by Example
- Goland IDE
- SSHing to Orion
Week 2: January 29 – February 4
Topic:
- Unit 1: Scaling and Storage
- Scaling Out
- Lecture 3 Recording
- Research Papers and HDFS Presentation
- Lecture 4 Recording
Deadlines & Materials:
- Lab 1 Due ⋅ February 1
- Protocol Buffers Chat Example
- Paper 1: HDFS
- Research Papers
- Research Paper Library
- Research Paper Guidelines
- Keshav: How to Read a Paper
Week 3: February 5 – 11
Topic:
- Finishing HDFS Talk
- Network Design, Work on Lab 2
- Lecture 5 Recording
- Recording 2 (tiny)
Deadlines & Materials:
- Lab 2 Due ⋅ February 10
- Spurious Correlations
- Warehouse-scale Computing
- The Datacenter as a Computer
- Back-of-Envelope Calculations
- Elements of Scale
- The Friendship that made Google
- Underwater Datacenter
- Submarine Cable Map
- PingFS
- PiFS
- Scalable Server Design
  - The C10K Problem (1999)
  - High-Performance Server Design
- Raft Animation
- Jepsen.io
Week 4: February 12 – 18
Topic:
- Fault Tolerance and Consensus
- Lecture 6 Recording
- Dynamo Paper Presentation
- Project 1 Design Activity
Deadlines & Materials:
- Quiz 1 ⋅ February 16
- Lab 4 Due ⋅ February 14
- Paper 2: Dynamo ⋅ February 16
- Port Assignments
Week 5: February 19 – 25
Topic:
- Distributed Hash Tables
- Lecture 7 Recording
- Megastore Paper Presentation
- Recording 1
- Recording 2
Deadlines & Materials:
- Lab 3 Due ⋅ February 22
- Lab 5 Due ⋅ February 23
- Paper 3: Megastore ⋅ February 23
- Consistent Hashing
- Chord Paper
Week 6: February 26 – March 4
Topic:
- Data Models
- Lecture 8 Recording
- Thurs: Class Cancelled
Deadlines & Materials:
- Databases & Data Models
Week 7: March 5 – 11
Topic:
- Unit 2: Distributed & Parallel Computation
- Introduction to Distributed Computation
- Lecture 9 Recording
- Hadoop Setup Recording
Deadlines & Materials:
- Lab 6 Due ⋅ March 10
- Lab 7 Due ⋅ March 10
- Hadoop Tips
- Hadoop Setup Guide
- MapReduce Word Count
Week 8: March 12 – 18
Spring Break!
Week 9: March 19 – 25
Topic:
- Hadoop MapReduce
- Lecture 10 Recording
- MapReduce Paper Presentation
Deadlines & Materials:
- Quiz 2 ⋅ March 23
- Paper 4: MapReduce ⋅ March 23
- Reddit Politics Job
Week 10: March 26 – April 1
Topic:
- Designing our MR Job Recording
- Spark, RDDs (student presentation + discussion)
Deadlines & Materials:
- Paper 5: Spark ⋅ March 30
- Paper 6: RDDs ⋅ March 30
Week 11: April 2 – 8
Topic:
- Spark Setup
- Zookeeper Presentation
- Cluster Orchestration
- Lecture 11 Recording
Deadlines & Materials:
- Lab 8 Due ⋅ April 7
- Spark Setup Guide
- A Tale of Three Spark APIs (Video)
- Paper 7: Zookeeper ⋅ April 6
- Scaling Big Data Mining Infrastructure
- Borg
- Mesos
Week 12: April 9 – 15
Topic:
- Unit 3: Streaming Algorithms and Applications
- Big Data Sampling Techniques
- Lecture 12 Recording
- IPFS Presentation
- Bloom Filters
- Lecture 13 Recording
Deadlines & Materials:
- Lab 9 Due ⋅ April 10
- Project 2 Due ⋅ April 14
- Paper 8: IPFS ⋅ April 13
- Sampling
  - Reservoir Sampling Paper
  - Reservoir Sampling @ Wikipedia
  - SampleJob.java
  - Politics Job (includes Reservoir Sampling Example)
- Bloom Filters
  - Demo
  - Calculator
  - Uses at Akamai (paper)
  - Kirsch-Mitzenmacher Optimization
  - On the False Positive Rate (2020)
  - Cuckoo Filter
  - Xor Filter
Week 13: April 16 – 22
Topic:
- Data Sketches
- Lecture 14
- Lecture 15 Recording
Deadlines & Materials:
- Quiz 3 ⋅ April 20
- Data Sketches
- Count-Min Sketch
- Count-Min With Demo
- Apache DataSketches
- Running Variance/Std. Dev
- Count-Min Sketch
- Count-Min With Demo
- RunningStatisticsND.java
- HyperLogLog Demo
- HyperLogLog Video
Week 14: April 23 – 29
Topic:
- Spatiotemporal Data
- Lecture 16 Recording
- Working with Spark
- Lecture 17 Recording
Deadlines & Materials:
- Paper 9: Storm@Twitter ⋅ April 27
Week 15: April 30 – May 6
Topic:
- Spark Streaming
- Lecture 18 Recording
- SageDB, Machine Learning
- Lecture 19 Recording
Deadlines & Materials:
- Paper 10: Tensorflow ⋅ May 4
- Spark Resources
- Reddit Comment Analysis
- Tweet Analysis
- Spark Streaming
- Spark Streaming Programming Guide
- Spark DataFrames/Datasets
- Spark ML Example
Week 16: May 7 – 13
Topic:
- Wrapping up the Semester
Deadlines & Materials:
- Quiz 4 ⋅ May 11
Week 17: May 14 – 20
Final Quiz: Tuesday, May 16 ⋅ 10:00 am – 12:00 pm
