Course Schedule

The following course schedule is tentative and subject to change.

Week 1: January 25 – 31

Introduction to Big Data

Big Data Case Studies and Datasets

A Quick Tour of Go

Week 2: February 1 – 7

Unit 1: Scaling and Storage

Slightly More Advanced Go

Scaling Out

Research Papers and HDFS Presentation

Week 3: February 8 – 14

HDFS, Warehouse-Scale Computing

Network Design

  • Lab 1 Due ⋅ February 9

Week 4: February 15 – 21

Fault Tolerance and Consensus

  • Lab 2 Due ⋅ February 16

  • Lab 3 Due ⋅ February 16

  • Quiz 1 ⋅ February 18

Week 5: February 22 – 28

Distributed Hash Tables

  • Lab 4 Due ⋅ February 25

  • Paper 1: Dynamo ⋅ February 25

Week 6: March 1 – 7

Data Models

Week 7: March 8 – 14

Unit 2: Distributed & Parallel Computation

Introduction to Distributed Computation

Hadoop Setup

  • Quiz 2 ⋅ March 11

Week 8: March 15 – 21

Spring Break!

Week 9: March 22 – 28

Hadoop MapReduce

Week 10: March 29 – April 4

Designing a MapReduce Job

Week 11: April 5 – 11

Spark Setup

Cluster Orchestration

Week 12: April 12 – 18

Unit 3: Streaming Algorithms and Applications

Big Data Sampling Techniques

Bloom Filters

  • Quiz 3 ⋅ April 15

Week 13: April 19 – 25

Data Sketches

Week 14: April 26 – May 2

Spatiotemporal Data

Working with Spark

Week 15: May 3 – 9

Spark Streaming

SageDB, Machine Learning

  • Quiz 4 ⋅ May 6

Week 16: May 10 – 16

Wrapping up the Semester

Week 17: May 17 – 23

Final Quiz: Monday, May 18 ⋅ 10:00 am – 10:30 am