Course Schedule

The following is a tentative schedule for the course (subject to change).

Topic | Deadlines & Materials
Week 1: January 22 – 28

Introduction to Big Data

Big Data Case Studies and Datasets

A Quick Tour of Go

Week 2: January 29 – February 4

Unit 1: Scaling and Storage

Slightly More Advanced Go

Scaling Out

Research Papers and HDFS Presentation

Week 3: February 5 – 11

Network Design

Week 4: February 12 – 18

Fault Tolerance and Consensus

Week 5: February 19 – 25

Distributed Hash Tables

Week 6: February 26 – March 4

Data Models

Week 7: March 5 – 11

Unit 2: Distributed & Parallel Computation

Introduction to Distributed Computation

Hadoop Setup

Week 8: March 12 – 18

Spring Break!

Week 9: March 19 – 25

Hadoop MapReduce

Week 10: March 26 – April 1

Designing a MapReduce Job

Week 11: April 2 – 8

Spark Setup

Cluster Orchestration

Week 12: April 9 – 15

Unit 3: Streaming Algorithms and Applications

Big Data Sampling Techniques

Bloom Filters

Week 13: April 16 – 22

Data Sketches

Week 14: April 23 – 29

Spatiotemporal Data

Working with Spark

Week 15: April 30 – May 6

Spark Streaming

SageDB, Machine Learning

Week 16: May 7 – 13

Wrapping up the Semester

Week 17: May 14 – 20

Final Quiz: Monday, May 18 ⋅ 10:00 am – 10:30 am