Course Schedule

The following schedule for the course is tentative and subject to change.

Topic | Deadlines & Materials
Week 1: January 22 – 28

Introduction to Big Data

A Whirlwind Tour of Go

  • Lab 0 Due ⋅ January 26

Week 2: January 29 – February 4

Unit 1: Scaling and Storage

Scaling Out

Research Papers and HDFS Presentation

  • Lab 1 Due ⋅ February 1

Week 3: February 5 – 11

Finishing HDFS Talk

Network Design, Work on Lab 2

  • Lab 2 Due ⋅ February 10

Week 4: February 12 – 18

Fault Tolerance and Consensus

Dynamo Paper Presentation

Project 1 Design Activity

  • Lab 4 Due ⋅ February 14

  • Quiz 1 ⋅ February 16

Week 5: February 19 – 25

Distributed Hash Tables

Megastore Paper Presentation

  • Lab 3 Due ⋅ February 22

  • Lab 5 Due ⋅ February 23

Week 6: February 26 – March 4

Data Models

Thursday: Class Cancelled

Week 7: March 5 – 11

Unit 2: Distributed & Parallel Computation

Introduction to Distributed Computation

Hadoop Setup

  • Lab 6 Due ⋅ March 10

  • Lab 7 Due ⋅ March 10

Week 8: March 12 – 18

Spring Break!

Week 9: March 19 – 25

Hadoop MapReduce

MapReduce Paper Presentation

  • Quiz 2 ⋅ March 23

Week 10: March 26 – April 1

Designing Our MapReduce Job

Spark, RDDs (student presentation + discussion)

  • Paper 5: Spark ⋅ March 30

  • Paper 6: RDDs ⋅ March 30

Week 11: April 2 – 8

Spark Setup

ZooKeeper Presentation

Cluster Orchestration

  • Lab 8 Due ⋅ April 7

Week 12: April 9 – 15

Unit 3: Streaming Algorithms and Applications

Big Data Sampling Techniques

IPFS Presentation

Bloom Filters

  • Lab 9 Due ⋅ April 10

  • Project 2 Due ⋅ April 14

Week 13: April 16 – 22

Data Sketches

  • Quiz 3 ⋅ April 20

Week 14: April 23 – 29

Spatiotemporal Data

Working with Spark

Week 15: April 30 – May 6

Spark Streaming

SageDB, Machine Learning

Week 16: May 7 – 13

Wrapping up the Semester

  • Quiz 4 ⋅ May 11

Week 17: May 14 – 20

Final Quiz: Tuesday, May 16 ⋅ 10:00 am – 12:00 pm