Course Schedule

The following schedule is tentative and subject to change.

Topic | Deadlines & Materials
Week 1: August 23 – 29

Introduction to Big Data

Research Papers, Go

Week 2: August 30 – September 5

Scalable Server Design

Designing a DFS

Week 3: September 6 – 12

Network Design & Communication

Fault Tolerance and Consistency

Week 4: September 13 – 19

Fault Tolerance (continued)

Hadoop Distributed File System

Week 5: September 20 – 26

Bloom Filters

Week 6: September 27 – October 3

Distributed Hash Tables

Week 7: October 4 – 10

Data Models

Week 8: October 11 – 17

Distributed Computation

Week 9: October 18 – 24

Tues: No class, fall break!

Hadoop MapReduce

Week 10: October 25 – 31

Hadoop MapReduce Components

Week 11: November 1 – 7

Cluster Orchestration

Week 12: November 8 – 14

Big Data Sampling Techniques

Spark Setup

Week 13: November 15 – 21

Spark Overview

Data Sketches

Week 14: November 22 – 28

Spark Streaming

Thurs: No class, Thanksgiving break!

Week 15: November 29 – December 5

Cardinality Estimation

Week 16: December 6 – 12

Wrapping up the Semester

Week 17: December 13 – 19

Final Quiz: Thursday, December 15 ⋅ 10:00 am – 12:00 pm