Course Schedule

The schedule below is tentative and may change.

Topic Deadlines & Materials
Week 1: Aug 23 - 27

Introduction to Big Data

Research Papers & Go

Week 2: Aug 30 - Sep 3

Scalable Server Design

Designing a DFS (and more Go)

Week 3: Sep 6 - 10

Network Design & Communication

Fault Tolerance and Consistency

Week 4: Sep 13 - 17

Fault Tolerance (continued)

Hadoop Distributed File System

  • Quiz 1 ⋅ Sep 14
  • Lab 2 Due ⋅ Sep 14
  • Paper 1: HDFS ⋅ Sep 16

Week 5: Sep 20 - 24

Hadoop Setup

Bloom Filters

Week 6: Sep 27 - Oct 1

Dynamo Presentation

Building a Bloom Filter

  • (See end of Dynamo video)

Distributed Hash Tables

Week 7: Oct 4 - 8

BigTable Presentation

Finishing DHTs

  • (See end of BigTable video)

Data Models

Week 8: Oct 11 - 15

Distributed Computation

Project Tips, Q&A

  • Project 1 Due ⋅ Oct 15

Week 9: Oct 18 - 22

Tues: No class, fall break!

MapReduce

Week 10: Oct 25 - 29

RDDs

Data Reduction: Sampling Streams

Spark

Week 11: Nov 1 - 5

Cluster Orchestration, Stream Processing

Spark Streaming

Week 12: Nov 8 - 12

Spatiotemporal Data

Spark Datasets

Week 13: Nov 15 - 19

Data Sketches

Cardinality Estimation

  • Quiz 4 ⋅ Nov 16

Week 14: Nov 22 - 26

Machine Learning

Thurs: No class, Thanksgiving!

Week 15: Nov 29 - Dec 3
Week 16: Dec 6 - 10