Course Schedule
The following is a tentative schedule for the course (subject to change).
| Topic | Deadlines & Materials |
|---|---|
| Week 1: January 25 – 31 | |
| Introduction to Big Data; Big Data Case Studies and Datasets; A Quick Tour of Go | |
| Week 2: February 1 – 7 | |
| Unit 1: Scaling and Storage; Slightly More Advanced Go; Scaling Out; Research Papers and HDFS Presentation | |
| Week 3: February 8 – 14 | |
| HDFS, Warehouse-Scale Computing; Network Design | |
| Week 4: February 15 – 21 | |
| Fault Tolerance and Consensus | |
| Week 5: February 22 – 28 | |
| Distributed Hash Tables | |
| Week 6: March 1 – 7 | |
| Data Models | |
| Week 7: March 8 – 14 | |
| Unit 2: Distributed & Parallel Computation; Introduction to Distributed Computation; Hadoop Setup | |
| Week 8: March 15 – 21 | |
| Spring Break! | |
| Week 9: March 22 – 28 | |
| Hadoop MapReduce | |
| Week 10: March 29 – April 4 | |
| Designing a MapReduce Job | |
| Week 11: April 5 – 11 | |
| Spark Setup; Cluster Orchestration | |
| Week 12: April 12 – 18 | |
| Unit 3: Streaming Algorithms and Applications; Big Data Sampling Techniques; Bloom Filters | |
| Week 13: April 19 – 25 | |
| Data Sketches | |
| Week 14: April 26 – May 2 | |
| Spatiotemporal Data; Working with Spark | |
| Week 15: May 3 – 9 | |
| Spark Streaming; SageDB, Machine Learning | |
| Week 16: May 10 – 16 | |
| Wrapping up the Semester | |
| Week 17: May 17 – 23 | |
| Final Quiz: Monday, May 18 ⋅ 10:00 am – 10:30 am | |