CS 677 focuses on building and leveraging distributed systems to analyze large datasets. The course consists of large programming assignments, and you will also be required to submit written reports on assigned readings from the literature.

Each assignment will include a detailed specification document with a description of the problem, a breakdown of points, permitted libraries, etc. You are free to discuss the projects with your classmates, but sharing code or pseudocode is not acceptable. Please see the grading policy.

Submitting Assignments: Use the project links below to create a Git repository for your work. To submit, check your code into your Git repository before the deadline.

Late Policy:

Research Papers

Presentation Order, Fall 2021:

  1. HDFS – Matthew Malensek
  2. Dynamo – Hugo Laboisse, Patrick Porter, Zhenzhen Wang
  3. BigTable – Alma Abbasi, Iris Li, Nikhil Matta
  4. MapReduce – Bill Li, Shulin Li, Kate Luo, Sam Wang
  5. Big Data Normalization – Nikhil Bhutani, Aryan Choudhary, Anthony Knox
  6. Resilient Distributed Datasets – Nuo Cheng, Ziyang Liu, Yiqi Wei
  7. SparkSQL – Daily Guo, He Wei
  8. SageDB – Yuan Qian, Yudan Su, Terry Tran
  9. AlphaFold – Junting Cai, Milton Carreno, Stephen Yu, Dan Zhong