CS 686 Big Data

Projects

CS 686 focuses on building and leveraging distributed systems to analyze large datasets. The course consists of three large programming projects, and you will also submit written reports on assigned readings from the literature.

All three projects are meant to be done individually. Each project will include a detailed specification document with a description of the problem, breakdown of points, permitted libraries, etc. You are free to discuss the projects with your classmates, but sharing code or pseudocode is not acceptable.

All assignments are due at 6:00 PM on the due date. Late assignments are penalized 10% per day, for a maximum of 2 days.

Research Paper Evaluations

Research paper evaluations account for 40% of your grade. In them, you will think critically about current research from the field. See the paper evaluation template for details.

Project 1: Building a Distributed File System

In our first project, we’ll build a distributed file system inspired by HDFS, GFS, and other systems we’ve studied. See the project description for details. This project is worth 20% of your course grade.

Project 2: Spatiotemporal Analysis with MapReduce

Our second project leverages the MapReduce framework to analyze a large meteorological dataset. See the project description for details. This project is worth 15% of your course grade.

Project 3: In-Memory Analysis with Spark

Project 3 uses Spark to extend our analysis from Project 2. You will also get the chance to work on a dataset of your choosing as a group. See the project description for details. This project is worth 25% of your course grade.