Project 1: Distributed File System (v1.0)

Starter repository on GitHub: https://classroom.github.com/a/zNf5GIOn

In this project, you will build your own distributed file system (DFS) based on the technologies we've studied from Amazon, Google, and others. Your DFS will support multiple storage nodes responsible for managing data; the key features are described in the component sections below.

Your implementation must be done in Go (unless otherwise arranged with the professor), and we will test it using the orion cluster here in the CS department. Communication between components must be implemented via sockets (not RMI, RPC, or similar technologies; in particular, you are not allowed to use gRPC for this project). You may use third-party libraries as long as you get instructor approval first.
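To make the socket requirement concrete, here is a minimal sketch of plain TCP messaging in Go. The length-prefixed framing is just one common choice, not a required format:

```go
package comm

import (
	"encoding/binary"
	"io"
	"net"
)

// sendMessage writes a 4-byte big-endian length prefix followed by the
// payload, so the receiver knows exactly how many bytes to read next.
func sendMessage(conn net.Conn, payload []byte) error {
	prefix := make([]byte, 4)
	binary.BigEndian.PutUint32(prefix, uint32(len(payload)))
	if _, err := conn.Write(prefix); err != nil {
		return err
	}
	_, err := conn.Write(payload)
	return err
}

// recvMessage reads one length-prefixed message from the connection.
func recvMessage(conn net.Conn) ([]byte, error) {
	prefix := make([]byte, 4)
	if _, err := io.ReadFull(conn, prefix); err != nil {
		return nil, err
	}
	payload := make([]byte, binary.BigEndian.Uint32(prefix))
	_, err := io.ReadFull(conn, payload)
	return payload, err
}
```

Any unambiguous framing scheme is fine; the point is that you are building the wire protocol yourself rather than relying on an RPC framework.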

In Project 2, you will use this DFS to build your own implementation of MapReduce.

Since this is a graduate-level class, you have leeway on how you design and implement your system. However, you should be able to explain your design decisions. Additionally, you must include the following components in some form:

Controller

The Controller is responsible for managing resources in the system, somewhat like an HDFS NameNode. When a new storage node joins your DFS, the first thing it does is contact the Controller. The Controller will maintain information such as the set of active storage nodes, the free space available at each node, and which nodes hold each file's chunks and replicas.

When clients wish to store a new file, they will send a storage request to the Controller, and it will reply with a list of destination storage nodes (plus replica locations) to send the chunks to. The Controller itself should never see any of the actual files, only their metadata.
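As one possibility, the request and reply might carry metadata like the structs below. The field names are illustrative assumptions; you are free to design your own message format:

```go
package comm

// StorageRequest is sent by a client that wants to store a file.
// The Controller sees only this metadata, never the file contents.
type StorageRequest struct {
	FileName  string
	FileSize  int64
	ChunkSize int64
}

// ChunkPlacement tells the client where to send one chunk and where
// its replicas should live.
type ChunkPlacement struct {
	ChunkID      int
	PrimaryNode  string   // host:port of the destination storage node
	ReplicaNodes []string // host:port of the replica holders
}

// StorageResponse is the Controller's reply: one placement per chunk.
type StorageResponse struct {
	Placements []ChunkPlacement
}
```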

Replication: The Controller is also responsible for detecting storage node failures and ensuring that the system's replication level is maintained. In your DFS, every chunk will be replicated twice, for a total of 3 copies. This means that if a storage node goes down, you can re-route retrievals to a backup copy. You'll also maintain the replication level by creating more copies in the event of a failure. You will need to design an algorithm for determining replica placement.
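One simple placement strategy, sketched below under the assumption that heartbeats report free space, is to greedily pick the distinct nodes with the most room. This is only an example; designing the placement algorithm is part of the project:

```go
package controller

import "sort"

// NodeInfo tracks per-node state reported via heartbeats.
type NodeInfo struct {
	Addr      string
	FreeSpace int64
}

// placeChunk returns up to `copies` distinct nodes for one chunk,
// favoring nodes with the most free space. A more refined design
// might also balance request load across nodes.
func placeChunk(nodes []NodeInfo, copies int) []string {
	sorted := append([]NodeInfo(nil), nodes...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].FreeSpace > sorted[j].FreeSpace
	})
	if copies > len(sorted) {
		copies = len(sorted)
	}
	addrs := make([]string, copies)
	for i := 0; i < copies; i++ {
		addrs[i] = sorted[i].Addr
	}
	return addrs
}
```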

Storage Node

Storage nodes are responsible for storing and retrieving file chunks. When a chunk is stored, it will be checksummed so that on-disk corruption can be detected. When a corrupted chunk is retrieved, it should be repaired by requesting a replica before fulfilling the client request. Metadata, such as checksums, should also be stored on disk.
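For example, a chunk's checksum can be computed with SHA-256 from the standard library. The sidecar-file layout below is just one way to persist the metadata on disk:

```go
package storage

import (
	"crypto/sha256"
	"encoding/hex"
	"os"
)

// writeChunk stores chunk data plus a sidecar checksum file, e.g.
// "0003.chunk" and "0003.chunk.sha256".
func writeChunk(path string, data []byte) error {
	sum := sha256.Sum256(data)
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return err
	}
	return os.WriteFile(path+".sha256", []byte(hex.EncodeToString(sum[:])), 0o644)
}

// verifyChunk recomputes the checksum; a mismatch signals on-disk
// corruption, which should trigger a repair from a replica.
func verifyChunk(path string) (bool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return false, err
	}
	want, err := os.ReadFile(path + ".sha256")
	if err != nil {
		return false, err
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]) == string(want), nil
}
```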

The storage nodes will send a heartbeat to the Controller periodically to let it know that they are still alive. Every 5 seconds is a good interval for sending these. The heartbeat contains the free space available at the node, the total number of requests processed (storage, retrievals, etc.), and (optionally) any new files that have been stored at the node.
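A heartbeat loop can be as simple as a ticker goroutine. The Heartbeat fields below mirror the required contents; the JSON encoding and dial-per-heartbeat connection handling are assumptions made for brevity:

```go
package storage

import (
	"encoding/json"
	"net"
	"time"
)

// Heartbeat carries the state the Controller needs: free space,
// total requests processed, and (optionally) newly stored files.
type Heartbeat struct {
	NodeAddr     string
	FreeSpace    int64
	RequestCount int64
	NewFiles     []string
}

// sendHeartbeats dials the Controller every 5 seconds with the
// node's current state.
func sendHeartbeats(controllerAddr string, state func() Heartbeat) {
	for range time.Tick(5 * time.Second) {
		conn, err := net.Dial("tcp", controllerAddr)
		if err != nil {
			continue // Controller unreachable; retry on the next tick
		}
		_ = json.NewEncoder(conn).Encode(state())
		conn.Close()
	}
}
```

A missed heartbeat (or several in a row, depending on your failure detector) is what tells the Controller to start re-replicating that node's chunks.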

On startup, each storage node should be given a storage directory path and the hostname/IP of the Controller.
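For instance, startup might be handled with the standard flag package; the flag names here are hypothetical:

```go
package main

import (
	"flag"
	"log"
)

func main() {
	// Both values are required at startup; names are illustrative.
	dir := flag.String("dir", "", "storage directory path")
	controller := flag.String("controller", "", "Controller host:port")
	flag.Parse()
	if *dir == "" || *controller == "" {
		log.Fatal("usage: storagenode -dir <path> -controller <host:port>")
	}
	log.Printf("storing chunks in %s, reporting to %s", *dir, *controller)
	// ... start the chunk listener and heartbeat loop here ...
}
```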

Client

You will build a basic client that supports, at a minimum, storing and retrieving files.

NOTE: Your client must either accept command-line arguments or provide its own text-based command entry interface. Recompiling your client to execute different actions is not allowed and will incur a 5-point deduction.
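A subcommand-style interface satisfies this requirement without recompilation. The command names below ("put", "get") are hypothetical examples, not required verbs:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "usage: client <controller host:port> <put|get> <file>")
		os.Exit(1)
	}
	controller, cmd := os.Args[1], os.Args[2]
	switch cmd {
	case "put", "get":
		if len(os.Args) < 4 {
			fmt.Fprintln(os.Stderr, "missing file argument")
			os.Exit(1)
		}
		// e.g. ./client orion01:8080 put /path/to/file
		fmt.Println(cmd, os.Args[3], "via controller", controller)
	default:
		fmt.Fprintln(os.Stderr, "unknown command:", cmd)
		os.Exit(1)
	}
}
```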

Tips and Resources

Project Deliverables

This project is worth 16 points. The deliverables include:

Note: your system must be able to support at least 12 active storage nodes, i.e., the entire orion cluster.

Grading

We’ll schedule a demo and code review to grade your assignment. You will demonstrate the required functionality and walk through your design. Here is what you will be required to do during the demo.

I will deduct points if you violate any of the requirements listed in this document, for example, by using an unauthorized external library. I may also deduct points for poor design and/or formatting; please use good development practices, break your code into separate packages or modules based on functionality, and include comments in your source where appropriate.

Changelog