Project 1: Distributed File System (v1.0)

Starter repository on GitHub: https://classroom.github.com/g/pCxOMgR6

In this project, you will build your own distributed file system (DFS) based on the technologies we’ve studied from Amazon, Google, and others. Your DFS will support multiple storage nodes responsible for managing data. Key features include chunked file storage, replication (three copies of every chunk), failure detection and recovery, and corruption detection and repair.

Your implementation must be done in Go (unless otherwise arranged with the professor), and we will test it using the orion cluster here in the CS department. Communication between components must be implemented via sockets (not RMI, RPC, or similar technologies; in particular, you are not allowed to use gRPC for this project), and you may not use any external libraries beyond those explicitly stated in the project spec without instructor approval.

Since this is a graduate-level class, you have leeway on how you design and implement your system. However, you should be able to explain your design decisions. Additionally, you must include the following components:

Controller

The Controller is responsible for managing resources in the system, somewhat like an HDFS NameNode. When a new storage node joins your DFS, the first thing it does is contact the Controller. At a minimum, the Controller maintains the data structures described below (a name index, a file index, and the file system tree).

When clients wish to store a new file, they will send a storage request to the Controller, which will reply with a list of destination storage nodes (plus replica locations) to send the chunks to. The Controller itself should never see any of the actual files, only their metadata.
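Since gRPC and similar frameworks are off the table, you will need to define your own wire format. Below is a minimal sketch of what the storage-request exchange might look like using JSON-encoded structs over TCP; all type and field names here are illustrative assumptions, not requirements.

```go
// Sketch of one possible wire format for the storage-request exchange,
// assuming JSON-encoded messages over TCP. Names are illustrative.
package messages

// StorageRequest is sent by a client that wants to store a file.
type StorageRequest struct {
	FileName string `json:"file_name"`
	FileSize int64  `json:"file_size"` // bytes; lets the Controller plan chunk placement
}

// ChunkPlacement tells the client where one chunk (and its replicas) should go.
type ChunkPlacement struct {
	ChunkID  int      `json:"chunk_id"`
	Primary  string   `json:"primary"`  // host:port of the primary storage node
	Replicas []string `json:"replicas"` // host:port of the replica nodes
}

// StorageResponse is the Controller's reply; note it carries only metadata.
type StorageResponse struct {
	Placements []ChunkPlacement `json:"placements"`
}
```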

Name Index: For file name lookup, you will implement a Bloom filter over all the file names stored in the system. When the Controller receives a retrieval request from a client, it will query the Bloom filter to determine whether the on-disk index is actually worth reading.
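A Bloom filter can be built from the standard library alone. Here is a minimal sketch using double hashing derived from FNV-1a; the sizing parameters (m bits, k hash functions) are placeholders you would tune for your expected file count.

```go
// Minimal Bloom filter sketch for the name index. The hashing scheme
// (double hashing from two FNV-1a digests) is one common choice, not a
// spec requirement.
package nameindex

import "hash/fnv"

type BloomFilter struct {
	bits []uint64 // bit set, 64 bits per word
	m    uint64   // number of bits
	k    uint64   // number of hash functions
}

func NewBloomFilter(m, k uint64) *BloomFilter {
	return &BloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes derives two base hashes from the name; the second uses a cheap salt.
func hashes(name string) (uint64, uint64) {
	h1 := fnv.New64a()
	h1.Write([]byte(name))
	h2 := fnv.New64a()
	h2.Write([]byte(name))
	h2.Write([]byte{0xff})
	return h1.Sum64(), h2.Sum64()
}

func (b *BloomFilter) Add(name string) {
	a, c := hashes(name)
	for i := uint64(0); i < b.k; i++ {
		idx := (a + i*c) % b.m
		b.bits[idx/64] |= 1 << (idx % 64)
	}
}

// MightContain returns false only if the name was definitely never added.
func (b *BloomFilter) MightContain(name string) bool {
	a, c := hashes(name)
	for i := uint64(0); i < b.k; i++ {
		idx := (a + i*c) % b.m
		if b.bits[idx/64]&(1<<(idx%64)) == 0 {
			return false
		}
	}
	return true
}
```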

File Index: You have some flexibility in creating your on-disk index; simply storing a huge text file with [file name, location] pairs is probably inefficient, so you should find a way of breaking the index up into smaller parts that are faster to read from the disk.
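One possible approach (an assumption, not a requirement) is to shard the index across many small files keyed by a hash of the file name, so a lookup only reads one small shard:

```go
// Sketch of a sharded on-disk index layout: entries are spread across 256
// shard files by a hash of the file name. Shard count and naming are
// illustrative.
package fileindex

import (
	"fmt"
	"hash/fnv"
	"path/filepath"
)

// shardPath maps a file name to the shard file that would hold its entry.
func shardPath(indexDir, fileName string) string {
	h := fnv.New32a()
	h.Write([]byte(fileName))
	shard := h.Sum32() % 256
	return filepath.Join(indexDir, fmt.Sprintf("shard-%03d.idx", shard))
}
```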

File System Tree: Just like any other modern file system, users should be able to store files inside of folders (or even nested folders).
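A sketch of one way to represent the tree in memory, assuming directories map child names to nodes and files carry chunk metadata; all names are illustrative:

```go
// Sketch of an in-memory directory tree for the Controller.
package fstree

type ChunkInfo struct {
	ID    int
	Nodes []string // host:port of storage nodes holding this chunk
}

type Node struct {
	Name     string
	IsDir    bool
	Children map[string]*Node // non-nil only for directories
	Chunks   []ChunkInfo      // non-empty only for files
}

// Lookup walks a path like ["docs", "reports", "q3.txt"] from this node.
func (n *Node) Lookup(path []string) (*Node, bool) {
	cur := n
	for _, part := range path {
		child, ok := cur.Children[part]
		if !ok {
			return nil, false
		}
		cur = child
	}
	return cur, true
}
```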

Replication: The Controller is also responsible for detecting storage node failures and ensuring the system replication level is maintained. In your DFS, every chunk will be replicated twice, for a total of 3 copies of each chunk. This means that if a storage node goes down, you can re-route retrievals to a backup copy. You’ll also maintain the replication level by creating more copies in the event of a failure. You will need to design an algorithm for determining replica placement.
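As a starting point for placement, here is a simple heuristic sketch that picks the three live nodes with the most free space (as reported by heartbeats); a real design might also weigh request load or host diversity.

```go
// Sketch of a greedy replica-placement heuristic: most free space first.
package placement

import "sort"

type NodeInfo struct {
	Addr      string
	FreeSpace int64 // bytes, from the latest heartbeat
}

// PickReplicas returns up to 3 distinct nodes for one chunk.
func PickReplicas(nodes []NodeInfo) []string {
	sorted := make([]NodeInfo, len(nodes))
	copy(sorted, nodes)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].FreeSpace > sorted[j].FreeSpace
	})
	out := make([]string, 0, 3)
	for _, n := range sorted {
		if len(out) == 3 {
			break
		}
		out = append(out, n.Addr)
	}
	return out
}
```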

Storage Node

Storage nodes are responsible for storing and retrieving file chunks. When a chunk is stored, it will be checksummed so on-disk corruption can be detected. When a corrupted chunk is detected during retrieval, it should be repaired by requesting a replica before fulfilling the client request. Metadata, such as checksums, should be stored alongside the files on disk.
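A sketch of checksumming with SHA-256 from the standard library, assuming the digest is stored in a sidecar file next to each chunk (the layout is an assumption, not a requirement):

```go
// Sketch of chunk checksumming: write the SHA-256 digest alongside the
// chunk, then verify it on retrieval.
package storage

import (
	"bytes"
	"crypto/sha256"
	"os"
)

// WriteChunk writes the chunk and its checksum next to it.
func WriteChunk(path string, data []byte) error {
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return err
	}
	sum := sha256.Sum256(data)
	return os.WriteFile(path+".sha256", sum[:], 0o644)
}

// Verify reports whether the on-disk chunk still matches its stored checksum.
func Verify(path string) (bool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return false, err
	}
	want, err := os.ReadFile(path + ".sha256")
	if err != nil {
		return false, err
	}
	sum := sha256.Sum256(data)
	return bytes.Equal(sum[:], want), nil
}
```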

The storage nodes will send a heartbeat to the Controller periodically to let it know that they are still alive; every 5 seconds is a good interval. The heartbeat contains the free space available at the node, the total number of requests processed (storage, retrievals, etc.), and (optionally) any new files that have been stored at the node.
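A heartbeat loop might look like the following sketch: a ticker fires every 5 seconds, and each beat is JSON-encoded over a fresh TCP connection. The message fields mirror the description above; connection handling is deliberately simplified, and the field names are illustrative.

```go
// Sketch of a periodic heartbeat sender over a plain TCP socket.
package storage

import (
	"encoding/json"
	"log"
	"net"
	"time"
)

type Heartbeat struct {
	NodeAddr      string   `json:"node_addr"`
	FreeSpace     int64    `json:"free_space"`     // bytes available at this node
	RequestsTotal uint64   `json:"requests_total"` // storage + retrieval requests handled
	NewFiles      []string `json:"new_files,omitempty"`
}

// SendHeartbeats dials the Controller every 5 seconds and sends the
// current node state, obtained from the supplied callback.
func SendHeartbeats(controllerAddr string, state func() Heartbeat) {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		conn, err := net.Dial("tcp", controllerAddr)
		if err != nil {
			log.Printf("heartbeat: %v", err) // Controller may be briefly unreachable
			continue
		}
		if err := json.NewEncoder(conn).Encode(state()); err != nil {
			log.Printf("heartbeat encode: %v", err)
		}
		conn.Close()
	}
}
```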

On startup, each storage node should be given a storage directory path and the hostname/IP of the Controller.
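For example, using the standard flag package (the flag names are illustrative):

```go
// Sketch of storage node startup arguments.
package main

import (
	"flag"
	"log"
)

func main() {
	storageDir := flag.String("dir", "", "directory for storing chunks")
	controller := flag.String("controller", "", "Controller host:port")
	flag.Parse()
	if *storageDir == "" || *controller == "" {
		log.Fatal("usage: storagenode -dir <path> -controller <host:port>")
	}
	// ... connect to the Controller and begin serving requests ...
}
```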

Client

You will build a basic client that supports storing and retrieving files.

The client will also be able to print out a list of active nodes (retrieved from the Controller), the total disk space available in the cluster (in GB), and the number of requests handled by each node.

NOTE: Your client must either accept command-line arguments or provide its own text-based command entry interface. Recompiling your client to execute different actions is not allowed and will incur a 5-point deduction.
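One way to satisfy this requirement is subcommand-style dispatch on command-line arguments, sketched below; the command names are illustrative.

```go
// Sketch of a subcommand-style client entry point.
package main

import (
	"fmt"
	"os"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: client <put|get|ls|nodes> [args...]")
		os.Exit(1)
	}
	switch os.Args[1] {
	case "put":
		// chunk the file, ask the Controller for placements, send the chunks
	case "get":
		// look up chunk locations, fetch chunks (repairing from replicas if needed)
	case "ls":
		// list files in a directory of the file system tree
	case "nodes":
		// print active nodes, free space, and per-node request counts
	default:
		fmt.Fprintf(os.Stderr, "unknown command %q\n", os.Args[1])
		os.Exit(1)
	}
}
```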

Tips and Resources

Project Deliverables

This project will be worth 18 points. The deliverables include:

Note: your system must be able to support at least 12 active storage nodes, i.e., the entire orion cluster.

Grading

We’ll schedule a demo and code review to grade your assignment. You will demonstrate the required functionality and walk through your design.

I will deduct points if you violate any of the requirements listed in this document (for example, using an unauthorized external library). I may also deduct points for poor design and/or formatting; please use good development practices, break your code into separate packages or modules based on functionality, and include comments in your source where appropriate.

Changelog