Project 1: Distributed File System (v 1.0)

Starter repository on GitHub: https://classroom.github.com/a/zNf5GIOn

In this project, you will build your own distributed file system (DFS) based on the technologies we’ve studied from Amazon, Google, and others. Your DFS will support multiple storage nodes responsible for managing data. Key features include:

Parallel storage/retrieval: large files will be split into multiple chunks to spread load across the cluster and provide parallelism during retrievals.
Interoperability: the DFS will use Google Protocol Buffers to serialize messages. This allows other applications to easily implement your wire format.
Fault tolerance: your system must be able to detect and withstand two concurrent storage node failures and continue operating normally. Additionally, it should be able to lose storage nodes all the way down to its minimum replication level (3), provided that there is enough time between failures to re-replicate data. It will also be able to recover corrupted files.
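To make the chunking idea concrete, here is a minimal sketch of how a file could be divided into fixed-size chunks before any data is read. The names `ChunkSpan` and `planChunks` are hypothetical and not part of the starter repository; the sketch only computes offsets and lengths, which a client could then use to read each chunk from disk.

```go
package main

import "fmt"

// ChunkSpan is a hypothetical descriptor for one chunk of a file.
type ChunkSpan struct {
	Index  int   // position of the chunk within the file
	Offset int64 // byte offset where the chunk starts
	Length int64 // number of bytes in the chunk
}

// planChunks divides a file of fileSize bytes into spans of at most chunkSize
// bytes. Only the last span may be shorter than chunkSize. An empty file
// yields no spans.
func planChunks(fileSize, chunkSize int64) ([]ChunkSpan, error) {
	if fileSize < 0 || chunkSize <= 0 {
		return nil, fmt.Errorf("invalid sizes: file=%d chunk=%d", fileSize, chunkSize)
	}
	var spans []ChunkSpan
	for off := int64(0); off < fileSize; off += chunkSize {
		length := chunkSize
		if remaining := fileSize - off; remaining < length {
			length = remaining
		}
		spans = append(spans, ChunkSpan{Index: len(spans), Offset: off, Length: length})
	}
	return spans, nil
}

func main() {
	// A 10-byte file with 4-byte chunks yields spans of 4, 4, and 2 bytes.
	spans, err := planChunks(10, 4)
	if err != nil {
		panic(err)
	}
	for _, s := range spans {
		fmt.Printf("chunk %d: offset=%d length=%d\n", s.Index, s.Offset, s.Length)
	}
}
```

Planning spans up front means each chunk can be read independently, so the whole file never has to sit in memory at once.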

Your implementation must be done in Go (unless otherwise arranged with the professor), and we will test it using the orion cluster here in the CS department. Communication between components must be implemented via sockets, not RMI, RPC, or similar technologies. In particular, you are not allowed to use gRPC for this project. You may use third-party libraries as long as you get instructor approval first.
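Because a TCP socket is a byte stream with no message boundaries, serialized messages need some kind of framing. One common approach, sketched here as an assumption rather than a required wire format, is to prefix each message with its length. The payloads below are plain byte slices; in your system they would presumably be the bytes produced by serializing a Protocol Buffers message. The 4-byte big-endian prefix, the `maxFrameSize` limit, and all function names are illustrative choices.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// maxFrameSize is an assumed upper bound that guards against bogus prefixes.
const maxFrameSize = 64 << 20 // 64 MiB

// writeFrame sends a 4-byte big-endian length followed by the payload.
func writeFrame(w io.Writer, payload []byte) error {
	if len(payload) > maxFrameSize {
		return fmt.Errorf("payload of %d bytes exceeds frame limit", len(payload))
	}
	var header [4]byte
	binary.BigEndian.PutUint32(header[:], uint32(len(payload)))
	if _, err := w.Write(header[:]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readFrame reads one length-prefixed payload, blocking until it is complete.
func readFrame(r io.Reader) ([]byte, error) {
	var header [4]byte
	if _, err := io.ReadFull(r, header[:]); err != nil {
		return nil, err
	}
	size := binary.BigEndian.Uint32(header[:])
	if size > maxFrameSize {
		return nil, fmt.Errorf("frame of %d bytes exceeds limit", size)
	}
	payload := make([]byte, size)
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, err
	}
	return payload, nil
}

// roundTrip frames each payload into an in-memory buffer and reads them back.
// A real component would pass a net.Conn instead of the buffer.
func roundTrip(payloads ...[]byte) ([][]byte, error) {
	var buf bytes.Buffer
	for _, p := range payloads {
		if err := writeFrame(&buf, p); err != nil {
			return nil, err
		}
	}
	out := make([][]byte, 0, len(payloads))
	for range payloads {
		p, err := readFrame(&buf)
		if err != nil {
			return nil, err
		}
		out = append(out, p)
	}
	return out, nil
}

func main() {
	msgs, err := roundTrip([]byte("store-request"), []byte("heartbeat"))
	if err != nil {
		panic(err)
	}
	for _, m := range msgs {
		fmt.Printf("received %d bytes: %s\n", len(m), m)
	}
}
```

Since `writeFrame` and `readFrame` only depend on `io.Writer` and `io.Reader`, the same functions work unchanged on a `net.Conn`.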

In Project 2, you will use this DFS to build your own implementation of MapReduce.

Since this is a graduate-level class, you have leeway on how you design and implement your system. However, you should be able to explain your design decisions. Additionally, you must include the following components in some form:

Controller
Storage Node
Client

Controller

The Controller is responsible for managing resources in the system, somewhat like an HDFS NameNode. When a new storage node joins your DFS, the first thing it does is contact the Controller. The Controller will maintain information such as:

A list of active storage nodes
Mappings between file names, chunks, and the storage nodes that contain them

When clients wish to store a new file, they will send a storage request to the controller, and it will reply with a list of destination storage nodes (plus replica locations) to send the chunks to. The Controller itself should never see any of the actual files, only their metadata.
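As an illustration of the bookkeeping described above, the sketch below keeps the file-to-chunk-to-node mappings in memory behind a lock. `Metadata`, `ChunkLocation`, and the method names are hypothetical; your design may organize this differently (for example, by persisting the index or keying chunks by ID).

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// ChunkLocation records which storage nodes hold one chunk of a file.
type ChunkLocation struct {
	Index int      // position of the chunk within the file
	Nodes []string // addresses of the nodes holding a copy
}

// Metadata is a hypothetical in-memory index of file names to placements.
type Metadata struct {
	mu    sync.RWMutex
	files map[string][]ChunkLocation
}

// ErrFileExists is returned when a file name is stored twice.
var ErrFileExists = errors.New("file already exists")

func NewMetadata() *Metadata {
	return &Metadata{files: make(map[string][]ChunkLocation)}
}

// AddFile records placements for a new file and rejects duplicate names.
func (m *Metadata) AddFile(name string, chunks []ChunkLocation) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.files[name]; ok {
		return ErrFileExists
	}
	m.files[name] = chunks
	return nil
}

// Lookup returns the chunk placements for name, if the file is known.
func (m *Metadata) Lookup(name string) ([]ChunkLocation, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	chunks, ok := m.files[name]
	return chunks, ok
}

// Remove deletes a file's metadata and reports whether it existed.
func (m *Metadata) Remove(name string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	_, ok := m.files[name]
	delete(m.files, name)
	return ok
}

// List returns all known file names in sorted order.
func (m *Metadata) List() []string {
	m.mu.RLock()
	defer m.mu.RUnlock()
	names := make([]string, 0, len(m.files))
	for name := range m.files {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	md := NewMetadata()
	chunks := []ChunkLocation{{Index: 0, Nodes: []string{"orion01", "orion02", "orion03"}}}
	fmt.Println("first store:", md.AddFile("data.bin", chunks))
	fmt.Println("second store:", md.AddFile("data.bin", chunks))
	fmt.Println("files:", md.List())
}
```

Note that only names and locations are stored here; no file contents ever pass through this structure.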

Replication: The Controller is also responsible for detecting storage node failures and ensuring the system replication level is maintained. In your DFS, every chunk will be replicated twice, for a total of 3 copies of each chunk. This means that if a storage node goes down, you can re-route retrievals to a backup copy. You’ll also maintain the replication level by creating more copies in the event of a failure. You will need to design an algorithm for determining replica placement.
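Replica placement is a design decision left to you, so the following is only one possible policy, shown to illustrate the shape of the problem: pick the three distinct nodes with the most free space that can still fit the chunk. `NodeInfo` and `pickReplicas` are hypothetical names, and a real policy might also weigh request load or avoid recently failed nodes.

```go
package main

import (
	"fmt"
	"sort"
)

// replicationLevel is the number of copies kept of every chunk.
const replicationLevel = 3

// NodeInfo is a hypothetical summary of one active storage node.
type NodeInfo struct {
	Addr      string
	FreeBytes uint64
}

// pickReplicas returns replicationLevel distinct node addresses, preferring
// nodes with the most free space. Ties are broken by address so the result is
// deterministic. Nodes that cannot fit the chunk are skipped.
func pickReplicas(nodes []NodeInfo, chunkSize uint64) ([]string, error) {
	candidates := make([]NodeInfo, 0, len(nodes))
	for _, n := range nodes {
		if n.FreeBytes >= chunkSize {
			candidates = append(candidates, n)
		}
	}
	if len(candidates) < replicationLevel {
		return nil, fmt.Errorf("need %d nodes with free space, have %d",
			replicationLevel, len(candidates))
	}
	sort.Slice(candidates, func(i, j int) bool {
		if candidates[i].FreeBytes != candidates[j].FreeBytes {
			return candidates[i].FreeBytes > candidates[j].FreeBytes
		}
		return candidates[i].Addr < candidates[j].Addr
	})
	picked := make([]string, replicationLevel)
	for i := range picked {
		picked[i] = candidates[i].Addr
	}
	return picked, nil
}

func main() {
	nodes := []NodeInfo{
		{Addr: "orion01", FreeBytes: 100 << 30},
		{Addr: "orion02", FreeBytes: 400 << 30},
		{Addr: "orion03", FreeBytes: 250 << 30},
		{Addr: "orion04", FreeBytes: 300 << 30},
	}
	picked, err := pickReplicas(nodes, 64<<20)
	if err != nil {
		panic(err)
	}
	fmt.Println("store chunk on:", picked)
}
```

The same function can be reused during re-replication by passing only the nodes that do not already hold the chunk.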

Storage Node

Storage nodes are responsible for storing and retrieving file chunks. When a chunk is stored, it will be checksummed so on-disk corruption can be detected. When a corrupted file is retrieved, it should be repaired by requesting a replica before fulfilling the client request. Metadata, such as checksums, should also be stored on the disk.
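The checksum-and-repair flow can be sketched as follows. SHA-256 is an assumed choice of checksum (the assignment does not mandate one), and `fetchReplica` is a hypothetical callback standing in for a socket request to another storage node that holds the same chunk.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// ErrCorrupt signals that stored bytes no longer match their checksum.
var ErrCorrupt = errors.New("chunk checksum mismatch")

// checksum returns the hex-encoded SHA-256 digest of a chunk.
func checksum(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// verifyChunk compares a chunk against the checksum recorded at store time.
func verifyChunk(data []byte, want string) error {
	if checksum(data) != want {
		return ErrCorrupt
	}
	return nil
}

// readChunk returns the local copy when it is intact. Otherwise it asks
// fetchReplica for another copy and verifies that copy before returning it.
func readChunk(local []byte, want string, fetchReplica func() ([]byte, error)) ([]byte, error) {
	if verifyChunk(local, want) == nil {
		return local, nil
	}
	repaired, err := fetchReplica()
	if err != nil {
		return nil, fmt.Errorf("repair failed: %w", err)
	}
	if err := verifyChunk(repaired, want); err != nil {
		return nil, err
	}
	return repaired, nil
}

func main() {
	original := []byte("chunk contents")
	sum := checksum(original) // stored on disk next to the chunk

	damaged := []byte("chunk c0ntents") // simulated on-disk corruption
	data, err := readChunk(damaged, sum, func() ([]byte, error) {
		return original, nil // pretend another node returned a good copy
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("served %q after repair\n", data)
}
```

In a full implementation the repaired bytes would also be written back to disk so the next read does not need another repair.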

The storage nodes will send a heartbeat to the controller periodically to let it know that they are still alive. Every 5 seconds is a good interval for sending these. The heartbeat contains the free space available at the node, the total number of requests processed (storage, retrievals, etc.), and (optionally) any new files that have been stored at the node.
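On the Controller side, heartbeats can drive failure detection by recording when each node was last heard from. The sketch below assumes a node is considered failed after three missed heartbeats (15 seconds); that threshold, the `Monitor` type, and the `Heartbeat` fields are illustrative assumptions rather than requirements. Timestamps are passed in explicitly so the logic is easy to test.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

const (
	heartbeatInterval = 5 * time.Second
	// failureTimeout is an assumed threshold: three missed heartbeats.
	failureTimeout = 3 * heartbeatInterval
)

// Heartbeat is a hypothetical in-memory form of the heartbeat message.
type Heartbeat struct {
	Addr      string
	FreeBytes uint64 // free space available at the node
	Requests  uint64 // total requests processed so far
}

// Monitor tracks the most recent heartbeat from every storage node.
type Monitor struct {
	mu       sync.Mutex
	lastSeen map[string]time.Time
	latest   map[string]Heartbeat // latest stats, e.g. for the node list
}

func NewMonitor() *Monitor {
	return &Monitor{
		lastSeen: make(map[string]time.Time),
		latest:   make(map[string]Heartbeat),
	}
}

// Record stores a heartbeat along with the time it was received.
func (m *Monitor) Record(hb Heartbeat, now time.Time) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.lastSeen[hb.Addr] = now
	m.latest[hb.Addr] = hb
}

// Expire removes every node that has been silent for longer than
// failureTimeout and returns their addresses in sorted order.
func (m *Monitor) Expire(now time.Time) []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	var failed []string
	for addr, seen := range m.lastSeen {
		if now.Sub(seen) > failureTimeout {
			failed = append(failed, addr)
			delete(m.lastSeen, addr)
			delete(m.latest, addr)
		}
	}
	sort.Strings(failed)
	return failed
}

// at is a small helper that builds timestamps for the demo below.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	m := NewMonitor()
	m.Record(Heartbeat{Addr: "orion01", FreeBytes: 500 << 30}, at(0))
	m.Record(Heartbeat{Addr: "orion02", FreeBytes: 400 << 30}, at(12))
	// At t=16s, orion01 has been silent for 16s and orion02 for 4s.
	fmt.Println("failed nodes:", m.Expire(at(16)))
}
```

A Controller could call `Expire` from a goroutine driven by a `time.Ticker` and start re-replication for each address it returns.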

On startup: provide a storage directory path and the hostname/IP of the controller.

Client

You will build a basic client that allows storage and retrievals. Its functions include:

Breaking files into chunks, asking the controller where to store them, and then sending them to the appropriate storage node(s).
- Note: Once the first chunk has been transferred to its destination storage node, that node will pass replicas along in a pipeline fashion. The client should not send each chunk 3 times.
- If a file already exists, reject the request. The user can remove the file first if they wish.
- While you can have a default chunk size, the user should be able to specify the chunk size during storage.
Retrieving files in parallel. Each chunk in the file being retrieved will be requested and transferred on a separate thread. Once the chunks are retrieved, the file is reconstructed on the client machine.
- You can retrieve replicas as well as original chunks to increase parallelism here.
Deleting existing files.
Listing files stored in the system. Basically, an ls command.
The client will also be able to print out a list of active nodes (retrieved from the controller), the total disk space available in the cluster (in GB), and the number of requests handled by each node.
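Parallel retrieval, as described in the list above, maps naturally onto goroutines. In this sketch each chunk is fetched on its own goroutine and the results are reassembled in order. `fetchFunc` is a hypothetical stand-in for the code that contacts a storage node over a socket, and the whole file is held in memory for simplicity; a client handling large files would more likely write each chunk to its offset in the output file.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchFunc retrieves one chunk by index. It is a hypothetical stand-in for a
// socket request to whichever storage node holds that chunk.
type fetchFunc func(index int) ([]byte, error)

// retrieveFile fetches all chunks concurrently and concatenates them in index
// order. Each goroutine writes only to its own slot, so no lock is needed.
func retrieveFile(numChunks int, fetch fetchFunc) ([]byte, error) {
	chunks := make([][]byte, numChunks)
	errs := make([]error, numChunks)

	var wg sync.WaitGroup
	for i := 0; i < numChunks; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			chunks[i], errs[i] = fetch(i)
		}(i)
	}
	wg.Wait()

	var file []byte
	for i, err := range errs {
		if err != nil {
			return nil, fmt.Errorf("chunk %d: %w", i, err)
		}
		file = append(file, chunks[i]...)
	}
	return file, nil
}

func main() {
	// Simulated cluster contents: three chunks of one small file.
	stored := []string{"dist", "ribu", "ted"}
	file, err := retrieveFile(len(stored), func(i int) ([]byte, error) {
		return []byte(stored[i]), nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("reconstructed: %s\n", file)
}
```

Chunks may arrive in any order, which is why the results are indexed by chunk number rather than appended as they complete.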

NOTE: Your client must either accept command line arguments or provide its own text-based command entry interface. Recompiling your client to execute different actions is not allowed and will incur a 5 point deduction.
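One way to satisfy the command line requirement is a small subcommand parser built on the standard `flag` package. The command names (`put`, `get`, `rm`, `ls`, `nodes`), the `-chunk-size` flag, and the 64 MiB default below are all hypothetical choices made for illustration.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// defaultChunkSize is an assumed default of 64 MiB.
const defaultChunkSize = 64 << 20

// Command is the parsed form of one client invocation.
type Command struct {
	Action    string // put, get, rm, ls, or nodes
	Path      string // file name for put, get, and rm
	ChunkSize int64  // only meaningful for put
}

// parseCommand interprets arguments such as
// ["put", "-chunk-size", "1048576", "data.bin"].
func parseCommand(args []string) (Command, error) {
	if len(args) == 0 {
		return Command{}, errors.New("usage: client <put|get|rm|ls|nodes> [flags] [file]")
	}
	cmd := Command{Action: args[0], ChunkSize: defaultChunkSize}
	fs := flag.NewFlagSet(cmd.Action, flag.ContinueOnError)
	fs.SetOutput(io.Discard) // report errors through the return value instead

	switch cmd.Action {
	case "put":
		fs.Int64Var(&cmd.ChunkSize, "chunk-size", defaultChunkSize, "chunk size in bytes")
	case "get", "rm":
		// These take a file name but no flags.
	case "ls", "nodes":
		return cmd, nil
	default:
		return Command{}, fmt.Errorf("unknown command %q", cmd.Action)
	}

	if err := fs.Parse(args[1:]); err != nil {
		return Command{}, err
	}
	if fs.NArg() != 1 {
		return Command{}, fmt.Errorf("%s requires exactly one file name", cmd.Action)
	}
	if cmd.ChunkSize <= 0 {
		return Command{}, errors.New("chunk size must be positive")
	}
	cmd.Path = fs.Arg(0)
	return cmd, nil
}

func main() {
	// A real client would pass os.Args[1:] here.
	cmd, err := parseCommand([]string{"put", "-chunk-size", "1048576", "data.bin"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cmd)
}
```

Keeping parsing separate from execution makes it straightforward to reuse `parseCommand` for an interactive prompt as well, by splitting each input line into arguments.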

Tips and Resources

Log events in your system! For example, if a storage node goes down, the controller should probably print a message saying so. This will be extremely helpful when debugging your system.
Use the orion cluster (orion01 – orion12) to test your code in a distributed setting.
- These nodes have the Protocol Buffers library installed as well as the protoc compiler.
- To store your data, use /bigdata/students/$(whoami), where $(whoami) expands to your user name. DO NOT use your regular home directory, as it will fill up and your account will get locked (and you can potentially lose data). To reiterate: you MUST store files under the /bigdata/students directory on the orion machines. Not doing so carries a 5 point deduction.

Project Deliverables

This project is worth 16 points. The deliverables include:

[4 pts]: Controller
- [1] Storing metadata
- [1] Node failure detection
- [1] Coordinating storage and retrievals
- [1] Coordinating replica maintenance
[4 pts]: Storage node implementation:
- [1] Storing chunks and checksums on local disks
- [1] Detecting (and recovering from) file corruption
- [1] Coordinating replica maintenance based on Controller instructions
- [1] Heartbeat messages and required information (disk space available, number of requests handled)
[4 pts]: Client implementation:
- [1] Storing files (configuring chunk size and chunk creation, determining appropriate servers)
- [1] Retrieving files in parallel
- [1] Listing files
- [0.5] Deleting files
- [0.5] Viewing the node list, available disk space, and requests per node.
[3 pts]: Interactive tests. You earn all 3 points unless:
- You have to restart components of your system during testing
- The client crashes during storage, retrieval, etc.
- A file is returned corrupted but then retrieves correctly on the second try
- You can’t explain how part of your system works
- Depending on severity, these issues carry a 1 point deduction per instance.
[1 pt]: Design document and retrospective. You may use UML diagrams, Visio, OmniGraffle, etc. This is mostly for your own benefit later, when you want to refer back to the project or explain it in interviews. It outlines:
- Components of your DFS (this includes the components outlined above but might include other items that you think you’ll need)
- Design decisions (how big the chunks should be, how you will place replicas, etc.)
- Messages the components will use to communicate
- Answers to retrospective questions

Note: your system must be able to support at least 12 active storage nodes, i.e., the entire orion cluster.

Grading

We’ll schedule a demo and code review to grade your assignment. You will demonstrate the required functionality and walk through your design. Here is what you will be required to do during the demo.

I will deduct points if you violate any of the requirements listed in this document — for example, using an unauthorized external library. I may also deduct points for poor design and/or formatting; please use good development practices, break your code into separate classes or modules based on functionality, and include comments in your source where appropriate.

Changelog

2/13: Version 1.0 posted