Project 1 - Load Balancing in a Distributed Data Backup Application

Still subject to change...

Due - Monday, February 26, 2007

For this project, you will implement a testbed that will enable you to evaluate various load balancing strategies for a distributed data backup application. Your testbed will consist of servers that accept requests to store data and clients that upload data to the servers. Undergraduate students will implement a client-aware load balancing algorithm that enables clients to independently select from a set of potential servers. Graduate students will implement a separate load balancer that will attempt to balance load among the candidate backup servers.

The Server

The server will be a multithreaded application that simply waits for a client request and spawns a thread to process the request. Processing the request will consist of receiving data from the client and storing it to disk. You may use any protocol you like for communication between the client and the server. You may also reuse code that you have previously written. However, keep it simple! The goals of this assignment are to give you experience with socket programming and performance measurement. You will have the opportunity to further develop this application for Project 2 if you so choose.
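
As a starting point, the thread-per-request structure described above can be sketched as follows. This is a minimal sketch, not a required design: the class name BackupServer, the method names, and the protocol (the client sends raw bytes and then closes the connection) are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: all names and the "raw bytes until close" protocol are assumptions.
class BackupServer {
    private final Path storageDir;

    BackupServer(Path storageDir) {
        this.storageDir = storageDir;
    }

    // Accept connections until the listener is closed, spawning one thread per client.
    void serve(ServerSocket listener) {
        while (!listener.isClosed()) {
            try {
                Socket client = listener.accept();
                new Thread(() -> handle(client)).start();
            } catch (IOException e) {
                // accept() fails once the listener is closed; the loop condition then exits.
                System.err.println("Accept failed: " + e.getMessage());
            }
        }
    }

    // Process one request: read everything the client sends and store it.
    private void handle(Socket client) {
        try (Socket s = client; InputStream in = s.getInputStream()) {
            store(in);
        } catch (IOException e) {
            System.err.println("Upload failed: " + e.getMessage());
        }
    }

    // Copy one upload into a uniquely named file and return its path.
    Path store(InputStream in) throws IOException {
        Path target = Files.createTempFile(storageDir, "upload-", ".dat");
        try (OutputStream out = Files.newOutputStream(target)) {
            in.transferTo(out); // requires Java 9 or later
        }
        return target;
    }
}
```

To run it, you would create a ServerSocket on a port of your choice and pass it to serve. A real protocol would probably also send the file name and size before the data.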

The Client

The client will be designed to test the performance of the system. It will repeatedly wait for X seconds, where X is a configurable parameter, and upload a file to a backup server. You should not implement a user interface or data restore capability. Again, you will have the opportunity to further develop this application for Project 2 if you wish.

The client implementation will vary slightly depending on the load balancing algorithm used. Undergraduates will implement Algorithm 1, a client-aware load balancing algorithm. Your client will be configured with a list of the IP addresses of all candidate servers. For each file it uploads, the client will select a server IP address at random, using a uniform distribution.
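
One possible way to express the server selection in Algorithm 1 is shown below. The class name RandomServerSelector and its methods are hypothetical. Passing the Random in through the constructor is a design choice that lets you seed it for repeatable experiments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of Algorithm 1: uniform random choice among configured servers.
class RandomServerSelector {
    private final List<String> servers;
    private final Random random;

    RandomServerSelector(List<String> servers, Random random) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = new ArrayList<>(servers); // defensive copy
        this.random = random;
    }

    // Each call picks every server with equal probability.
    String next() {
        return servers.get(random.nextInt(servers.size()));
    }
}
```

The client would call next() once per upload, after sleeping for X seconds.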

Graduate students will begin with Algorithm 1 described above. They will compare the results of this algorithm with two additional algorithms implemented in the load balancer. To use the load balancer, the client will be modified to upload data in a two-step process. In step 1, the client will contact the load balancer to request the IP address of a backup server. In step 2, the client will connect to the IP address given and upload the data.

The Load Balancer

The load balancer will receive a client request for an IP address, apply one of two load balancing algorithms, and return the IP address of the chosen backup server. When the load balancer is launched, it will be configured to use either Algorithm 2 or Algorithm 3 (described below). In other words, for a single experiment, the load balancer will use only one algorithm.

Algorithm 2 - Round Robin: Algorithm 2 will be a simple round robin. The load balancer will maintain a list of the IP addresses of the candidate servers. The first client request will be directed to server 1, the second to server 2, and so on. In this way, each server should receive a balanced number of requests.
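
A minimal sketch of the round robin selection follows. The class name RoundRobinSelector is hypothetical, and the atomic counter assumes that your load balancer handles requests on multiple threads. If it is single-threaded, a plain int index would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of Algorithm 2: hand out servers in list order, wrapping around.
class RoundRobinSelector {
    private final List<String> servers;
    private final AtomicLong counter = new AtomicLong();

    RoundRobinSelector(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = new ArrayList<>(servers); // defensive copy
    }

    // Request n (counting from 0) goes to server n mod N.
    String next() {
        long n = counter.getAndIncrement();
        return servers.get((int) Math.floorMod(n, (long) servers.size()));
    }
}
```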

Algorithm 3 - Size-aware: Algorithm 3 will choose servers based on the size of the data to be uploaded. When a client contacts the load balancer, it will provide the size of the file it intends to upload. The load balancer will keep track of the total amount of data it believes has been uploaded to each server and will select the server in an effort to balance the amount of data stored on each server. Note that if a client upload fails after it has received the server IP from the load balancer, the load balancer may have an incorrect view of the amount of data stored on each server. You do not need to solve this problem for Project 1, though you may choose to work on it for Project 2.
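
The description above leaves the exact selection policy open. One plausible interpretation, sketched below, is a greedy policy that sends each upload to the server with the fewest bytes assigned so far. The class name SizeAwareSelector and the greedy rule itself are assumptions, not the required algorithm.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Algorithm 3 using a greedy "least data so far" policy.
class SizeAwareSelector {
    // Bytes the load balancer believes each server holds, in configuration order.
    private final Map<String, Long> assignedBytes = new LinkedHashMap<>();

    SizeAwareSelector(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        for (String s : servers) {
            assignedBytes.put(s, 0L);
        }
    }

    // Pick the server with the smallest total and charge the new upload to it.
    // Ties go to the server listed first.
    synchronized String next(long fileSizeBytes) {
        String best = null;
        long bestLoad = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : assignedBytes.entrySet()) {
            if (e.getValue() < bestLoad) {
                best = e.getKey();
                bestLoad = e.getValue();
            }
        }
        assignedBytes.put(best, bestLoad + fileSizeBytes);
        return best;
    }

    synchronized long assignedTo(String server) {
        return assignedBytes.get(server);
    }
}
```

Because the total is updated when the server is handed out, not when the upload completes, this sketch has exactly the stale-view problem noted above when an upload fails.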

Experimental Setup

You will run several experiments for each algorithm. Each experiment will run for a fixed length of time, for example 5 minutes. You will fix the number of servers at 4 and use one or more instances of your client to generate the workload. Though decreasing the time between uploads for one client will have an effect similar to increasing the total number of clients, you should do some preliminary experimentation to determine how many clients you will use for your experiments. Your goal is to generate a reasonable amount of load for the servers and load balancer. You will likely use 8-10 clients for your experiments.

You will vary the following parameters:

The size of the files uploaded by the client - For each upload, your client will upload either a small (10-20K) or large (1-2MB) file. You will configure the client with the percentage of files that should be large. In other words, if you set this parameter at 20%, 2 out of 10 files uploaded by the client should be large files and the other 8 should be small files.
Time between uploads for one client - Between each upload, each client will sleep for X seconds. X is your configurable parameter and will range from 0 to 30.
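
One way to drive the file-size parameter is sketched below. The class name UploadSizePicker is hypothetical, and the sketch assumes that K means 1024 bytes and MB means 1024 * 1024 bytes. Note that a random draw matches the configured percentage only on average. If you want exactly 2 large files in every 10, you would need a fixed schedule instead.

```java
import java.util.Random;

// Hypothetical sketch: choose the size of the next upload from the configured mix.
class UploadSizePicker {
    private static final int KB = 1024;       // assumption: K = 1024 bytes
    private static final int MB = 1024 * KB;  // assumption: MB = 1024 * 1024 bytes

    private final int percentLarge;
    private final Random random;

    UploadSizePicker(int percentLarge, Random random) {
        if (percentLarge < 0 || percentLarge > 100) {
            throw new IllegalArgumentException("percentLarge must be between 0 and 100");
        }
        this.percentLarge = percentLarge;
        this.random = random;
    }

    // Returns a size in bytes: large (1-2MB) with probability percentLarge/100,
    // otherwise small (10-20K).
    int nextSizeBytes() {
        boolean large = random.nextInt(100) < percentLarge;
        int min = large ? MB : 10 * KB;
        int max = large ? 2 * MB : 20 * KB;
        return min + random.nextInt(max - min + 1);
    }
}
```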

You will collect data about the following metrics:

Number of requests served by each server - You will measure the total number of requests that each server is able to service during the run of the experiment.
Number of MBytes stored on each server - You will measure the total number of MBytes stored on each server at the end of the experiment.
Client service time - You will measure the average amount of time required for a single upload at the client side.
Requests/second served by the load balancer (Graduate students only) - You will measure the average number of requests per second the load balancer is able to service.
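
For the client service time metric, a small helper along the following lines may be enough. The class name ServiceTimeRecorder and its methods are hypothetical. The sketch uses System.nanoTime because it is intended for measuring elapsed time, unlike the wall-clock System.currentTimeMillis.

```java
// Hypothetical sketch: accumulate upload durations and report the average.
class ServiceTimeRecorder {
    private long totalNanos;
    private long count;

    // Run one upload and return how long it took, in nanoseconds.
    static long time(Runnable upload) {
        long start = System.nanoTime();
        upload.run();
        return System.nanoTime() - start;
    }

    synchronized void record(long elapsedNanos) {
        totalNanos += elapsedNanos;
        count++;
    }

    // Average service time in milliseconds, or 0 if nothing was recorded.
    synchronized double averageMillis() {
        return count == 0 ? 0.0 : (totalNanos / 1_000_000.0) / count;
    }
}
```

A client might call recorder.record(ServiceTimeRecorder.time(() -> doUpload())) for each upload, where doUpload stands in for your own upload code.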

Results

You will submit a written report that contains an overview of your implementation, an overview of your experimental setup, and the results of your experiments. In this report, make sure to note any conditions or observations affecting your results. For example, you might notice that the same experiment yields different results on different days/times. Why? Well, it could be that the load on the machine varies. Do your best to come up with an explanation for why you see these effects. Your results will consist of several graphs, each accompanied by at least one paragraph summarizing what the graph shows as well as the findings evident from the graph. You should explain the general trends you see (for example, "Not surprisingly, as the time between requests decreases, the number of requests per second served by the load balancer increases.") as well as point out and provide explanations of any anomalies. This latter part is the most interesting and you should be sure to explain any and all strange behavior.

As a guide, undergraduates should have 6 graphs. All will use Algorithm 1. The first three will fix the time between uploads to some reasonable value (e.g., 10 seconds) and vary the percentage of large files from 0 to 100. Graphs 1-3 will report metrics 1-3. The second three will fix the percentage of large files to some reasonable value (e.g., 50) and vary the time between uploads from 0 to 30 seconds. Graphs 4-6 will report metrics 1-3.

Graduate students should have 8 graphs. The first four will fix the time between uploads and vary the percentage of large files, as described above for the undergraduate graphs. Graphs 1-4 will report metrics 1-4 for all three algorithms. In other words, each graph will have three lines or sets of bars, one for each algorithm. Graphs 5-8 will fix the percentage of large files, vary the time between uploads, and report metrics 1-4 for all three algorithms.

Implementation Requirements and Hints

You may work in groups of 2 or 3.
If you have not done any previous socket programming, you will use Java for this assignment. If you have done socket programming in the past and would like to use a different language, email the instructor and specify the language you would like to use. Requests to use other languages must be approved.
Start early. Experimentation can take a long time, particularly when others are using the same machines.
You will need to think a bit about how to implement your measurement framework into your testbed. Make all of your measurements as accurate as possible.
If your experiments yield unstable results, for example the client service time bounces between 5 ms and 3 seconds, run your experiments again.
Though we will not be testing fault-tolerance or other similar properties, your protocols should handle all possible error conditions. For example, a client should not explode if it tries to contact a server that is down.
Make sure you correctly deal with pesky issues such as file naming. For example, if two clients upload index.html to the same server, the server should recognize that there are two copies of the file.
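
As one illustration of the file naming hint, the server could keep both copies by adding a numeric suffix when a name is already taken. The class name UniqueNames and the suffix scheme (index.html, index.html.1, index.html.2, ...) are assumptions. Other schemes, such as one directory per client, would work too.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: never overwrite an existing upload with the same name.
class UniqueNames {
    // Creates and returns an empty file named "name", or "name.1", "name.2", ...
    // if earlier copies exist. Files.createFile fails atomically when the file
    // is already there, so two threads cannot claim the same name.
    static Path createUnique(Path dir, String name) throws IOException {
        for (int copy = 0; ; copy++) {
            Path candidate = dir.resolve(copy == 0 ? name : name + "." + copy);
            try {
                return Files.createFile(candidate);
            } catch (FileAlreadyExistsException e) {
                // Name already taken; try the next suffix.
            }
        }
    }
}
```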

Due 5:30PM - Monday, February 26, 2007

Complete and submit your working code. Place a copy of your source code in the submit directory /home/submit/cs680-s07/username.
Turn in a hard copy of your written report containing your results and analysis.

Note: No portion of your code may be copied from any other source, including a textbook, a web page, or another student (current or former). You must provide citations for any sources you have used in designing and implementing your program.

Sami Rollins