Lab 4: Scalable Log Analyzer

This lab will combine what we’ve learned from our previous labs — processing logs and transferring files — to create a scalable log analyzer.

Instead of fine-tuning and optimizing a single log analyzer, the goal here is to scale out across as many machines as necessary.

This lab introduces distributed systems concepts including:

You are free to implement this lab as you see fit, so long as it:

The following structure and workflow should give you some ideas on how to get started.

Workflow

A high-level overview of the workflow:

  1. Start instances of the log analyzer server (laserver) on the cluster.
  2. Provide the log analyzer client (laclient) with a log file and list of analyzer servers as its command line arguments.
  3. Have the client read the log file, producing tasks that it sends to the servers.
  4. Servers work on their tasks to build log summaries.
  5. The client collects and reports the results.
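One possible shape for this workflow, sketched in Python. The names `laserver`/`laclient`, the TCP transport, the JSON line protocol, and the level-counting summary are all illustrative assumptions, not requirements of the lab:

```python
import json
import socket
import threading
from collections import Counter

def laserver(host="127.0.0.1", port=0):
    """Toy analyzer server: accepts one JSON task per connection and
    replies with a summary (here, counts of log levels)."""
    srv = socket.create_server((host, port))

    def serve():
        while True:
            conn, _ = srv.accept()
            with conn:
                task = json.loads(conn.makefile().readline())
                summary = Counter(line.split()[0] for line in task["lines"] if line)
                conn.sendall((json.dumps(summary) + "\n").encode())

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()  # the (host, port) actually bound

def laclient(log_lines, servers, chunk=2):
    """Toy client: splits the log into fixed-size tasks, sends them to the
    servers round-robin, and merges the returned summaries."""
    total = Counter()
    for i in range(0, len(log_lines), chunk):
        host, port = servers[(i // chunk) % len(servers)]
        with socket.create_connection((host, port)) as conn:
            conn.sendall((json.dumps({"lines": log_lines[i:i + chunk]}) + "\n").encode())
            total.update(json.loads(conn.makefile().readline()))
    return total
```

In a real deployment each `laserver` would run on its own cluster machine, and the client would take the server list from its command line rather than spawning them in-process.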

Each part of the system needs to be benchmarked (e.g., time to read files, send messages, process tasks, etc.) so we can assess performance and make iterative improvements.
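One lightweight way to collect those numbers is a reusable timer. This is a sketch; the helper names (`benchmark`, `report`) and phase labels are my own, not part of the lab spec:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)  # phase name -> list of wall-clock durations (seconds)

@contextmanager
def benchmark(phase):
    """Record wall-clock time for one phase, e.g. 'read', 'send', 'process'."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase].append(time.perf_counter() - start)

def report():
    """Summarize recorded phases: count, total, and mean duration."""
    return {phase: {"n": len(ts), "total": sum(ts), "mean": sum(ts) / len(ts)}
            for phase, ts in timings.items()}
```

Wrapping each stage of the client and server in `with benchmark("..."):` makes it easy to dump per-phase numbers into the README at the end of a run.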

Drilling down a bit more, the server is responsible for:

And the client is responsible for:

This lab is less restrictive than the previous one; feel free to be creative. What you build here will be helpful in your Project 1 implementation.

Submission

Check your code into a separate branch on your Lab 1 repo by the deadline.

In addition to your code, add benchmark results to your README.md file, and discuss the following points:

  1. Compare with your previous log analyzer implementation.
  2. Determine whether there is a breaking point where scaling out no longer improves performance.
  3. Propose further improvements to your system/algorithms.
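For point 2, a simple cost model can help frame expectations before you measure: if work divides evenly but the client pays a fixed coordination cost per server (connection setup, result merging), adding servers eventually stops paying off. The constants below are made up purely for illustration:

```python
def predicted_time(total_work, servers, overhead_per_server):
    """Naive model: perfectly divisible work plus a fixed per-server
    coordination cost paid by the client."""
    return total_work / servers + overhead_per_server * servers

# With 100 s of work and 0.5 s of overhead per server, the predicted time
# improves until about sqrt(100 / 0.5) ≈ 14 servers, then degrades.
times = [predicted_time(100, n, 0.5) for n in range(1, 30)]
best = min(range(len(times)), key=times.__getitem__) + 1  # server count with lowest time
```

Comparing a curve like this against your measured benchmarks is one way to argue where (and why) your system's breaking point sits.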