Lab 5: Performance Benchmarking and Overhead
In this lab, we’ll determine whether the entire class has been built on a foundation of lies or not. There have been several claims made about the performance implications of system calls, but could they really be that significant? Let’s find out.
To complete this lab, you will need to take performance measurements by retrieving the current UNIX timestamp from the hardware real-time clock (RTC). You’ll create a new user space program called benchmark that executes another program, determines how long it ran, and counts the number of system calls it issued. You can take inspiration from the previous tracer program (you do not have to complete Lab 4 to do this assignment, but may benefit from reading it if you haven’t already).
The Setup
One of the major performance issues we discussed in Lab 2 was the large number of system calls caused by fgets reading character by character. To estimate how much this hurts execution speed, we can store a large file in our OS file system and then see how fast it can be read and printed out with either fgets or getline. As an initial point of comparison, we can also find out how long it takes to run the cat utility on the file (note that cat does not treat newline characters as a special case).
One “large” file that will work for this experiment is a 24-KB excerpt of H. G. Wells' The Time Machine. Download it to your OS directory with wget or curl and then add it to the file system image by editing your Makefile. Look for the recipe that builds fs.img and use README.md as a point of reference for adding the new file. Hint: the first line of the recipe lists its dependencies, and the second line tells make how to build it. You need to update both lines.
Once you’ve successfully copied your reading material into the file system image, start up your OS and try running cat on the file. It will take some time to read and print out on the console.
Tracking System Call Counts
Based on the previous lab, you should be able to easily add a counter to the process struct that you will increment every time a system call is issued by the process. To make your life easier, kernel/syscall.c is a good place to increment the counter. Just like in the previous lab, you will also need a way to access this information later. However, getting the system call count might not be so easy – you want the count when the process finishes, but then isn’t it already dead, gone, kaput, expired, deceased, departed… no more?
To work around this issue, we can blatantly steal (er, borrow) from Linux and other UNIX-like operating systems. Take a look at man 2 wait and you’ll find something interesting: there is a version of wait that returns resource utilization statistics! This approach makes sense; we want the statistics when the process is finished, and the best time to get that information is when the parent process is calling wait.
To make this happen, add a wait2 system call that waits for a child process to complete and also returns both its (1) exit status and (2) system call count. Model wait2 after the original wait system call. In fact, you should be able to completely replace the old implementation of wait with a call to your new system call: return wait2(addr, 0);. (Since the second parameter is 0, the system call count is not returned, so it behaves exactly like the old wait.)
The new concept you’ll learn here is copying information from kernel space to user space. We previously relied on return values to do this, but now we need to return information to a memory address that exists in user space: when you pass in pointers to the locations that will hold the exit status and system call count for a process, the kernel can’t simply access that memory directly. Check out the copyout function in kernel/vm.c – this is what you’ll need to get the information back to user space. Use the original wait’s call to copyout as a model for what you need to do.
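To make the "validate, then copy, and skip when the address is 0" pattern concrete, here is a host-side toy model. Everything in it is hypothetical: user memory is just a byte array, a user address is an offset into it, and toy_copyout and toy_report_child are invented names. The real copyout in kernel/vm.c has a different signature, so treat this as an illustration of the control flow only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_USER_MEM_SIZE 64

/* Hypothetical model of user memory; an "address" is an offset here. */
static unsigned char toy_user_mem[TOY_USER_MEM_SIZE];

/* Toy analogue of copyout: check the destination range, then copy.
 * Returns 0 on success, -1 if the range falls outside user memory. */
int
toy_copyout(uint64_t dst_addr, const void *src, uint64_t len)
{
  assert(src != NULL);
  if (dst_addr > TOY_USER_MEM_SIZE || len > TOY_USER_MEM_SIZE - dst_addr)
    return -1;
  memcpy(toy_user_mem + dst_addr, src, len);
  return 0;
}

/* Toy analogue of the reporting step at the end of a wait2-style call.
 * An address of 0 means "the caller does not want this value". */
int
toy_report_child(int pid, int xstate, uint64_t count,
                 uint64_t status_addr, uint64_t count_addr)
{
  if (status_addr != 0 &&
      toy_copyout(status_addr, &xstate, sizeof(xstate)) < 0)
    return -1;
  if (count_addr != 0 &&
      toy_copyout(count_addr, &count, sizeof(count)) < 0)
    return -1;
  return pid;
}
```

Passing 0 as the second address mirrors the wait2(addr, 0) call described above: the exit status is still copied out, and the count is silently skipped.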
Collecting Performance Measurements
Given that you can already retrieve a UNIX timestamp with nanosecond resolution, this part will be easy. If you want to determine how long something takes, simply record when it started, when it ended, and calculate the difference between the two:
uint64 start = time();
thing_one();
thing_two();
etc();
uint64 end = time();
uint64 elapsed = end - start;
If you converted your timestamp into seconds at the system call level, you will want to refactor it so user space gets the full-resolution timestamp (not converted to seconds in advance).
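Since the benchmark output reports milliseconds, the full-resolution difference has to be scaled down at print time. A minimal sketch using integer math only, assuming the timestamp is in nanoseconds; the helper names elapsed_ns, ns_to_ms, and ns_to_ms_frac_us are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_MS 1000000ULL

/* Difference between two readings of the same clock, in nanoseconds. */
uint64_t
elapsed_ns(uint64_t start, uint64_t end)
{
  assert(end >= start);
  return end - start;
}

/* Whole milliseconds, truncated toward zero. */
uint64_t
ns_to_ms(uint64_t ns)
{
  return ns / NS_PER_MS;
}

/* Leftover after the whole milliseconds, in microseconds (0..999).
 * Handy for printing a fraction without floating point. */
uint64_t
ns_to_ms_frac_us(uint64_t ns)
{
  return (ns % NS_PER_MS) / 1000;
}
```

Keeping the conversion in user space means the system call itself never throws away precision.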
Building the Benchmark Utility
benchmark will be loosely inspired by tracer from the previous lab. Have the program take command line arguments that determine what to run, and execute them as a child process. In the parent process, collect the performance measurements (child run duration) and report its system call count.
/benchmark cat time-machine.txt
... gigantic amounts of text print ...
He put down his glass, and walked towards the staircase door.
------------------
Benchmark Complete
Time Elapsed: 4982 ms
System Calls: 100
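The parent/child structure described above can be tried out on an ordinary Linux or macOS machine first. The sketch below is a host-side analogue, not the lab solution: it uses the standard POSIX calls fork, execvp, waitpid, and clock_gettime, and the names now_ns and run_and_time are hypothetical. It measures run time only; the host has no equivalent of the system call count you will return from wait2.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in nanoseconds (host stand-in for the
 * lab's own timestamp system call). */
static uint64_t
now_ns(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Runs argv[0] with its arguments as a child process and stores the
 * elapsed time in *elapsed. Returns the child's exit status, or -1 if
 * it could not be started, waited for, or did not exit normally. */
int
run_and_time(char *const argv[], uint64_t *elapsed)
{
  assert(argv != NULL && argv[0] != NULL && elapsed != NULL);

  uint64_t start = now_ns();
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    execvp(argv[0], argv);
    _exit(127);           /* only reached if exec failed */
  }

  int status;
  if (waitpid(pid, &status, 0) < 0)
    return -1;
  *elapsed = now_ns() - start;

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In your OS, the timestamp would come from your own system call, and wait2 would take the place of waitpid so the parent also receives the child's system call count.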
Making it go fast
Now that you can benchmark programs, it’s time to build a new, better, faster version of fgets: one that doesn’t use as many system calls. Here’s a program called catlines.c that uses fgets to read a file line by line:
#include "kernel/fcntl.h"
#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

int
main(int argc, char *argv[])
{
  if (argc <= 1) {
    fprintf(2, "Usage: %s filename\n", argv[0]);
    return 1;
  }

  int fd = open(argv[1], O_RDONLY);
  char buf[128];
  int line_count = 0;
  while (fgets(fd, buf, 128) > 0) {
    printf("Line %d: %s", line_count++, buf);
  }

  return 0;
}
Build a similar program but swap the call to fgets with an optimized function that you design. (You don’t need to add this function to ulib.c – you can leave it in the test program.) Benchmark the baseline (catlines.c) and compare with subsequent versions of your optimized program. Be sure that your optimized program is correct, i.e., produces the same output! Keep track of the run times and system call counts in a text file like this (benchmark.txt):
@ Time,Syscalls
base 68.9 382
opt1 22.4 88
opt2 13.6 69
opt3 4.20 32
Then you can produce a simple visualization with termgraph. If termgraph isn’t already installed, run python3 -m pip install termgraph on gojira. Then, use it like this:
$ termgraph benchmark.txt --color {blue,red}
▇ Time ▇ Syscalls
base : ▇▇▇▇▇▇▇▇▇ 68.90
▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 382.00
opt1 : ▇▇ 22.40
▇▇▇▇▇▇▇▇▇▇▇ 88.00
opt2 : ▇ 13.60
▇▇▇▇▇▇▇▇▇ 69.00
opt3 : ▏ 4.20
▇▇▇▇ 32.00
(The colors are not shown in the example above)
This will help you track whether your changes are making a difference. It’s okay if a new version of your program isn’t faster than the last; that’s just part of the process.
Once you’ve built something that’s faster and benchmarked it, you’re done.
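One common way to cut down on system calls is to read a large chunk at a time and hand out lines from that chunk. The sketch below shows the idea on a regular POSIX host using read; it is one possible approach under stated assumptions, not the expected solution. The struct line_reader type, the reader_init and reader_gets names, and the 512-byte buffer size are all hypothetical choices, and you should confirm that any version you write produces the same output as the baseline.

```c
#include <assert.h>
#include <unistd.h>

#define RD_BUF_SIZE 512

/* Hypothetical reader state: one per open file descriptor. */
struct line_reader {
  int fd;
  char buf[RD_BUF_SIZE];
  int pos;   /* index of the next unread byte in buf */
  int len;   /* number of valid bytes currently in buf */
};

void
reader_init(struct line_reader *r, int fd)
{
  r->fd = fd;
  r->pos = 0;
  r->len = 0;
}

/* Copies the next line (newline included, at most max-1 bytes) into
 * out and NUL-terminates it. Returns the number of bytes stored,
 * 0 at end of file, or -1 on a read error. */
int
reader_gets(struct line_reader *r, char *out, int max)
{
  assert(max >= 2);
  int n = 0;
  while (n < max - 1) {
    if (r->pos == r->len) {       /* buffer drained: one read refills it */
      int got = (int)read(r->fd, r->buf, RD_BUF_SIZE);
      if (got < 0)
        return -1;
      if (got == 0)
        break;                    /* end of file */
      r->pos = 0;
      r->len = got;
    }
    char c = r->buf[r->pos++];
    out[n++] = c;
    if (c == '\n')
      break;
  }
  out[n] = '\0';
  return n;
}
```

With this layout the number of read calls depends on the file size divided by the buffer size rather than on the number of characters, so the buffer size is the natural knob to vary between benchmark runs.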
Grading and Submission
Once you are finished, check your changes into your OS repo. Then have a member of the course staff take a look at your lab to check it.
To receive 50% credit:
- Implement the benchmark utility with the ability to track process run time.
To receive 85% credit:
- Complete all previous requirements
- Implement the wait2 system call and track the total number of system calls in benchmark
To receive full credit for this lab:
- Complete all previous requirements
- Produce a new version of catlines.c that is faster than the baseline by optimizing system calls via fgets
- Check in your benchmark results to docs/fgets-bench.txt
To receive 105% credit for this lab:
- Complete all previous requirements
- Produce the fastest optimized version of fgets. You can post your results on CampusWire if you think they’re particularly good. We may award a couple of winners if it’s warranted.