Memory Allocator with FIFO Free List
In this lab, we’ll build our own memory allocator that serves as an alternative to the default malloc family of functions. If we take a look at the man pages for malloc et al., we’ll find that we need to implement four functions:
void *malloc(size_t size);
void free(void *ptr);
void *calloc(size_t nmemb, size_t size);
void *realloc(void *ptr, size_t size);
However, our memory allocator cannot create memory out of thin air: it needs to request memory from the operating system first. To do this, we’ll use the mmap system call. mmap works very similarly to malloc, but requires more arguments to set up. Additionally, its counterpart for releasing memory, munmap, requires that we pass in the size of the block of memory to be released. (As you’ll recall, all free needs is a pointer to the block that is going to be freed.)
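The difference between the two interfaces can be sketched as a pair of thin wrappers. The names os_request and os_release are hypothetical (they are not part of the starter code), and the flags shown are one common choice for anonymous, private memory rather than the only valid one.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1  /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: ask the OS for `size` bytes of zero-filled memory.
 * Returns NULL on failure, mirroring malloc's convention. */
void *os_request(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: unlike free(), munmap() must be told the mapping size. */
int os_release(void *p, size_t size)
{
    return munmap(p, size);
}
```

Note that mmap reports failure with MAP_FAILED rather than NULL, which is why the wrapper translates the result.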
To deal with these implementation differences, we’ll prefix each allocation with a struct that contains some metadata about the allocation. One such piece of metadata will be the allocation size, which we’ll pass to munmap when we want to release memory. The other information we need to maintain is whether or not the block is free: if we immediately release memory back to the OS every time free is called, our allocator won’t be very efficient. System calls take time to execute, so we want to reduce communication with the OS as much as we can. Given this, we’ll integrate a linked list that tracks all the free blocks that haven’t been released back to the OS yet.
This means our allocations will look something like this:
[Header][Data] --> [Header][Data] --> [Header][Data] --> etc.
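As a rough sketch of the layout above, the header could be a struct placed immediately before the data, with pointer arithmetic to move between the two. The struct name, field names, and helper functions here are assumptions for illustration; the header in the starter code may look different.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical header layout; the starter code's actual struct may differ. */
struct block_header {
    size_t size;                /* total mapping size, header included (for munmap) */
    bool free;                  /* true while the block sits on the free list */
    struct block_header *prev;  /* neighbors on the free list */
    struct block_header *next;
};

/* The pointer handed to the caller starts right after the header. */
static inline void *header_to_data(struct block_header *h)
{
    return h + 1;
}

/* free() receives the data pointer and steps back one header to reach the metadata. */
static inline struct block_header *data_to_header(void *ptr)
{
    return (struct block_header *)ptr - 1;
}
```

With this arrangement, free only needs the data pointer: stepping back by one header recovers the size that munmap requires.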
You are given the basic skeleton of a mostly-functioning memory allocator and must enhance it to support:
- A free list (linked list of memory blocks)
  - When an allocation is freed, it is added to the head of the list rather than unmapped immediately
- Block reuse capability
  - When servicing a malloc request, you should first check the free list for a viable block before mmap()ing a new one
  - Blocks should be removed from the end or tail of the list. Since additions happen at the head and removals happen at the tail, we are effectively creating a FIFO cache of free blocks.
- Threshold-based memory release
  - If a large number of blocks have been freed, you should release memory back to the OS with munmap. We define a “large number” with a configurable threshold.
  - By default, the threshold is 100 blocks. If the free list contains more than 100 blocks, then you should unmap the oldest allocation on the free list to make room for the most recently freed block.
  - The user may supply their own threshold with an environment variable, ALLOC_THRESH. You can use the getenv function to get the value of this environment variable, and if it is set then you should use the provided threshold instead of the default.
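Reading the threshold might look something like the sketch below. The helper name free_list_threshold and the DEFAULT_THRESHOLD macro are hypothetical, and falling back to the default on malformed or negative input is an assumption; the lab text only specifies the behavior for a set or unset variable.

```c
#include <stdlib.h>

#define DEFAULT_THRESHOLD 100  /* default from the lab description */

/* Hypothetical helper: read ALLOC_THRESH, falling back to the default when the
 * variable is unset, empty, negative, or not a plain decimal number. */
size_t free_list_threshold(void)
{
    const char *value = getenv("ALLOC_THRESH");
    if (value == NULL || *value == '\0') {
        return DEFAULT_THRESHOLD;
    }
    char *end = NULL;
    long parsed = strtol(value, &end, 10);
    if (*end != '\0' || parsed < 0) {
        return DEFAULT_THRESHOLD;  /* assumption: ignore malformed values */
    }
    return (size_t)parsed;
}
```

Since the environment is unlikely to change while the program runs, you may prefer to call a helper like this once and store the result rather than re-reading it on every free.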
Look through the starter code to familiarize yourself with memory allocation basics. To download it directly to your VM, you can use the wget command, and then extract the gzipped tar file with tar xvf allocator-starter.tar.gz.
To build the allocator, you’ll compile like so:
cc allocator.c -Wall -fPIC -shared -o allocator.so
And then to actually use it:
LD_PRELOAD=$(pwd)/allocator.so ls
(In this example, we’re running ls with our allocator.)
Implementation Hints
- You have freedom in designing this lab, but remember to break functionality up into separate functions.
- You may want to test your linked list functions outside the context of the memory allocator; to do this, you can temporarily add a main to your code.
- To create the FIFO cache of free blocks, you’ll need to implement a doubly-linked list, i.e., each node in the list should contain a link to its previous and next neighbors.
- Keep track of the head and tail of the list so you don’t need to iterate through it to find either end.
- There are TODO notes in the starter code to remind you what needs to be done.
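The list hints above can be sketched as a small doubly-linked FIFO that can be exercised on its own, outside the allocator. All names here (struct node, struct fifo, fifo_push_head, fifo_remove, fifo_take_fit) are hypothetical, and scanning from the tail for the first block that is large enough is an assumption about what counts as a "viable" block.

```c
#include <stddef.h>

/* Hypothetical node type; in the allocator these links would live in the block header. */
struct node {
    size_t size;
    struct node *prev;
    struct node *next;
};

struct fifo {
    struct node *head;  /* most recently freed */
    struct node *tail;  /* oldest */
    size_t count;
};

/* Add a node at the head in O(1). */
void fifo_push_head(struct fifo *list, struct node *n)
{
    n->prev = NULL;
    n->next = list->head;
    if (list->head != NULL) {
        list->head->prev = n;
    } else {
        list->tail = n;  /* list was empty, so n is also the tail */
    }
    list->head = n;
    list->count++;
}

/* Unlink a node in O(1); handles head, tail, and interior nodes. */
void fifo_remove(struct fifo *list, struct node *n)
{
    if (n->prev != NULL) n->prev->next = n->next; else list->head = n->next;
    if (n->next != NULL) n->next->prev = n->prev; else list->tail = n->prev;
    n->prev = n->next = NULL;
    list->count--;
}

/* Walk from the tail (oldest) toward the head and take the first node
 * whose size can satisfy the request; returns NULL if none fits. */
struct node *fifo_take_fit(struct fifo *list, size_t size)
{
    for (struct node *n = list->tail; n != NULL; n = n->prev) {
        if (n->size >= size) {
            fifo_remove(list, n);
            return n;
        }
    }
    return NULL;
}
```

Because both ends are tracked, adding at the head and removing at the tail never requires walking the list; only the search for a large-enough block does.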
Instrumenting Your Code
Use the TRACE macro to evaluate whether your allocator is functioning properly; the events you must trace are:
- malloc():
  - Allocating a new block
  - Reusing an existing free block (instead of allocating a new one)
- free():
  - Adding a block to the free list
  - Unmapping a block because the free list is already full
- realloc():
  - When a block is already large enough to accommodate the realloc request
  - When a block is “resized” (allocate a new block, copy the old data to it, and then free the old block)
You can see an example of how this might look in the test run below.
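The two realloc cases listed above can be sketched roughly as follows. To keep the example runnable on its own, demo_malloc and demo_free are hypothetical stand-ins built on the libc allocator; in the lab they would be your own mmap-backed malloc and free, and the header would carry more fields.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header recording the usable size of the data that follows.
 * The alignment keeps the data pointer suitably aligned for any type. */
struct header {
    _Alignas(max_align_t) size_t size;
};

/* Stand-ins for the allocator's own malloc/free, built on libc for this sketch. */
void *demo_malloc(size_t size)
{
    struct header *h = malloc(sizeof *h + size);
    if (h == NULL) return NULL;
    h->size = size;
    return h + 1;
}

void demo_free(void *ptr)
{
    if (ptr != NULL) free((struct header *)ptr - 1);
}

void *demo_realloc(void *ptr, size_t size)
{
    if (ptr == NULL) return demo_malloc(size);  /* realloc(NULL, n) acts like malloc(n) */

    struct header *h = (struct header *)ptr - 1;
    if (h->size >= size) return ptr;            /* "unchanged": already large enough */

    void *bigger = demo_malloc(size);           /* "resized": allocate, copy, free */
    if (bigger == NULL) return NULL;            /* old block stays valid on failure */
    memcpy(bigger, ptr, h->size);
    demo_free(ptr);
    return bigger;
}
```

The order in the resize path (allocate, copy, then free the old block) matches the sequence of malloc, free, and realloc events in the sample trace.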
Testing Your Allocator
You’ll need to add a bit of functionality for debugging purposes: a malloc_name(size, char *name) function that allows you to make named allocations (store the name in the header struct) and a print_memory function that prints the current state of the linked list, including memory locations, names, and sizes. You can use block names to confirm that allocations are ordered properly.
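One possible shape for these debugging helpers is sketched below. The struct, the fixed NAME_LEN field, the helper names, and the output format are all assumptions for illustration; the lab does not prescribe how the name is stored or how print_memory formats its lines.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 32  /* hypothetical fixed-size name field */

/* Hypothetical header carrying a name alongside the size and a list link. */
struct named_block {
    size_t size;
    char name[NAME_LEN];
    struct named_block *next;
};

/* Copy a name into the header, truncating if needed and always NUL-terminating. */
void set_block_name(struct named_block *b, const char *name)
{
    snprintf(b->name, sizeof b->name, "%s", name != NULL ? name : "");
}

/* Print one line per block from `head` onward; returns how many were printed. */
size_t print_blocks(FILE *out, const struct named_block *head)
{
    size_t count = 0;
    for (const struct named_block *b = head; b != NULL; b = b->next) {
        fprintf(out, "[BLOCK %p] '%s' %zu bytes\n",
                (const void *)b, b->name, b->size);
        count++;
    }
    return count;
}
```

Storing the name in a fixed-size array inside the header avoids needing a second allocation just to hold the name.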
Once things are looking reasonable, you should be able to run the find command with your memory allocator on a large directory. For example, run:
ALLOC_THRESH=10 LD_PRELOAD=$(pwd)/allocator.so find /
This executes find over the entire file system tree. You may want to compile your allocator with logging and tracing turned off first so it finishes faster:
cc allocator.c -shared -fPIC -DLOGGER=0 -DTRACE_ON=0 -o allocator.so
Next, you can use this program as a test case. Compile your allocator with -DTRACE_ON=1 and then run it:
$ cc allocator-test.c -o allocator-test
$ ALLOC_THRESH=5 LD_PRELOAD=$(pwd)/allocator.so ./allocator-test
[TRACE] malloc(): Allocated block [0x7f2516018000]: 25 bytes
[TRACE] malloc(): Allocated block [0x7f2516017000]: 27 bytes
[TRACE] malloc(): Allocated block [0x7f2516016000]: 29 bytes
[TRACE] malloc(): Allocated block [0x7f2516015000]: 31 bytes
[TRACE] malloc(): Allocated block [0x7f2516014000]: 33 bytes
[TRACE] malloc(): Allocated block [0x7f2516013000]: 35 bytes
[TRACE] free(): Cached free block [0x7f2516018000]: 25 bytes
[TRACE] free(): Cached free block [0x7f2516017000]: 27 bytes
[TRACE] free(): Cached free block [0x7f2516016000]: 29 bytes
[TRACE] free(): Cached free block [0x7f2516015000]: 31 bytes
[TRACE] free(): Cached free block [0x7f2516014000]: 33 bytes
[TRACE] free(): Unmapped block -- [0x7f2516013000]: 35 bytes
[TRACE] malloc(): Reused block -- [0x7f2516018000]: 25 bytes
[TRACE] malloc(): Reused block -- [0x7f2516016000]: 29 bytes
[TRACE] malloc(): Allocated block [0x7f2516013000]: 36 bytes
[TRACE] realloc(): Unchanged ---- [0x7f2516013000]: 36 bytes
[TRACE] malloc(): Allocated block [0x7f2516012000]: 1024 bytes
[TRACE] free(): Cached free block [0x7f2516013000]: 36 bytes
[TRACE] realloc(): Resized block [0x7f2516013000]: 36 bytes -> 1024 bytes
[TRACE] malloc(): Allocated block [0x7f2516011000]: 1048 bytes
X is: 99
[TRACE] malloc(): Reused block -- [0x7f2516017000]: 27 bytes
[TRACE] free(): Cached free block [0x7f2516018000]: 25 bytes
[TRACE] free(): Cached free block [0x7f2516016000]: 29 bytes
[TRACE] free(): Unmapped block -- [0x7f2516012000]: 1024 bytes
[TRACE] free(): Unmapped block -- [0x7f2516017000]: 27 bytes
Benchmarking
The whole premise of this assignment is that fewer system calls will mean less overhead, and therefore better performance. But you shouldn’t trust any theory without testing it first. We’ll collect some empirical data on just how effective our free list is.
First, install the time command (we’re not going to use the one that is already built into your shell):
$ sudo pacman -Syu time
Now that we have the time command installed, you can measure performance with the following:
$ ALLOC_THRESH=100 LD_PRELOAD=$(pwd)/allocator.so /usr/bin/time find / > /dev/null
NOTE: compile first without logging and tracing turned on!
You’ll get output that looks something like:
0.96user 1.56system 0:02.58elapsed 97%CPU (0avgtext+0avgdata 83532maxresident)
128inputs+0outputs (0major+200778minor)pagefaults 0swaps
So, this took 2.58 seconds to run. You can modify the value of ALLOC_THRESH (including setting it to 0 to effectively disable caching) and measure how long each run takes. We need to collect data for a range of thresholds, so the following shell script is a good starting point. It will output the total elapsed time to a file named (threshold).out for each threshold, allowing us to find the best run time.
#!/usr/bin/env bash
rm -f *.out
echo "Starting benchmark"
for (( i = 0; i <= 40; i += 10 )); do
    export ALLOC_THRESH=$i
    output_file=$(printf "%03d.out" "${i}")
    echo -n "${i}..."
    LD_PRELOAD=$(pwd)/allocator.so /usr/bin/time -q -f "%e" \
        -o "${output_file}" find / &> /dev/null
    sed -i "1s/^/${i}\t/" "${output_file}" # Add threshold to beginning of file
done
echo 'done!'
echo 'Combining outputs into benchmark.out...'
cat *.out > benchmark.out
Put the script in a file, such as run.sh, make it executable with chmod +x run.sh, and run it. You’ll notice that the script only runs through the first few thresholds, so you should modify it to do a more exhaustive set of benchmarks.
After doing this, we can find the absolute best run time with:
sort -n -k 2 benchmark.out | head -n1
But absolute best may not be exactly what we want. What are the tradeoffs here?
Finally, to get fancier, you can plot this in your terminal using termgraph:
$ sudo pacman -Sy python-pip
$ python3 -m pip install termgraph
$ ~/.local/bin/termgraph benchmark.out
0 : ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.93
10 : ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1.85
20 : ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1.61
30 : ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1.51
40 : ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1.41
50 : ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1.37
What to Turn In
Check the following into your lab repo:
- Your memory allocator, allocator.c
- A Makefile that builds allocator.so
- README.md file explaining how the allocator works and how to use it
- Results from your benchmarks and a short discussion on what the best threshold might be. You should also propose a couple of options for making your allocator even more efficient.
Grading and Submission
To receive 80% credit:
- Basic memory allocator implementation
- Linked list caching for free blocks
- print_memory
- malloc_name
- Makefile
- README explaining how the allocator works and how to use it
To receive 95% credit:
- Complete all previous requirements
- Tracing functionality
- Benchmarking
To receive 100% credit:
- Complete all previous requirements
- Performance discussion