CS 326 Operating Systems

Project 3: Memory Allocator (v1.0)

Starter repository on GitHub: https://classroom.github.com/g/e_ahOQ2t

For our third project, we will develop a custom memory allocator. As we discussed in class, malloc isn’t provided by the OS, nor is it a system call; it is a library function that uses system calls to allocate and deallocate memory. In most implementations of malloc, these system calls involve at least one of sbrk and mmap.

sbrk is the simpler of the two: give it a size and it will increase the program break (a.k.a. bound) by that amount. mmap, on the other hand, gives us more control but requires more work to use. The malloc implementation you’ve been using on your Linux installations uses sbrk for small allocations and mmap for larger ones.

To simplify our allocator, we will only be using mmap, and we’ll allocate entire pages at a time. This means that if a program executes malloc(1) for a single byte of memory and our machine has a page size of 4096 bytes, we’ll allocate a full 4096 bytes instead. Similarly, if the user requests 4097 bytes then we will allocate two pages’ worth of memory.
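The page arithmetic above can be sketched as follows. This is a minimal illustration, not part of the starter code: the function names round_up_to_pages and system_page_size are hypothetical, and the page size is passed as a parameter so the rounding can be checked independently of the machine.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Round a request up to a whole number of pages.
 * Hypothetical helper: the name and signature are assumptions. */
size_t round_up_to_pages(size_t request, size_t page_size)
{
    assert(page_size > 0);
    /* Integer ceiling division: how many pages cover the request? */
    size_t pages = (request + page_size - 1) / page_size;
    return pages * page_size;
}

/* Ask the OS for its page size; fall back to 4096 if the query fails. */
size_t system_page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    return ps > 0 ? (size_t) ps : 4096;
}
```

With a 4096-byte page, round_up_to_pages(1, 4096) gives 4096 and round_up_to_pages(4097, 4096) gives 8192. In a real allocator the request passed in would presumably also include the per-block bookkeeping overhead.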

“Wait,” you’re thinking, “doesn’t that waste a whole lot of memory?” And the answer is a resounding yes. Herein lies the challenge: you will not only support allocating memory, but also use the free space management algorithms we discussed in class (first fit, best fit, and worst fit) to split up and reuse empty blocks of memory.
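Splitting is what lets one mapped region serve several requests. The sketch below shows one possible way to carve a request off the front of a free block, assuming a hypothetical header struct (the metadata prefix is described under Managing Memory). The struct layout, the field names, and the function name claim_block are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical block header; field names are assumptions. */
struct mem_block {
    size_t size;             /* bytes in this block, header included */
    bool free;               /* true if the block can be reused */
    struct mem_block *next;  /* next block in the chain */
};

/* Claim `needed` bytes (header included) from a free block.
 * If enough space is left over to hold another header plus at least
 * one byte, the remainder becomes a new free block linked after `b`.
 * Assumes `needed` is already rounded up so the remainder's header
 * stays properly aligned. Returns false if the block cannot be used. */
bool claim_block(struct mem_block *b, size_t needed)
{
    if (!b->free || b->size < needed) {
        return false;
    }
    size_t leftover = b->size - needed;
    if (leftover > sizeof(struct mem_block)) {
        struct mem_block *rest = (struct mem_block *) ((char *) b + needed);
        rest->size = leftover;
        rest->free = true;
        rest->next = b->next;
        b->size = needed;
        b->next = rest;
    }
    b->free = false;
    return true;
}
```

When the leftover space is too small to hold a header, the sketch hands over the whole block rather than creating an unusable fragment; other policies are possible.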

You should be able to configure the active free space management algorithm via environment variables. To determine which algorithm to use:

char *algo = getenv("ALLOCATOR_ALGORITHM");
if (algo == NULL) {
    algo = "first_fit";
}

if (strcmp(algo, "first_fit") == 0) {
    ...
} else if (strcmp(algo, "best_fit") == 0) {
    ...
} else if (strcmp(algo, "worst_fit") == 0) {
    ...
}

If $ALLOCATOR_ALGORITHM isn’t set, you’ll use first fit as the default.
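To make the difference between the three policies concrete, here is a small sketch that picks a block from an array of free block sizes. It is a simplification: a real allocator would walk its linked list rather than an array, and the function name choose_block is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the free block chosen for `request` bytes,
 * or -1 if no block is large enough.
 * Hypothetical helper; an unrecognized `algo` behaves like first fit. */
int choose_block(const size_t *free_sizes, int count, size_t request,
                 const char *algo)
{
    int chosen = -1;
    for (int i = 0; i < count; i++) {
        if (free_sizes[i] < request) {
            continue;  /* too small under every policy */
        }
        if (chosen == -1) {
            chosen = i;
            if (strcmp(algo, "first_fit") == 0) {
                break;  /* first fit stops at the first match */
            }
        } else if (strcmp(algo, "best_fit") == 0
                   && free_sizes[i] < free_sizes[chosen]) {
            chosen = i;  /* best fit prefers the smallest adequate block */
        } else if (strcmp(algo, "worst_fit") == 0
                   && free_sizes[i] > free_sizes[chosen]) {
            chosen = i;  /* worst fit prefers the largest block */
        }
    }
    return chosen;
}
```

For free blocks of 300, 100, 500, and 200 bytes and a 150-byte request, first fit picks index 0 (300), best fit picks index 3 (200), and worst fit picks index 2 (500).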

Managing Memory

mmap will give us blocks of memory, but how do we distinguish between the different allocations stored inside the pages? One approach would be to keep a table of allocations, consulting and updating it whenever we allocate or free memory. An even easier method is to prefix each allocation with some metadata; this is how most malloc implementations work. We’ll simply embed a struct at the start of each memory block that describes how large it is, whether it has been freed, and, most importantly, where the next block in the chain is. Looks like studying linked lists paid off after all! Here’s how this looks logically in memory, with each allocation prefixed with a metadata struct:



Figure: Struct prefix for each memory allocation

This means that allocating a single byte of memory actually takes more space: the single byte, plus the size of the metadata.
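One way to picture the metadata prefix in code is sketched below. The struct name, its field names, and the 32-byte description length are assumptions for illustration; the project does not prescribe a layout. The two helpers convert between the header and the pointer handed back to the user.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical metadata prefix; names and sizes are assumptions. */
struct mem_block {
    size_t size;             /* bytes in this block, header included */
    bool free;               /* has this block been freed? */
    char description[32];    /* short user-supplied name */
    struct mem_block *next;  /* next block in the chain */
};

/* The user's memory starts immediately after the header. */
void *block_to_user(struct mem_block *b)
{
    return (void *) (b + 1);
}

/* Step back one header from a user pointer to find its metadata. */
struct mem_block *user_to_block(void *ptr)
{
    return (struct mem_block *) ptr - 1;
}
```

Under this layout, a one-byte allocation occupies sizeof(struct mem_block) + 1 bytes before any rounding to page size.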

Looking at the big picture, we can also see how separate pages play a role in this:


Figure: Pages of memory with links
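Requesting a fresh region from the OS and linking it onto the chain might look like the sketch below. The mmap call and its flags are standard; the struct layout and the function name map_region are assumptions, and region_size is expected to already be a multiple of the page size.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS under strict C modes */
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical block header; field names are assumptions. */
struct mem_block {
    size_t size;             /* bytes in this block, header included */
    bool free;
    struct mem_block *next;
};

/* Map a new anonymous region, place a header at its start, and link it
 * after `tail` (which may be NULL for the first region).
 * Returns the new block, or NULL if mmap fails. */
struct mem_block *map_region(size_t region_size, struct mem_block *tail)
{
    void *addr = mmap(NULL, region_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) {
        return NULL;
    }
    struct mem_block *block = addr;
    block->size = region_size;
    block->free = true;
    block->next = NULL;
    if (tail != NULL) {
        tail->next = block;
    }
    return block;
}
```

Because separate mmap calls are not guaranteed to return adjacent addresses, the next pointer is what ties regions together; blocks in different regions should not be assumed contiguous.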

Extra Features

Descriptions: Since we’re writing our own version of malloc, we might as well add some features while we’re at it. The first is the description field shown above. You should provide a malloc_description function that allows the user to include a short string or name describing the allocation as a second parameter. You will then provide a malloc_lookup function that can retrieve a pointer to memory based on its description.
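The lookup side of this feature amounts to a walk over the block list. The sketch below assumes a hypothetical header with a fixed 32-byte description field; the helper names set_description and lookup_by_description are illustrative and are not the required malloc_description and malloc_lookup signatures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block header; names and sizes are assumptions. */
struct mem_block {
    size_t size;
    bool free;
    char description[32];
    struct mem_block *next;
};

/* Copy a name into the header, truncating it to fit. */
void set_description(struct mem_block *b, const char *name)
{
    strncpy(b->description, name, sizeof(b->description) - 1);
    b->description[sizeof(b->description) - 1] = '\0';
}

/* Return the user pointer of the first in-use block whose description
 * matches `name`, or NULL if there is none. */
void *lookup_by_description(struct mem_block *head, const char *name)
{
    for (struct mem_block *b = head; b != NULL; b = b->next) {
        if (!b->free && strcmp(b->description, name) == 0) {
            return (void *) (b + 1);
        }
    }
    return NULL;
}
```

This sketch skips freed blocks and returns the first match; how to treat duplicate descriptions is a design decision the sketch does not settle.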

Logging: Your implementation should log each allocation, free, etc. to a file if the ALLOCATOR_LOG environment variable is set to 1. Store the log file in /tmp/allocator_log_PID.log where PID is the process ID your allocator is serving. The log should contain a line for each allocator function called, along with the block information:

[MALLOC] "description string": start 0x00001 end 0x01001 size 4 [IN_USE]
[FREE] "description string": start 0x00001 end 0x01001 size 4 [FREE]
[CALLOC] "awesome stuff": start 0xC0A0C end 0xC0A10 size 4 [IN_USE]
[REALLOC] "awesome stuff": start 0xC0A0C end 0xC0A25 size 21 [IN_USE]

You should also support another function, print_memory(), which will walk through the entire linked list and print each block (both free and in use).
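The logging pieces can be sketched as three small helpers: one checks the environment variable, one builds the log path, and one formats a line in the style shown above. All three function names are hypothetical. The sketch also assumes that "size" is the number of bytes between start and end, which the sample lines above do not make explicit.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* True when ALLOCATOR_LOG is set to exactly "1". */
bool logging_enabled(void)
{
    const char *value = getenv("ALLOCATOR_LOG");
    return value != NULL && strcmp(value, "1") == 0;
}

/* Build /tmp/allocator_log_PID.log for the current process. */
int log_path(char *buf, size_t len)
{
    return snprintf(buf, len, "/tmp/allocator_log_%ld.log", (long) getpid());
}

/* Format one log line into `buf`.
 * Note: %p output is implementation-defined, so the exact address
 * formatting may differ from the sample lines. */
int format_log_line(char *buf, size_t len, const char *op,
                    const char *description, void *start, size_t size,
                    bool in_use)
{
    void *end = (char *) start + size;
    return snprintf(buf, len, "[%s] \"%s\": start %p end %p size %zu [%s]\n",
                    op, description, start, end, size,
                    in_use ? "IN_USE" : "FREE");
}
```

Formatting into a caller-supplied buffer is a deliberate choice here: buffered stdio functions such as fopen and fprintf may allocate memory themselves, which could re-enter the allocator being logged. Writing the finished buffer with write(2) is one way to sidestep that.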

Scribbling: C beginners often get tripped up by a seemingly strange behavior exhibited by malloc: sometimes they get a nice, clean chunk of memory to work with, and other times it will have ‘residual’ values that crash their program (usually when it’s being graded!). One solution to this, of course, is to use calloc() to clear the newly-allocated block. Since you are implementing your own memory allocator, you now understand why this happens: free() leaves old values in memory without cleaning them up. To help find these memory errors, you will provide scribbling functionality: when the ALLOCATOR_SCRIBBLE environment variable is set to 1, you will fill any new allocation with 0xAA (10101010 in binary). This means that if a program assumes memory allocated by malloc is zeroed out, it will be in for a rude awakening.
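Scribbling itself is a conditional memset. A minimal sketch, with hypothetical function names, follows; it would be called on the user-visible portion of a block just before malloc returns it.

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* True when ALLOCATOR_SCRIBBLE is set to exactly "1". */
bool scribble_enabled(void)
{
    const char *value = getenv("ALLOCATOR_SCRIBBLE");
    return value != NULL && strcmp(value, "1") == 0;
}

/* Fill a new allocation with 0xAA if scribbling is enabled;
 * otherwise leave its contents untouched. */
void scribble(void *ptr, size_t size)
{
    if (scribble_enabled()) {
        memset(ptr, 0xAA, size);
    }
}
```

Only the user's bytes should be scribbled; overwriting the metadata prefix would corrupt the block list.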

Getting Started

For guidance on the basics, watch the Project 3 Video. NOTE: In the video implementation of realloc(), the old memory location is not marked available via a free() call after copying to the new, larger block. Remember to free the old block.
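The allocate, copy, free sequence that the note describes can be sketched as follows. To keep the example self-contained it uses the standard malloc and free as stand-ins for the allocator's own functions, and the function name grow_allocation is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Move an allocation to a new block of `new_size` bytes.
 * Standard malloc/free stand in for the custom allocator here. */
void *grow_allocation(void *old_ptr, size_t old_size, size_t new_size)
{
    void *new_ptr = malloc(new_size);
    if (new_ptr == NULL) {
        return NULL;  /* on failure the old block stays valid */
    }
    if (old_ptr != NULL) {
        /* Copy only as many bytes as both blocks can hold. */
        size_t to_copy = old_size < new_size ? old_size : new_size;
        memcpy(new_ptr, old_ptr, to_copy);
        free(old_ptr);  /* the step missing from the video version */
    }
    return new_ptr;
}
```

In the real allocator, old_size would come from the block's metadata rather than from a parameter.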

Grading

This is a mini project, worth 10 points towards your project grade. You are allowed to work in teams of two if you wish. If you choose to work as a team, then you must:

Grading Breakdown

Submission: submit via GitHub by checking in your code before the project deadline. You must include a makefile with your project. As part of the testing process, we will check out your code and run make to build it.

Restrictions: you may only use the standard C libraries. Other external libraries are not allowed unless permission is granted in advance. Your code must compile and run on your Raspberry Pi set up with Arch Linux as described in class. Failure to follow these guidelines will result in a grade of 0.

Changelog