Project 3: Memory Allocator (v1.0)

Starter repository on GitHub: https://classroom.github.com/a/rMpsgC7o

For our third project, we will develop a custom memory allocator. As we discussed in class, malloc isn’t provided by the OS, nor is it a system call; it is a library function that uses system calls to allocate and deallocate memory. In most implementations of malloc, this means at least one of two system calls: sbrk and mmap.

sbrk is the simpler of the two: give it a size and it will increase the program break (a.k.a. bound) by that amount. On the other hand, mmap gives us more control but requires more work to use. The malloc implementation we’ve been using on your Linux installations uses sbrk for small allocations and mmap for larger ones.

To simplify our allocator, we will only be using mmap, and we’ll allocate entire regions of memory at a time. The size of each region should be a multiple of the system page size; this means that if a program executes malloc(1) for a single byte of memory and our machine has a page size of 4096, we’ll allocate a region of 4096 bytes instead. Consequently, if the user requests 4097 bytes then we will allocate two pages’ worth of memory (8192 bytes).
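To make the rounding concrete, here is a minimal sketch. The helper names region_size_for and system_page_size are made up for illustration, and the request passed in is assumed to already include any per-block overhead.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: round a request up to a whole number of pages.
 * A request of 0 still gets one page so the region is never empty. */
size_t region_size_for(size_t request, size_t page_size)
{
    assert(page_size > 0);
    size_t pages = (request + page_size - 1) / page_size;
    if (pages == 0) {
        pages = 1;
    }
    return pages * page_size;
}

/* Hypothetical helper: ask the system for its page size. */
size_t system_page_size(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    return (size_t) page_size;
}
```

With a 4096-byte page, region_size_for(1, 4096) gives 4096 and region_size_for(4097, 4096) gives 8192.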

“Wait,” you’re thinking, “doesn’t that waste a whole lot of memory?” And the answer is a resounding yes. Herein lies the challenge: you will not only support allocating memory, but also use the free space management algorithms we discussed in class to split up and reuse empty regions: first fit, best fit, and worst fit.

You should be able to configure the active free space management algorithm via environment variables. To determine which algorithm to use:

char *algo = getenv("ALLOCATOR_ALGORITHM");
if (algo == NULL) {
    algo = "first_fit";
}

if (strcmp(algo, "first_fit") == 0) {
    ...
} else if (strcmp(algo, "best_fit") == 0) {
    ...
} else if (strcmp(algo, "worst_fit") == 0) {
    ...
}

If $ALLOCATOR_ALGORITHM isn’t set, you’ll use first fit as the default.

Managing Memory

mmap will give us regions of memory… But how do we distinguish between different allocations (called blocks) stored inside the regions? One good approach is to prefix each allocation with some metadata; this is how many malloc implementations work. Simply embed a struct at the start of each memory block that describes how large it is, whether it has been freed or not, and – most importantly – include a pointer to the next block in the chain. Looks like studying linked lists paid off after all! Here’s how this looks logically in memory, with each allocation prefixed with a metadata struct:



Struct prefix for each memory allocation

This means that allocating a single byte of memory actually takes more space: the single byte, plus the size of the metadata.

Looking at the big picture, we can also see how separate regions play a role in this. Here’s a visualization of several allocations:


Regions of memory with links

Note that both the block size and the user-requested allocation size (shown in parentheses) are provided, along with block usage, memory addresses and ‘free’ status. Here is a metadata struct that contains this information:

/**
 * Defines metadata structure for both memory 'regions' and 'blocks.' This
 * structure is prefixed before each allocation's data area.
 */
struct mem_block {
    /**
     * The name of this memory block. If the user doesn't specify a name for the
     * block, it should be auto-generated based on the allocation ID. The format
     * should be 'Allocation X' where X is the allocation ID.
     */
    char name[32];

    /** Size of the block */
    size_t size;

    /** Whether or not this block is free */
    bool free;

    /**
     * The region this block belongs to.
     */
    unsigned long region_id;

    /** Next block in the chain */
    struct mem_block *next;

    /** Previous block in the chain */
    struct mem_block *prev;

    /**
     * "Padding" to make the total size of this struct 100 bytes. This serves no
     * purpose other than to make memory address calculations easier. If you
     * add members to the struct, you should adjust the padding to compensate
     * and keep the total size at 100 bytes; test cases and tooling will assume
     * a 100-byte header.
     */
    char padding[35];
} __attribute__((packed));

In general you should not modify this struct, but you can do so if you obtain permission first.
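If you do get permission to change the struct, a compile-time check can catch a padding mistake early. This is only a sketch, and it assumes a 64-bit Linux target where size_t, unsigned long, and pointers are 8 bytes each.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same layout as the mem_block struct above (comments trimmed). */
struct mem_block {
    char name[32];
    size_t size;
    bool free;
    unsigned long region_id;
    struct mem_block *next;
    struct mem_block *prev;
    char padding[35];
} __attribute__((packed));

/* Compilation fails here if the header is not exactly 100 bytes
 * (assumes 8-byte size_t, unsigned long, and pointers). */
static_assert(sizeof(struct mem_block) == 100,
              "mem_block header must be 100 bytes");
```

The members before the padding add up to 32 + 8 + 1 + 8 + 8 + 8 = 65 bytes, which is why the padding is 35.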

Allocating Memory

Request new memory regions from the kernel via mmap. To perform the allocation, place a metadata struct at the start of a free memory address and then return a pointer to the ‘data’ portion of the memory shown in the first figure. Don’t return a pointer to the struct itself, because it will be overwritten by the user!
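As a rough sketch of that first step, the code below maps one region, writes a header at its start, and converts between header and data pointers. The function names new_region, block_to_data, and data_to_block are hypothetical, and treating the whole region as one block is just the simplest starting point.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Same layout as the mem_block struct above (comments trimmed). */
struct mem_block {
    char name[32];
    size_t size;
    bool free;
    unsigned long region_id;
    struct mem_block *next;
    struct mem_block *prev;
    char padding[35];
} __attribute__((packed));

/* Hypothetical helper: map a fresh region and describe it with one block. */
struct mem_block *new_region(size_t region_size, unsigned long region_id)
{
    assert(region_size > sizeof(struct mem_block));
    void *addr = mmap(NULL, region_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) {
        return NULL;
    }
    struct mem_block *block = addr; /* anonymous mappings start zero-filled */
    block->size = region_size;
    block->free = false;
    block->region_id = region_id;
    block->next = NULL;
    block->prev = NULL;
    return block;
}

/* The data area begins immediately after the header. */
void *block_to_data(struct mem_block *block)
{
    return block + 1;
}

/* Step back one header to recover the block from a user pointer. */
struct mem_block *data_to_block(void *ptr)
{
    return (struct mem_block *) ptr - 1;
}
```

Your malloc would hand block_to_data(block) to the user, and your free would call data_to_block on whatever pointer it receives.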

Memory allocations must be aligned to 8 bytes; in other words, the total size of each memory block, block header included, should be evenly divisible by 8.
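One way to do this rounding is sketched below. The names align_up and block_size_for are hypothetical, and HEADER_SIZE is assumed to match the 100-byte struct.

```c
#include <assert.h>
#include <stddef.h>

#define HEADER_SIZE 100 /* assumed: sizeof(struct mem_block) */
#define ALIGNMENT 8

/* The bit trick in align_up only works for powers of two. */
static_assert((ALIGNMENT & (ALIGNMENT - 1)) == 0,
              "ALIGNMENT must be a power of two");

/* Round n up to the next multiple of ALIGNMENT. */
size_t align_up(size_t n)
{
    return (n + ALIGNMENT - 1) & ~(size_t) (ALIGNMENT - 1);
}

/* Total block size for a user request: header plus data, then aligned. */
size_t block_size_for(size_t request)
{
    return align_up(request + HEADER_SIZE);
}
```

For example, a 1-byte request needs 101 bytes with the header, which rounds up to a 104-byte block.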

Once basic allocation works, you can start splitting blocks that are not 100% used. For instance, if a block is 1000 bytes in size but only 100 bytes are used, an allocation of less than or equal to 900 bytes can be accommodated by splitting and resizing the block.
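Here is one possible shape for the split, offered as a sketch rather than the required design. The hypothetical split_block keeps the first keep bytes (header included) in the original block and turns the rest into a new free block; it refuses to split when the remainder could not hold a header plus at least one byte of data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same layout as the mem_block struct above (comments trimmed). */
struct mem_block {
    char name[32];
    size_t size;
    bool free;
    unsigned long region_id;
    struct mem_block *next;
    struct mem_block *prev;
    char padding[35];
} __attribute__((packed));

/* Hypothetical helper: shrink `block` to `keep` bytes and return a new free
 * block covering the remainder, or NULL if the split is not possible. */
struct mem_block *split_block(struct mem_block *block, size_t keep)
{
    assert(block != NULL);
    if (keep < sizeof(struct mem_block) || keep > block->size) {
        return NULL;
    }
    size_t remainder = block->size - keep;
    if (remainder <= sizeof(struct mem_block)) {
        return NULL; /* no room for a header plus data */
    }

    struct mem_block *rest = (struct mem_block *) ((char *) block + keep);
    memset(rest, 0, sizeof(*rest));
    rest->size = remainder;
    rest->free = true;
    rest->region_id = block->region_id;

    /* Link the new block in right after the original. */
    rest->prev = block;
    rest->next = block->next;
    if (block->next != NULL) {
        block->next->prev = rest;
    }
    block->next = rest;
    block->size = keep;
    return rest;
}
```

In a real allocator, keep would normally already be a multiple of 8 so that both halves stay aligned.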

When implementing your free space management algorithms, ties (i.e., blocks that satisfy the algorithm and are the same size) should be broken by choosing the first allocation you found based on the linked list order.
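The search itself can be sketched as a single list walk. The enum and the function name find_free_block are made up here; the strict comparisons are what keep the earliest block when sizes tie.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same layout as the mem_block struct above (comments trimmed). */
struct mem_block {
    char name[32];
    size_t size;
    bool free;
    unsigned long region_id;
    struct mem_block *next;
    struct mem_block *prev;
    char padding[35];
} __attribute__((packed));

enum fit_algo { FIRST_FIT, BEST_FIT, WORST_FIT }; /* hypothetical */

/* Hypothetical helper: find a free block of at least `needed` bytes. */
struct mem_block *find_free_block(struct mem_block *head, size_t needed,
                                  enum fit_algo algo)
{
    assert(needed > 0);
    struct mem_block *chosen = NULL;
    for (struct mem_block *b = head; b != NULL; b = b->next) {
        if (!b->free || b->size < needed) {
            continue; /* in use, or too small */
        }
        if (algo == FIRST_FIT) {
            return b;
        }
        /* Strict < and > mean a later block of equal size never
         * replaces the one found first. */
        if (chosen == NULL
                || (algo == BEST_FIT && b->size < chosen->size)
                || (algo == WORST_FIT && b->size > chosen->size)) {
            chosen = b;
        }
    }
    return chosen;
}
```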

Freeing Memory

Set free = true. That’s it! This lazy approach is why you can sometimes read ‘old’ values from memory that has been freed. After freeing a block, you should also check neighboring blocks to determine whether you can merge with them or not.
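A minimal sketch of freeing plus merging might look like this. The names merge_with_next and free_block are hypothetical, and the sketch assumes neighbors in the list with the same region ID are also adjacent in memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same layout as the mem_block struct above (comments trimmed). */
struct mem_block {
    char name[32];
    size_t size;
    bool free;
    unsigned long region_id;
    struct mem_block *next;
    struct mem_block *prev;
    char padding[35];
} __attribute__((packed));

/* Absorb the next block into `block` if both are free and share a region. */
bool merge_with_next(struct mem_block *block)
{
    struct mem_block *next = block->next;
    if (next == NULL || !block->free || !next->free
            || next->region_id != block->region_id) {
        return false;
    }
    block->size += next->size;
    block->next = next->next;
    if (next->next != NULL) {
        next->next->prev = block;
    }
    return true;
}

/* Hypothetical helper: mark a block free, merge with free neighbors, and
 * return the block that now covers the freed space. */
struct mem_block *free_block(struct mem_block *block)
{
    assert(block != NULL);
    block->free = true;
    merge_with_next(block);
    struct mem_block *prev = block->prev;
    if (prev != NULL && merge_with_next(prev)) {
        return prev; /* our header was absorbed by the previous block */
    }
    return block;
}
```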

If the entire memory region has been freed (i.e., all of the blocks within it are free), then you should free the region with munmap. You will be able to tell when a region is free because all its blocks have been merged into a single, large block (use the region IDs to distinguish where a region begins and ends in the linked list).
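The region check can be sketched as follows, assuming merging has already happened so that a fully free region is one block. The name release_if_region_free and the head pointer parameter are illustrative assumptions about how your list is stored.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Same layout as the mem_block struct above (comments trimmed). */
struct mem_block {
    char name[32];
    size_t size;
    bool free;
    unsigned long region_id;
    struct mem_block *next;
    struct mem_block *prev;
    char padding[35];
} __attribute__((packed));

/* Hypothetical helper: if `block` is free and is the only block in its
 * region, unlink it and unmap the region. Returns true if unmapped. */
bool release_if_region_free(struct mem_block **head, struct mem_block *block)
{
    assert(head != NULL && block != NULL);
    bool starts_region = block->prev == NULL
        || block->prev->region_id != block->region_id;
    bool ends_region = block->next == NULL
        || block->next->region_id != block->region_id;
    if (!block->free || !starts_region || !ends_region) {
        return false;
    }

    /* Unlink before unmapping; the header is gone after munmap. */
    if (block->prev != NULL) {
        block->prev->next = block->next;
    } else {
        *head = block->next;
    }
    if (block->next != NULL) {
        block->next->prev = block->prev;
    }
    return munmap(block, block->size) == 0;
}
```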

Reallocating Memory

If the user wants to realloc a pointer, check to see if its block can be resized in place by expanding into the next neighboring free block. If not, simply malloc a new, appropriately sized block, copy the data there, and then free the old block.
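The in-place check could look something like the sketch below. The name expand_in_place is hypothetical, needed is assumed to be the new total block size (header included, already aligned), and this version absorbs the whole neighbor rather than splitting off any leftover space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same layout as the mem_block struct above (comments trimmed). */
struct mem_block {
    char name[32];
    size_t size;
    bool free;
    unsigned long region_id;
    struct mem_block *next;
    struct mem_block *prev;
    char padding[35];
} __attribute__((packed));

/* Hypothetical helper: try to grow `block` to at least `needed` bytes by
 * absorbing its free neighbor. Returns false if the caller must fall back
 * to malloc + copy + free. */
bool expand_in_place(struct mem_block *block, size_t needed)
{
    assert(block != NULL);
    if (block->size >= needed) {
        return true; /* already big enough */
    }
    struct mem_block *next = block->next;
    if (next == NULL || !next->free
            || next->region_id != block->region_id
            || block->size + next->size < needed) {
        return false;
    }
    block->size += next->size;
    block->next = next->next;
    if (next->next != NULL) {
        next->next->prev = block;
    }
    return true;
}
```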

Extra Features

Since we’re writing our own version of malloc, we might as well add some features while we’re at it.

Named Blocks: to help with debugging, you can optionally provide a name for each allocation. These names will be shown when state information is printed.
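Filling in the name field might be sketched like this. The function name set_block_name and the alloc_id parameter are assumptions; the 'Allocation X' default comes from the struct comments above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Same layout as the mem_block struct above (comments trimmed). */
struct mem_block {
    char name[32];
    size_t size;
    bool free;
    unsigned long region_id;
    struct mem_block *next;
    struct mem_block *prev;
    char padding[35];
} __attribute__((packed));

/* Hypothetical helper: store the caller's name, or generate
 * 'Allocation X' from the allocation ID when no name is given. */
void set_block_name(struct mem_block *block, const char *name,
                    unsigned long alloc_id)
{
    assert(block != NULL);
    if (name == NULL) {
        snprintf(block->name, sizeof(block->name),
                 "Allocation %lu", alloc_id);
    } else {
        /* Truncate long names and always leave room for the terminator. */
        strncpy(block->name, name, sizeof(block->name) - 1);
        block->name[sizeof(block->name) - 1] = '\0';
    }
}
```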

Memory State Information: your allocator should be able to print out the current state of the regions and blocks with the print_memory() function. See the format below. This means your allocator can also act as a basic memory leak detector if you run print_memory() right before your program exits.

-- Current Memory State --
[REGION <region id>] <start addr>
  [BLOCK] <start addr>-<end addr> '<name>' <block-size> [FREE]
  [BLOCK] <start addr>-<end addr> '<name>' <block-size> [USED]
  [BLOCK] <start addr>-<end addr> '<name>' <block-size> [FREE]
[REGION <region id>] <start addr>
  [BLOCK] <start addr>-<end addr> '<name>' <block-size> [USED]

In this example, there are two memory regions and four blocks. Each element is printed out in order, so there is an implied link between element 1 and element 2, and so on.
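A rough sketch of the printing loop follows. Several details are assumptions: the name print_memory_to, writing to a FILE pointer, formatting addresses with %p, and treating the end address as start plus block size. Check the test cases for the exact format expected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Same layout as the mem_block struct above (comments trimmed). */
struct mem_block {
    char name[32];
    size_t size;
    bool free;
    unsigned long region_id;
    struct mem_block *next;
    struct mem_block *prev;
    char padding[35];
} __attribute__((packed));

/* Hypothetical helper: print every region and block in list order.
 * Returns the number of blocks printed. */
int print_memory_to(FILE *fp, struct mem_block *head)
{
    assert(fp != NULL);
    int blocks = 0;
    fprintf(fp, "-- Current Memory State --\n");
    for (struct mem_block *b = head; b != NULL; b = b->next) {
        /* A change in region ID marks the start of a new region. */
        if (b->prev == NULL || b->prev->region_id != b->region_id) {
            fprintf(fp, "[REGION %lu] %p\n", b->region_id, (void *) b);
        }
        fprintf(fp, "  [BLOCK] %p-%p '%s' %zu [%s]\n",
                (void *) b, (void *) ((char *) b + b->size),
                b->name, b->size, b->free ? "FREE" : "USED");
        blocks++;
    }
    return blocks;
}
```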

Scribbling: C beginners often get tripped up by a seemingly strange behavior exhibited by malloc: sometimes they get a nice, clean chunk of memory to work with, and other times it will have ‘residual’ values that crash their program (usually when it’s being graded!). One solution to this, of course, is to use calloc() to clear the newly-allocated block. Since you are implementing your own memory allocator, you now understand why this happens: free() leaves old values in memory without cleaning them up.

To help find these memory errors, you will provide scribbling functionality: when the ALLOCATOR_SCRIBBLE environment variable is set to 1, you will fill any new allocation with 0xAA (10101010 in binary). This means that if a program assumes memory allocated by malloc is zeroed out, it will be in for a rude awakening.
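The check and the fill can be sketched in a few lines. The helper names are hypothetical, and the sketch assumes only the exact string "1" turns scribbling on.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true only when ALLOCATOR_SCRIBBLE is exactly "1". */
bool scribble_enabled(void)
{
    const char *value = getenv("ALLOCATOR_SCRIBBLE");
    return value != NULL && strcmp(value, "1") == 0;
}

/* Hypothetical helper: fill the data area (not the header) with 0xAA. */
void scribble_if_enabled(void *data, size_t len)
{
    assert(data != NULL);
    if (scribble_enabled()) {
        memset(data, 0xAA, len);
    }
}
```

Note that the pointer passed in should be the data portion of the block; scribbling over the header would destroy your linked list.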

You should scribble new allocations (malloc()) as well as reused blocks; however, you should not scribble when realloc() is called (hmm… why?)

Danger Ahead

Some C library functions call malloc, calloc, realloc, etc. This means that if your implementation isn’t correct, other functions may fail in strange and unpredictable ways. Finish implementing a simple (wasteful) allocator first with a single block per region before moving on to the other functionality. You may also want to use a simple stub implementation of free that only sets free = true during testing (i.e., no munmap).

Grading

Submission: submit via GitHub by checking in your code before the project deadline.

Restrictions: you may only use the standard C libraries. Other external libraries are not allowed unless permission is granted in advance. Your code must compile and run on your Arch Linux VM. Failure to follow these guidelines will result in a grade of 0.

Changelog