Project 3: Memory Allocator (v1.0)
For our third project, we will develop a custom memory allocator. As we discussed in class, malloc isn’t provided by the OS, nor is it a system call; it is a library function that uses system calls to allocate and deallocate memory. In most implementations of malloc, these system calls involve at least one of:
- sbrk
- mmap
sbrk is the simpler of the two: give it a size and it will increase the program break (a.k.a. bound). On the other hand, mmap gives us more control but requires more work to use. Some malloc implementations use sbrk for small allocations and mmap for larger ones.
Our allocator will use sbrk to request or release memory. To reduce the number of system calls it needs to make, each call to sbrk will request a multiple of the system page size; this means that if a program executes malloc(1) for a single byte of memory and our machine has a page size of 4096, we’ll request 4096 bytes instead. Likewise, a malloc for 4097 bytes will be rounded up to 8192 bytes. We’ll call these blocks of memory.
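The rounding rule above can be sketched as a small helper. This is only an illustration, not the required implementation: `round_up_to_page` is a hypothetical name, and the page size is passed in as a parameter (a real allocator would typically query it once, for example with `sysconf(_SC_PAGESIZE)`).

```c
#include <stddef.h>

/*
 * Hypothetical helper: round `size` up to the next multiple of `page_size`.
 * Assumes page_size is nonzero.
 */
size_t round_up_to_page(size_t size, size_t page_size)
{
    /* Number of whole pages needed to hold `size` bytes. */
    size_t pages = (size + page_size - 1) / page_size;
    return pages * page_size;
}
```

With a 4096-byte page, a 1-byte request becomes 4096 bytes and a 4097-byte request becomes 8192 bytes. In the allocator itself, the size of the metadata would presumably be added to the request before rounding.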
You may wonder if this approach will waste memory, and the answer is yes… IF we don’t split memory blocks into sub-blocks. Herein lies the challenge: you will not only support allocating memory, but also use the free space management algorithms we discussed in class to split up and reuse empty regions:
- First fit
- Best fit
- Worst fit
You should be able to configure the active free space management algorithm via a function call, malloc_setfsm().
Managing Memory
Given this allocation strategy, we will prefix each sub-block with some metadata; this is how many malloc implementations work. Simply embed a struct at the start of each memory block that describes how large it is and whether it has been freed, and that includes a pointer to the next block in the chain. Looks like studying linked lists paid off after all! Here’s how this looks logically in memory, with each allocation prefixed with a metadata struct:
This means that even if we ignore our plan to round up to the nearest page, allocating a single byte of memory actually takes more space: the single byte, plus the size of the metadata.
Tracking Memory
We need a way to keep track of this information, so we will use two doubly-linked lists:
- Block list: this linked list is used to merge blocks that have been split, release unused pages back to the kernel, and print a summary of the memory state.
- Free list: searching the entire block list to determine whether the current malloc request can be satisfied by an existing free block would be very slow, so we maintain a list of only free blocks, in the order they were freed.
Here is an example metadata struct that contains block information:
#include <stddef.h> /* for size_t */

struct mem_block {
    /**
     * The name of this memory block. If the user doesn't specify a name for the
     * block, it should be left empty (a single null byte).
     */
    char name[8];

    /** Size of the block */
    size_t size;

    /** Links for our doubly-linked list of blocks: */
    struct mem_block *next_block;
    struct mem_block *prev_block;
};
It seems like some information is missing:
- Don’t we need a flag to determine whether the block is free or not?
- What about the free list pointers?
But don’t worry. We will align allocations to 16 bytes. That means the lowest 4 bits of the block’s size will always be zero, so we can use the 0th bit to store the free flag. Additionally, since a freed block will always have at least 16 bytes of data available after the header, we’ll store our free list pointers there. After all, that space won’t be in use if the block is free.
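As a rough sketch of the flag trick, the helpers below pack and unpack the free flag in bit 0 of a size value. The names (`FREE_FLAG`, `real_size`, `is_free`, `mark_free`, `mark_used`) are hypothetical and not part of the project API; they operate on a plain size_t rather than on the struct so the example stays self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bit 0 of the size field marks a free block (hypothetical macro name). */
#define FREE_FLAG ((size_t)1)

/* Strip the flag to recover the true block size. */
size_t real_size(size_t size_field) { return size_field & ~FREE_FLAG; }

/* Test whether the free flag is set. */
bool is_free(size_t size_field) { return (size_field & FREE_FLAG) != 0; }

/* Set or clear the flag without disturbing the size. */
size_t mark_free(size_t size_field) { return size_field | FREE_FLAG; }
size_t mark_used(size_t size_field) { return size_field & ~FREE_FLAG; }
```

This only works because every block size is a multiple of 16, so bit 0 never carries size information. Any code that does arithmetic on a size has to go through something like `real_size` first.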
Allocating Memory
If your program needs more memory, request additional blocks via sbrk. Place a metadata struct at the start of the memory and return a pointer to the ‘data’ portion of the memory shown in the first figure. Don’t return a pointer to the struct itself, because it will be overwritten by the user!
Memory allocations must be aligned to 16 bytes; in other words, the size of the memory blocks should be evenly divisible by 16. The minimum viable block’s data portion is 16 bytes, and the overall minimum size of a block is 48 bytes.
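One way to compute a block size from a user request is sketched below. The names `align_up` and `block_size_for` are hypothetical, and `HEADER_SIZE` is an assumption: it is taken as 32 bytes because that is what the 48-byte minimum block minus the 16-byte minimum data portion implies.

```c
#include <stddef.h>

#define ALIGNMENT   16
#define HEADER_SIZE 32 /* assumed: 48-byte minimum block - 16 data bytes */

/* Round n up to the next multiple of 16. */
size_t align_up(size_t n)
{
    return (n + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
}

/* Total block size (header + aligned data) for a user request. */
size_t block_size_for(size_t request)
{
    size_t data = align_up(request);
    if (data < ALIGNMENT) {
        data = ALIGNMENT; /* even a 0-byte request gets 16 data bytes */
    }
    return HEADER_SIZE + data;
}
```

Under these assumptions a 1-byte request yields a 48-byte block, and a 64-byte request yields a 96-byte block.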
Once basic allocation works, you can start splitting blocks that are not 100% used. For instance, if a block is 4096 bytes in size but only 96 bytes are actually used, split the block in two: one 96-byte block, and one 4000-byte block.
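The split described above might look roughly like the following. This is a simplified sketch: `struct blk` is a cut-down, hypothetical stand-in for the real metadata struct, `split_block` is an invented name, and the free flag and free list bookkeeping are left out.

```c
#include <stddef.h>

#define MIN_BLOCK 48 /* smallest block worth creating */

/* Simplified stand-in for the metadata struct (hypothetical). */
struct blk {
    size_t size; /* total block size, header included */
    struct blk *next_block;
    struct blk *prev_block;
};

/*
 * Shrink `b` to `used` bytes and carve the leftover into a new block placed
 * right after it. Returns the new block, or NULL if the leftover would be
 * smaller than the minimum block size (in which case nothing changes).
 */
struct blk *split_block(struct blk *b, size_t used)
{
    if (b->size < used + MIN_BLOCK) {
        return NULL;
    }
    struct blk *rest = (struct blk *)((char *)b + used);
    rest->size = b->size - used;
    rest->prev_block = b;
    rest->next_block = b->next_block;
    if (b->next_block != NULL) {
        b->next_block->prev_block = rest;
    }
    b->next_block = rest;
    b->size = used;
    return rest;
}
```

Splitting a 4096-byte block at 96 bytes leaves a 96-byte block followed by a 4000-byte block, matching the example above. In the full allocator the leftover would also need to be marked free and added to the free list.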
When implementing your free space management algorithms, ties (i.e., blocks that satisfy the algorithm and are the same size) should be broken by choosing the first matching block you found based on the linked list order.
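The three strategies and the tie-breaking rule can be sketched in a single search loop. Everything named here is hypothetical: `enum fsm`, `struct free_node`, and `find_fit` are illustrative only, and the actual argument type of malloc_setfsm() is not specified in this document.

```c
#include <stddef.h>

/* Hypothetical algorithm selector. */
enum fsm { FIRST_FIT, BEST_FIT, WORST_FIT };

/* Simplified free list node (hypothetical). */
struct free_node {
    size_t size;
    struct free_node *next_free;
};

/* Return the free block chosen by `algo` for a request of `need` bytes. */
struct free_node *find_fit(struct free_node *head, size_t need, enum fsm algo)
{
    struct free_node *pick = NULL;
    for (struct free_node *n = head; n != NULL; n = n->next_free) {
        if (n->size < need) {
            continue; /* too small for this request */
        }
        if (algo == FIRST_FIT) {
            return n; /* first block that fits wins */
        }
        /* Strict comparisons keep the earliest block when sizes tie. */
        if (pick == NULL
            || (algo == BEST_FIT && n->size < pick->size)
            || (algo == WORST_FIT && n->size > pick->size)) {
            pick = n;
        }
    }
    return pick;
}
```

Using `<` and `>` rather than `<=` and `>=` is what makes the earliest block in list order win a tie.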
Freeing Memory
First, set the 0th bit of size to 1. Next, use the data payload portion of the block to store a pointer to the next free block. That’s it! This approach is why you can sometimes read ‘old’ values from memory that has been freed. After freeing a block, you should also check neighboring blocks to determine whether you can merge with them or not. Merge with any free neighboring blocks.
If an entire block has been freed (i.e., 4096 bytes or more are free at the end of the address space), then you should decrease the bound of the program with sbrk to release the memory for other programs to use.
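Merging with a free neighbor could be sketched as follows. The struct and function names are hypothetical, only the "merge with the next block" direction is shown, and the sketch assumes neighbors in the block list are also adjacent in memory. Removing the absorbed block from the free list is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the metadata struct (hypothetical). */
struct blk {
    size_t size; /* total size; bit 0 set means the block is free */
    struct blk *next_block;
    struct blk *prev_block;
};

static bool blk_is_free(const struct blk *b) { return (b->size & 1) != 0; }

/*
 * Absorb b->next_block into b when both blocks are free.
 * Returns true if a merge happened.
 */
bool merge_with_next(struct blk *b)
{
    struct blk *n = b->next_block;
    if (n == NULL || !blk_is_free(b) || !blk_is_free(n)) {
        return false;
    }
    /* Add the neighbor's real size; b keeps its own free bit. */
    b->size += n->size & ~(size_t)1;
    b->next_block = n->next_block;
    if (n->next_block != NULL) {
        n->next_block->prev_block = b;
    }
    return true;
}
```

Merging with the previous block can reuse the same routine by calling it on `b->prev_block`.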
Reallocating Memory
If the user wants to realloc a pointer, first check to see if the block can be resized in place. Ways this could happen:
- The block already has some extra space because of its alignment, so no changes need to be made to complete the realloc.
- The block is being shrunk, requiring (1) a metadata update, and (2) splitting off a new block from the extra free space that was made (if possible).
- If the free space created by shrinking a block is located next to another free block, they should be merged.
- The block can expand into a neighboring free block. If the entire free block is not consumed by the expansion, then the remaining free space should be split off into a new free block.
If none of the situations above are possible (e.g., the new size is too large for the block to be resized in place), simply malloc a new, appropriately sized block, copy the data there, and then free the old block.
Edge Cases: If the pointer passed into realloc is NULL, then it should behave like malloc instead since there is nothing to resize. Additionally, if the size passed into realloc is 0, then the block should be freed.
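The decision logic for realloc can be summarized in a small classifier. This is only one possible way to organize it: the enum, the function name `classify_realloc`, and the idea of passing in the block's current data capacity are all assumptions made for illustration.

```c
#include <stddef.h>

#define MIN_BLOCK 48 /* smallest block that can be split off */

/* Hypothetical outcomes of inspecting a realloc request. */
enum realloc_action {
    RA_MALLOC, /* NULL pointer: behave like malloc */
    RA_FREE,   /* size 0: free the block */
    RA_KEEP,   /* already fits; leftover too small to split */
    RA_SHRINK, /* fits, and the leftover can become a new free block */
    RA_GROW    /* too small: expand into a free neighbor, or move */
};

/* `capacity` is the number of data bytes the block currently holds. */
enum realloc_action classify_realloc(const void *ptr, size_t request,
                                     size_t capacity)
{
    if (ptr == NULL) {
        return RA_MALLOC;
    }
    if (request == 0) {
        return RA_FREE;
    }
    size_t need = (request + 15) & ~(size_t)15; /* align to 16 bytes */
    if (need > capacity) {
        return RA_GROW;
    }
    if (capacity - need >= MIN_BLOCK) {
        return RA_SHRINK;
    }
    return RA_KEEP;
}
```

For a block with 64 data bytes, a request for 60 bytes needs no changes, a request for 16 bytes leaves exactly 48 bytes to split off, and a request for 65 bytes has to grow or move.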
Extra Features
Since we’re writing our own version of malloc, we might as well add some features while we’re at it.
Named Blocks: to help with debugging, you can optionally provide a name for each allocation. These names will be shown when state information is printed.
Memory State Information: your allocator should be able to print out the current memory state with the malloc_print() function. See the format below.
-- Current Memory State --
[BLOCK 0x7f0d774e7000-0x7f0d774e70a8] 168 [USED] 'Blk 1'
[BLOCK 0x7f0d774b0000-0x7f0d774b0050] 80 [USED] 'Blk 2'
[BLOCK 0x7f0d774af000-0x7f0d774af0a8] 168 [USED] 'Blk 3'
...
(list continues)
-- Free List --
[0x7f0d774e70a8] -> [0x7f0d774b0050] -> [0x7f0d774af0a8] -> (...) -> NULL
Each element is printed out in order, so there is an implied link between element 1 and element 2, and so on.
Leak Check: You can leverage the metadata we are tracking to find memory leaks, so add a malloc_leaks() function. malloc_leaks() will print leaks and a summary, and return true if leaks were found:
-- Leak Check --
[BLOCK 0x7f0d774e7000] 168 'Blk 1'
[BLOCK 0x7f0d774b0000] 80 'Blk 2'
...
(list continues)
-- Summary --
542 blocks lost (892412 bytes)
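The counting behind the summary line could be done with a single walk over the block list, as in the sketch below. The struct and the name `count_leaks` are hypothetical, printing is omitted, and the sketch simply sums whatever the size field holds; whether the reported byte total should include the header is not specified here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the metadata struct (hypothetical). */
struct blk {
    size_t size; /* bit 0 set means the block is free */
    struct blk *next_block;
};

/*
 * Count blocks that are still in use and total their sizes.
 * Returns true if at least one such block was found.
 */
bool count_leaks(const struct blk *head, size_t *blocks, size_t *bytes)
{
    *blocks = 0;
    *bytes = 0;
    for (const struct blk *b = head; b != NULL; b = b->next_block) {
        if ((b->size & 1) == 0) { /* free bit clear: never freed */
            (*blocks)++;
            *bytes += b->size;
        }
    }
    return *blocks > 0;
}
```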
Scribbling: C beginners often get tripped up by a seemingly strange behavior exhibited by malloc: sometimes they get a nice, clean chunk of memory to work with, and other times it will have ‘residual’ values that crash their program (usually when it’s being graded!). One solution to this, of course, is to use calloc() to clear the newly-allocated block. Since you are implementing your own memory allocator, you now understand why this happens: free() leaves old values in memory without cleaning them up.
To help find these memory errors, you will provide scribbling functionality: when scribbling is enabled, you will fill any new allocation with 0xAA (10101010 in binary). This means that if a program assumes memory allocated by malloc is zeroed out, it will be in for a rude awakening: for instance, what might’ve been assumed to be 0 in a single byte will now be 170 (10101010 instead of 00000000).
You should scribble new allocations (malloc()), reused blocks, and any new space in a realloc. Provide malloc_scribble() to toggle this feature.
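At its core, scribbling is a conditional memset. The sketch below uses hypothetical names (`set_scribble`, `scribble`) because the exact signature of malloc_scribble() is not given here; it only illustrates the fill step.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Global toggle; off by default (an assumption of this sketch). */
static bool scribble_enabled = false;

/* Hypothetical toggle, standing in for malloc_scribble(). */
void set_scribble(bool on) { scribble_enabled = on; }

/* Fill `len` bytes of a fresh data region with 0xAA when enabled. */
void scribble(void *data, size_t len)
{
    if (scribble_enabled) {
        memset(data, 0xAA, len);
    }
}
```

The allocator would call something like `scribble` on the data portion of every new or reused block, and on the newly added tail of a block grown by realloc. calloc should presumably still return zeroed memory, so its zeroing would have to happen after any scribbling.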
Supported Functions
- malloc
- free
- calloc
- realloc
- malloc_setfsm()
- malloc_print()
- malloc_leaks()
- malloc_scribble()
Grading and Submission
Check your changes into your OS repo as you work. You should test your allocator with a variety of test programs and commands to make sure it works.
This project is worth 13 points.
To receive 70% credit, implement:
- Basic allocation support: malloc, free, calloc
- Finding and reusing freed blocks (first fit)
- malloc_print()
To receive 80% credit, implement:
- Splitting existing blocks instead of simply reusing them
- Best fit, worst fit, and malloc_setfsm() implementation
To receive 90% credit, implement:
- Merging free blocks with their neighbor(s)
- realloc
To receive 95% credit, implement:
- Leak check, malloc_leaks()
- Scribbling support, toggled with malloc_scribble()
To receive full credit for this project:
- Leak analysis (store results in /docs directory):
  - Test your shell for memory leaks using malloc_leaks(), and report the results
    - Fix the memory leaks and provide a diff of the changes in the results (run git diff shellname.c after making your changes)
  - Choose one of the P1 projects that uses malloc, run a leak check on it, and report the results
    - If there were leaks, post a comment on the pull request with your findings.
- Documentation explaining how your project works and how to use the new functions (such as malloc_setfsm(), malloc_scribble(), etc.)
Your grade will also include an additional 2 points from the code review / demo. Things that we will check:
- Running programs that use your allocator, including test programs
- Code walkthrough for quality and stylistic consistency
- You will be asked to explain 1-3 parts of your implementation. You should be able to describe high-level design aspects and the challenges you faced implementing the features.
- Functions, structs, etc. have documentation where appropriate
- No dead, leftover, or unnecessary code.