Project 1: Elastic Array & Disk Usage Analyzer (v1.0)
Starter repository on GitHub: https://classroom.github.com/a/8GPH5Zwq
As storage densities continue to increase, so too will humanity’s ability to find new ways to generate more and more data. Storage space often seems unlimited… until it’s not! In this project, we will design a helpful command line utility for users, developers, and system administrators to analyze how their disk space is being used. Here’s a demonstration of the tool, da:
$ ./da -t 15 -s /usr
/usr/lib/valgrind/libvex-amd64-linux.a 9.1 MiB 21 Aug 2020
/usr/lib/libclang.so 38.6 MiB 21 Aug 2020
/usr/lib/libclang.so.10 38.6 MiB 21 Aug 2020
/usr/bin/containerd 46.9 MiB 01 Feb 2021
/usr/lib/libclang-cpp.so 47.2 MiB 15 Feb 2021
/usr/lib/libclang-cpp.so.10 47.2 MiB 15 Feb 2021
/usr/lib/libgo.so 52.4 MiB 09 Sep 2020
/usr/lib/libgo.so.16 52.4 MiB 09 Sep 2020
/usr/lib/libgo.so.16.0.0 52.4 MiB 09 Sep 2020
/usr/lib/docker/cli-plugins/docker-buildx 54.2 MiB 03 Dec 2020
/usr/bin/docker 71.0 MiB 03 Dec 2020
/usr/lib/libLLVM-10.0.1.so 83.7 MiB 15 Feb 2021
/usr/lib/libLLVM-10.so 83.7 MiB 15 Feb 2021
/usr/lib/libLLVM.so 83.7 MiB 15 Feb 2021
/usr/bin/dockerd 84.4 MiB 01 Feb 2021
In this example, the user requested the top 15 files (-t 15), sorted by size (-s), from the /usr directory. If they’re really trying to save space on this machine, then maybe it’s time to remove docker? :-)
The output columns include the file name, file size in human readable units, and the last time the particular file was accessed.
To get a sense of the functionality we will implement, take a look at the help/usage information:
$ ./da -h
Disk Analyzer (da): analyzes disk space usage
Usage: ./da [-ahs] [-t limit] [directory]
If no directory is specified, the current working directory is used.
Options:
* -a Sort the files by time of last access
* -h Display help/usage information
* -s Sort the files by size (default)
* -t limit Limit the output to top N files (default=unlimited)
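One way to handle the options listed above is getopt(3). The following is only a minimal sketch: the struct da_options_sketch type and the parse_args_sketch function are hypothetical names made up for illustration, and treating a limit of 0 as "unlimited" is an assumption rather than something the specification requires.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical container for the parsed command line. */
struct da_options_sketch {
    int sort_by_time;       /* 1 if -a was given; 0 means sort by size */
    int show_help;          /* 1 if -h was given */
    unsigned long limit;    /* value of -t; 0 is assumed to mean unlimited */
    const char *directory;  /* defaults to the current working directory */
};

/* Parses [-ahs] [-t limit] [directory]. Returns 0 on success, -1 on error. */
int parse_args_sketch(int argc, char *argv[], struct da_options_sketch *opts)
{
    assert(opts != NULL);
    opts->sort_by_time = 0;
    opts->show_help = 0;
    opts->limit = 0;
    opts->directory = ".";

    optind = 1; /* restart scanning so the function can be called again */
    int c;
    while ((c = getopt(argc, argv, "ahst:")) != -1) {
        switch (c) {
        case 'a':
            opts->sort_by_time = 1;
            break;
        case 's':
            opts->sort_by_time = 0;
            break;
        case 'h':
            opts->show_help = 1;
            break;
        case 't': {
            char *end;
            opts->limit = strtoul(optarg, &end, 10);
            if (end == optarg || *end != '\0') {
                return -1; /* the limit was not a plain number */
            }
            break;
        }
        default:
            return -1; /* unknown option or missing argument */
        }
    }

    /* The first non-option argument, if any, is the directory. */
    if (optind < argc) {
        opts->directory = argv[optind];
    }
    return 0;
}
```

A caller would typically print the usage text and exit when this returns -1 or when show_help is set.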
Your implementation will be split into two parts: (1) building an elastic data structure that can store an unbounded number of elements (memory permitting), and (2) directory traversal and disk usage analysis.
You can think of the elastic array as being somewhat analogous to the ArrayList in Java; it will automatically resize, allow a variety of retrieval operations, and provide utility functionality such as retrieving the number of elements, trimming the amount of heap space used to save memory, and sorting the elements. When you are finished, you’ll have produced a reusable library that may be helpful in future C projects.
The Elastic Array
While C has primitive array types, they must be dimensioned in advance and do not support convenience features like appending to the list or retrieving its size. Our goal for the elist library is to fill this gap in functionality. Your elist should support the following functions:
- elist_add – appends an element to the array
- elist_add_new – creates storage space for a new element and returns a pointer to it
- elist_capacity – retrieves the current list capacity
- elist_clear – removes all elements from the array
- elist_clear_mem – removes all elements from the array and zeroes them out
- elist_create – initializes a new elist data structure
- elist_destroy – destroys and frees any memory allocated by an elist
- elist_get – retrieves a particular element by its index
- elist_index_of – determines the index of a particular element
- elist_remove – removes an element at a particular index
- elist_set – replaces an element in the array at a particular index
- elist_set_capacity – increases or decreases the storage capacity of the array
- elist_size – retrieves the number of elements in the array
- elist_sort – sorts the array
Array elements will have a fixed size; i.e., the expected size of the elements will be provided to elist_create. This could be something like sizeof(int) or even sizeof(struct my_special_struct), but regardless, all elements will consume the same number of bytes on the heap.
Elements added to the list via elist_add or elist_set will be copied onto the list on the heap; your array should not simply store pointers to the elements. This provides the most flexibility, since the user could maintain an array of pointers if that is the behavior they desire. The elist_add_new function returns a pointer to a new, uninitialized memory block in the list that the user can populate directly, which simplifies usage and avoids an extra copy when one is unnecessary:
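struct my_struct *s = malloc(sizeof(struct my_struct));
s->memb1 = 123;
s->memb2 = 456;
elist_add(list, s); // 's' is copied into the list
free(s); // the list holds its own copy, so the original can be released
// vs.
struct my_struct *t = elist_add_new(list); // 't' points directly into the list
t->memb1 = 123;
t->memb2 = 456;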
The array will start with an initial capacity; once it is full, you will double the capacity (RESIZE_MULTIPLIER = 2) and realloc the array’s storage. Removing an element shifts every subsequent element down by one position; empty gaps are not allowed. The array will not shrink unless requested via elist_set_capacity, and any elements that lie beyond the requested new capacity will be freed.
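The growth and removal rules can be sketched as follows. This is not the starter code's API: struct sketch_list, sketch_add, and sketch_remove are hypothetical names, the field layout is an assumption, and starting from an empty buffer with a capacity of 0 is chosen only to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RESIZE_MULTIPLIER 2

/* Hypothetical layout: one contiguous heap buffer of fixed-size items. */
struct sketch_list {
    size_t capacity;     /* number of items the buffer can hold */
    size_t size;         /* number of items currently stored */
    size_t item_sz;      /* size of each item in bytes (must be > 0) */
    unsigned char *data; /* heap storage, or NULL while capacity is 0 */
};

/* Copies one item onto the end of the list, doubling the capacity when the
 * buffer is full. Returns the new item's index, or -1 if realloc fails. */
long sketch_add(struct sketch_list *l, const void *item)
{
    assert(l != NULL && item != NULL);
    if (l->size == l->capacity) {
        size_t new_cap =
            l->capacity == 0 ? 1 : l->capacity * RESIZE_MULTIPLIER;
        void *p = realloc(l->data, new_cap * l->item_sz);
        if (p == NULL) {
            return -1; /* the old buffer is still valid and untouched */
        }
        l->data = p;
        l->capacity = new_cap;
    }
    memcpy(l->data + l->size * l->item_sz, item, l->item_sz);
    return (long) l->size++;
}

/* Removes the item at idx by sliding everything after it down one slot.
 * Returns 0 on success, -1 if idx is out of range. */
int sketch_remove(struct sketch_list *l, size_t idx)
{
    assert(l != NULL);
    if (idx >= l->size) {
        return -1;
    }
    unsigned char *dst = l->data + idx * l->item_sz;
    /* memmove, not memcpy: the source and destination ranges overlap. */
    memmove(dst, dst + l->item_sz, (l->size - idx - 1) * l->item_sz);
    l->size--;
    return 0;
}
```

Assigning the result of realloc to a temporary first keeps the original buffer reachable if the allocation fails.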
To allow sorting functionality, you can use qsort(3). The user will provide a comparator that your sort function passes to qsort.
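As an illustration of the comparator contract that qsort(3) expects, here is a minimal sketch. The struct entry_sketch record and the function names are hypothetical, and sorting in ascending order is an assumption; the direction your tool needs depends on how you choose to print the results.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical record for one file; the real struct is up to you. */
struct entry_sketch {
    off_t size;   /* file size in bytes */
    time_t atime; /* time of last access */
};

/* Orders entries by size, smallest first. */
int compare_size_sketch(const void *a, const void *b)
{
    const struct entry_sketch *ea = a;
    const struct entry_sketch *eb = b;
    /* Compare instead of subtracting: a difference of two large sizes
     * may not fit in the int that qsort expects back. */
    return (ea->size > eb->size) - (ea->size < eb->size);
}

/* Orders entries by last access time, oldest first. */
int compare_atime_sketch(const void *a, const void *b)
{
    const struct entry_sketch *ea = a;
    const struct entry_sketch *eb = b;
    return (ea->atime > eb->atime) - (ea->atime < eb->atime);
}

/* A sort function can simply forward the caller's comparator to qsort. */
void sort_sketch(void *items, size_t count, size_t item_sz,
                 int (*cmp)(const void *, const void *))
{
    assert(cmp != NULL);
    qsort(items, count, item_sz, cmp);
}
```

Because the comparator only ever sees pointers to elements, the same sort function works for any element type of the stated size.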
The Disk Usage Analyzer
The disk analyzer will traverse the file system recursively, locating all the files under a given directory. During traversal, each file’s full path, size, and last access time will be recorded in our elastic array for further inspection, sorting, and final formatting.
You will most likely want to use opendir and readdir to provide this listing, and stat to retrieve access times and file sizes.
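A recursive traversal along these lines might look like the sketch below. The visit_fn callback, struct tally_sketch, and the function names are hypothetical and exist only for illustration. The sketch also calls lstat rather than stat so that symbolic links are not followed into directories; that is a design choice made for this example, not a requirement of the assignment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical callback invoked once for every regular file found. */
typedef void (*visit_fn)(const char *path, const struct stat *sb, void *ctx);

/* Walks dir_path recursively. Returns 0, or -1 if it cannot be opened. */
int traverse_sketch(const char *dir_path, visit_fn visit, void *ctx)
{
    assert(dir_path != NULL && visit != NULL);
    DIR *dir = opendir(dir_path);
    if (dir == NULL) {
        perror(dir_path);
        return -1;
    }

    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* Skip the self and parent entries to avoid infinite recursion. */
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0) {
            continue;
        }

        char path[PATH_MAX];
        int n = snprintf(path, sizeof(path), "%s/%s", dir_path, ent->d_name);
        if (n < 0 || (size_t) n >= sizeof(path)) {
            continue; /* path too long for the buffer; skip it */
        }

        struct stat sb;
        if (lstat(path, &sb) == -1) {
            perror(path);
            continue;
        }

        if (S_ISDIR(sb.st_mode)) {
            traverse_sketch(path, visit, ctx); /* descend into subdirectory */
        } else if (S_ISREG(sb.st_mode)) {
            visit(path, &sb, ctx); /* st_size and st_atime are in sb */
        }
    }

    closedir(dir);
    return 0;
}

/* Example visitor: counts files and adds up their sizes. */
struct tally_sketch {
    size_t files;
    long long bytes;
};

void tally_visit(const char *path, const struct stat *sb, void *ctx)
{
    (void) path; /* unused here; a real visitor would record it */
    struct tally_sketch *t = ctx;
    t->files++;
    t->bytes += (long long) sb->st_size;
}
```

In the analyzer, the visitor would instead copy the path, sb->st_size, and sb->st_atime into the elastic array.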
As part of your client code, you will need to write functions to perform unit conversions (bytes to human-readable units, like MiB, GiB, and so on) and format the date strings as shown in the demo above.
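The two formatting helpers could be sketched like this. The function names are hypothetical, and the exact output format (one decimal place, binary units, a "%d %b %Y" date) is inferred from the demo output rather than stated by the specification, so check it against the provided test cases.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Writes a size such as "38.6 MiB" into buf, dividing by 1024 until the
 * value drops below 1024 or the largest unit is reached. */
void human_size_sketch(unsigned long long bytes, char *buf, size_t buf_sz)
{
    static const char *units[] = {
        "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"
    };
    const size_t last = sizeof(units) / sizeof(units[0]) - 1;

    assert(buf != NULL && buf_sz > 0);
    double value = (double) bytes;
    size_t u = 0;
    while (value >= 1024.0 && u < last) {
        value /= 1024.0;
        u++;
    }
    snprintf(buf, buf_sz, "%.1f %s", value, units[u]);
}

/* Writes a date such as "21 Aug 2020" into buf, using local time. */
void format_date_sketch(time_t t, char *buf, size_t buf_sz)
{
    assert(buf != NULL && buf_sz > 0);
    struct tm tm;
    if (localtime_r(&t, &tm) == NULL ||
        strftime(buf, buf_sz, "%d %b %Y", &tm) == 0) {
        buf[0] = '\0'; /* conversion failed or the buffer was too small */
    }
}
```

The month abbreviation produced by %b depends on the active locale; a program that never calls setlocale runs in the "C" locale and gets English names.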
Implementation Restrictions
You may use any standard C library functionality. External libraries are not allowed unless permission is granted in advance. If in doubt, ask first. Your code must compile and run on your VM set up with Arch Linux as described in class; code that does not will receive a grade of 0.
Testing Your Code
Check your code against the provided test cases. We’ll have interactive grading for projects, where you will demonstrate program functionality and walk through your logic.
Submission: submit via GitHub by checking in your code before the project deadline.
Grading
- TBA – test cases coming soon!
- 3 pts - Code review:
- Code quality and stylistic consistency
- Functions, structs, etc. must have documentation in Doxygen format (similar to Javadoc). Describe inputs, outputs, and the purpose of each function. NOTE: this is included in the test cases, but we will also look through your documentation.
- No dead, leftover, or unnecessary code.
- You must include a README.md file that describes your program, how it works, how to build it, and any other relevant details. You’ll be happy you did this later if/when you revisit the codebase. Here is an example README.md file.
Extra Credit
- 1 pt – add support for directories, showing the total space consumed by each directory (you’ll need to add up the sizes of all the files and subdirectories under each directory to do this). Check your changes into a separate branch and demonstrate during code review.
Changelog
- Initial project specification posted (2/22)