Lab 2: Improving our C Library
Most operating systems are written in C, which means they usually also include a C Standard Library of some sort. This could be glibc, musl, or something else. In fact, different distributions of Linux (e.g., Ubuntu vs Alpine) ship with different implementations of libc. The C libraries will agree on what base set of functions should be available, but not necessarily on their implementations.
Today we’re going to add a couple of functions to our own C library that aren’t part of the standard library. They’ll be directly inspired by fgets and getline, though. If you’ve never used either of these, it might be a good idea to check out the man pages. But the short version is, they read files line by line.
Reading the OS way
Our C lab used gets to read a string from the user. If you check out the man pages for gets on a Linux machine, you will see:
DESCRIPTION
Never use this function.
...
BUGS
Never use gets(). Because it is impossible to tell without knowing
the data in advance how many characters gets() will read, and because gets()
will continue to store characters past the end of the buffer, it is
extremely dangerous to use. It has been used to break computer security.
Use fgets() instead.
For more information, see CWE-242
(aka "Use of Inherently Dangerous Function")
at http://cwe.mitre.org/data/definitions/242.html
That sounds pretty bad, but our version of gets is somewhat different: it requires us to pass in the size of the buffer we’re operating on, so it’s actually more like fgets. It’s still possible to have things go wrong, but it’s not quite like jumping into a giant pit of hungry tigers.
[!] Try to figure out how gets is implemented. Once you do, you’ll see that it’s using the read system call to do its heavy lifting.
read() takes three arguments:
- fd – the file descriptor to read from. To get a file descriptor, use open(). File descriptors are simply integers that are used to identify the files opened by a program.
- buf – where data being read will be copied.
- count – number of bytes to read. Since read() doesn’t really recognize where lines start or end, you will tell it how many bytes to read at a time instead.
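The three arguments can be seen in action in the minimal sketch below. It assumes a regular POSIX host (Linux or macOS) rather than our OS, and the helper name read_fully is made up for illustration. It keeps calling read() until the requested number of bytes has arrived, since a single call may return fewer bytes than requested.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>   /* read() */

/*
 * Hypothetical helper (not part of any library): keep calling read() until
 * `count` bytes have arrived, the file ends, or an error occurs.
 * Returns the number of bytes stored in buf, or -1 on error.
 */
long read_fully(int fd, char *buf, size_t count)
{
    assert(buf != NULL);

    size_t total = 0;
    while (total < count) {
        /* read() may hand back fewer bytes than we asked for. */
        ssize_t n = read(fd, buf + total, count - total);
        if (n < 0) {
            return -1;      /* error */
        }
        if (n == 0) {
            break;          /* end of file */
        }
        total += (size_t) n;
    }
    return (long) total;
}
```

Note that nothing here knows about lines: a newline is just another byte, which is why line-oriented functions have to be built on top of read().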
read isn’t just a regular function. It’s a system call. We’ll be digging into system calls more in the coming days, but for now, if you want more information about how it works, you can look through the code to find sys_read.
Lab Instructions
In this lab, you’ll add fgets and getline to our C library. Here’s what you need to do:
- Study the implementation of gets. It’ll make it easy to implement fgets and getline.
- Add the new functions to user.h. This will allow other programs to use them.
- Implement fgets in ulib.c.
  - Instead of being hard-coded to always read from standard input (file descriptor 0), allow the user to read from arbitrary files. You’ll notice that the standard implementation of fgets takes a FILE * as a parameter, but we’ll just use a file descriptor instead.
  - Instead of returning the string that was read, return the number of characters read. This will be helpful when implementing getline.
- Refactor gets to use the new fgets function instead of its own implementation.
- Implement getline in ulib.c. This function will leverage fgets and will auto-allocate and resize a buffer as reading takes place. You can think of it as a dynamic fgets that handles memory allocation automatically.
  - It might be good to play around with the real getline to get comfortable with how it should work.
  - If you use malloc in ulib.c, then you will probably have to update your Makefile because forktest won’t build properly. Look at how other dependencies are specified and add $U/umalloc.o.
- Write a test program that exercises both of these functions. We’ll use them later, so make sure they work well!
  - A good approach here is to have the two functions read blank lines, small lines, and long lines, and check to make sure they handle hitting the end of the file correctly.
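For playing around with the real getline, something like the following sketch may help. It is meant for a normal Linux or macOS host with the POSIX getline, not for our OS, and the helper name count_lines is made up for illustration. It prints each line's length next to the buffer capacity so you can watch the buffer grow.

```c
#define _POSIX_C_SOURCE 200809L   /* expose getline() in <stdio.h> */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>            /* ssize_t */

/*
 * Hypothetical helper: print every line of `stream` with its length and
 * the current buffer capacity, then return how many lines were read.
 */
int count_lines(FILE *stream)
{
    assert(stream != NULL);

    char *line = NULL;   /* NULL + 0 tells getline to allocate for us */
    size_t cap = 0;
    ssize_t len;
    int lines = 0;

    while ((len = getline(&line, &cap, stream)) != -1) {
        /* len counts the trailing '\n' when the line has one. */
        printf("%zd bytes (capacity %zu): %s", len, cap, line);
        lines++;
    }

    free(line);          /* the caller owns the buffer getline grew */
    return lines;
}
```

Calling count_lines(stdin) from a small main and piping in a file with blank, short, and very long lines shows the same cases the test program should cover.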
getline hints
Now would be a great time to use realloc to resize the buffer we’re operating on as we read more data. Unfortunately, we don’t have an implementation of realloc in our OS… yet. To get around this you can:
- malloc a new buffer with more space
- memcpy the old contents of the buffer to the new buffer
- free the old buffer
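The three steps above can be sketched as a small helper. This version is written against the standard C library on a regular host, and the name grow_buffer and its parameters are assumptions made for illustration; headers and exact types will differ inside our OS.

```c
#include <assert.h>
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* memcpy */

/*
 * Hypothetical helper: return a buffer of new_size bytes holding the first
 * old_size bytes of `old`, then free `old`. Returns NULL (leaving `old`
 * untouched) if the allocation fails.
 */
char *grow_buffer(char *old, size_t old_size, size_t new_size)
{
    assert(new_size >= old_size);

    char *bigger = malloc(new_size);      /* 1. new buffer with more space */
    if (bigger == NULL) {
        return NULL;
    }
    if (old != NULL) {
        memcpy(bigger, old, old_size);    /* 2. copy the old contents */
        free(old);                        /* 3. release the old buffer */
    }
    return bigger;
}
```

Leaving the old buffer alone on failure is a design choice: the caller still holds valid data and can decide whether to give up or report what was read so far.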
Then the pseudocode for getline looks like:
initialize buffer if not already initialized (size = 0)
while we have more to read:
    read into the buffer using fgets
    track the total amount of bytes read
    if we hit EOF or error:
        stop, return number of bytes read or -1 for error
    else:
        resize the buffer to prepare for another read
        (double the buffer with each resize)
One more hint: when you’re using fgets to read data into the resizable buffer, make sure you’re appending the newly read data after what’s already in the buffer instead of overwriting the beginning of the buffer.
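To illustrate the appending hint, here is a rough sketch that uses the standard stdio fgets on a regular host (so it takes a FILE * and returns a pointer, unlike the version you are writing). The helper name read_line_in_chunks and the chunk parameter are made up for illustration. The key detail is that each read lands at buf + total rather than at buf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read one line from `stream` into buf (capacity cap)
 * in small pieces, appending each piece after the previous one.
 * Returns the total number of characters stored (excluding the '\0').
 */
size_t read_line_in_chunks(FILE *stream, char *buf, size_t cap, size_t chunk)
{
    assert(buf != NULL && cap > 0 && chunk > 1);

    size_t total = 0;
    buf[0] = '\0';

    while (total + 1 < cap) {
        size_t room = cap - total;
        if (room > chunk) {
            room = chunk;
        }
        /* Write at buf + total, not buf, so earlier data survives. */
        if (fgets(buf + total, (int) room, stream) == NULL) {
            break;                       /* EOF or error */
        }
        total += strlen(buf + total);
        if (buf[total - 1] == '\n') {
            break;                       /* reached the end of the line */
        }
    }
    return total;
}
```

In a growing-buffer version, the fixed cap would be replaced by a resize whenever the buffer fills up before a newline is seen.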
If you want to add a large text file to your OS file system, add it to the fs.img recipe in the Makefile. If you need a hint for this, check out how README.md is handled. Once you have a file in your OS file system, you can use open to open it and get a file descriptor, like this:
// open a file in readonly mode
int fd = open("some_file.txt", O_RDONLY);
If it works, fd will be a nonnegative integer that you can pass into fgets and getline.
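Since open can fail, it is worth checking the result before reading. The sketch below assumes a regular POSIX host, where the headers and the fprintf call differ from what our OS provides, and the helper name open_for_reading is made up for illustration.

```c
#include <assert.h>
#include <fcntl.h>    /* open, O_RDONLY */
#include <stdio.h>
#include <unistd.h>   /* close */

/*
 * Hypothetical helper: open `path` read-only and report failures.
 * Returns a nonnegative file descriptor, or -1 if the file can't be opened.
 */
int open_for_reading(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        fprintf(stderr, "cannot open %s\n", path);
        return -1;
    }
    return fd;   /* the caller should close(fd) when finished */
}
```

A test program could call this once per input file and skip any file that fails to open instead of passing a negative descriptor along.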
Grading and Submission
To receive 50% credit:
- Implement fgets and add Javadoc-style documentation
- Refactor gets to use your new fgets function
To receive full credit for this lab:
- Complete all previous requirements
- Implement getline and add Javadoc-style documentation
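If Javadoc-style comments in C are unfamiliar, the sketch below shows one common shape: a block comment opening with /**, a short summary, then @param and @return tags. The function line_length is a made-up example, not one of the lab functions.

```c
#include <assert.h>
#include <stddef.h>

/**
 * Counts the characters in a line, up to and including its newline.
 *
 * @param line NUL-terminated string to examine; must not be NULL
 * @return number of characters before the end of the line, counting the
 *         '\n' itself if one is present
 */
int line_length(const char *line)
{
    assert(line != NULL);

    int n = 0;
    while (line[n] != '\0') {
        if (line[n++] == '\n') {
            break;
        }
    }
    return n;
}
```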
Once you are finished, check your changes into your OS repo. Then have a member of the course staff take a look at your lab to check it off.