Lab 2: Improving our C Library

Most operating systems are written in C, which means they usually also include a C Standard Library of some sort. This could be glibc, musl, or something else. In fact, different distributions of Linux (e.g., Ubuntu vs Alpine) ship with different implementations of libc. The C libraries will agree on what base set of functions should be available, but not necessarily on their implementations.

Today we’re going to add a couple of functions to our own C library that aren’t part of the standard library. They’ll be directly inspired by fgets and getline, though. If you’ve never used either of these, it might be a good idea to check out the man pages. But the short version is, they read files line by line.

Reading the OS way

Our C lab used gets to read a string from the user. If you check the man page for gets on a Linux machine, you will see:

DESCRIPTION
       Never use this function.

...

BUGS
    Never  use  gets().   Because  it is impossible to tell without knowing
    the data in advance how many characters gets() will read, and because gets()
    will continue to store characters past  the  end  of the  buffer,  it  is
    extremely dangerous to use.  It has been used to break computer security.
    Use fgets() instead.

    For more information, see CWE-242
    (aka "Use of Inherently Dangerous Function")
    at http://cwe.mitre.org/data/definitions/242.html

That sounds pretty bad, but our version of gets is somewhat different: it requires us to pass in the size of the buffer we’re operating on, so it’s actually more like fgets. It’s still possible to have things go wrong, but it’s not quite like jumping into a giant pit of hungry tigers.

[!] Try to figure out how gets is implemented. Once you do, you’ll see that it’s using the read system call to do its heavy lifting.

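read() takes three arguments: the file descriptor to read from, a pointer to the buffer that receives the data, and the maximum number of bytes to read.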

read isn’t just a regular function; it’s a system call. We’ll dig into system calls more in the coming days, but if you’re curious now, look through the code for sys_read to see how it works.
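To get a feel for how read behaves before digging into gets, here is a minimal sketch meant for a regular Linux machine rather than our OS. It assumes the POSIX read declared in unistd.h, and count_bytes is a hypothetical helper name, not something from our code base. It keeps calling read until the call reports end of file.

```c
#include <unistd.h>

/* Hypothetical helper: read everything available on fd and return the
 * total number of bytes seen, or -1 if read reports an error. */
long count_bytes(int fd)
{
    char chunk[64];
    long total = 0;

    for (;;) {
        /* read(fd, buffer, max) returns how many bytes it stored,
         * 0 at end of file, or -1 on error. */
        ssize_t n = read(fd, chunk, sizeof chunk);
        if (n < 0)
            return -1;
        if (n == 0)
            break;
        total += n;
    }
    return total;
}
```

Note that read may store fewer bytes than requested, which is why the loop adds up the return values instead of assuming each call fills the chunk.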

Lab Instructions

In this lab, you’ll add fgets and getline to our C library. Here’s what you need to do:

  1. Study the implementation of gets. Understanding it will make fgets and getline much easier to implement.
  2. Add the new functions to user.h. This will allow other programs to use them.
  3. Implement fgets in ulib.c.
    • Instead of being hard-coded to always read from standard input (file descriptor 0), allow the user to read from arbitrary files. You’ll notice that the standard implementation of fgets takes a FILE * as a parameter, but we’ll just use a file descriptor instead.
    • Instead of returning the string that was read, return the number of characters read. This will be helpful when implementing getline.
  4. Refactor gets to use the new fgets function instead of its own implementation.
  5. Implement getline in ulib.c. This function will leverage fgets and will auto-allocate and resize a buffer as reading takes place. You can think of it as a dynamic fgets that handles memory allocation automatically.
    • It might be good to play around with the real getline to get comfortable with how it should work.
  6. Write a test program that exercises both of these functions. We’ll use them later, so make sure they work well!
    • A good approach here is to have the two functions read blank lines, small lines, and long lines, and check to make sure they handle hitting the end of the file correctly.
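If you want to experiment with the real getline on a Linux machine first, the following is a minimal sketch under a few assumptions: it uses the POSIX getline from stdio.h (which takes a FILE *, unlike the file descriptor version we are building), and count_lines is a hypothetical helper name chosen for illustration. It reads a stream to the end and reports how many lines it saw and how long the longest one was.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: count the lines in stream with the real getline and
 * report the length of the longest one (including its newline, if any). */
int count_lines(FILE *stream, size_t *longest)
{
    char *line = NULL;   /* NULL with capacity 0 asks getline to allocate */
    size_t cap = 0;
    ssize_t len;
    int lines = 0;

    *longest = 0;
    /* getline returns the number of characters read, or -1 at EOF/error. */
    while ((len = getline(&line, &cap, stream)) != -1) {
        lines++;
        if ((size_t)len > *longest)
            *longest = (size_t)len;
    }
    free(line);          /* the same buffer is reused and grown across calls */
    return lines;
}
```

Feeding it input with a blank line, a short line, a long line, and a final line with no trailing newline covers the same cases your own test program should handle.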

getline hints

Now would be a great time to use realloc to resize the buffer we’re operating on as we read more data. Unfortunately, we don’t have an implementation of realloc in our OS… yet. To get around this you can:

  1. malloc a new buffer with more space
  2. memcpy the old contents of the buffer to the new buffer
  3. free the old buffer
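The three steps above can be sketched as a small helper. This is only one way to do it: grow_buffer is a hypothetical name, and the sketch assumes malloc, memcpy, and free in our library behave like their standard counterparts (it is written against the host headers so you can try it on a regular machine).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper standing in for realloc: returns a buffer of new_size
 * bytes holding the first old_size bytes of old, or NULL if malloc fails
 * (in which case old is left untouched). Assumes new_size >= old_size. */
char *grow_buffer(char *old, size_t old_size, size_t new_size)
{
    char *bigger = malloc(new_size);      /* 1. malloc a new buffer */
    if (bigger == NULL)
        return NULL;
    if (old != NULL) {
        memcpy(bigger, old, old_size);    /* 2. copy the old contents */
        free(old);                        /* 3. free the old buffer */
    }
    return bigger;
}
```

Because the old pointer is freed on success, the caller has to switch to the returned pointer right away and must not touch the old one again.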

Then the pseudocode for getline looks like:

initialize buffer if not already initialized (size = 0)
while we have more to read:
    read into the buffer using fgets
    track the total number of bytes read
    if we hit EOF or error:
        stop, return number of bytes read or -1 for error
    else:
        resize the buffer to prepare for another read
        (double the buffer with each resize)

One more hint: when you’re using fgets to read data into the resizable buffer, make sure you’re not overwriting the beginning of the buffer and instead concatenating the next data you’ve read.
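To make the offset idea concrete, here is a minimal sketch for a regular Linux machine. Since our fgets does not exist yet, it uses the POSIX read as a stand-in, and read_append is a hypothetical helper name. The point to notice is the buf + used argument, which places new data after what is already stored.

```c
#include <unistd.h>

/* Hypothetical helper: read more data from fd into buf, starting right
 * after the `used` bytes already stored, and keep the result a valid
 * C string. Returns the new number of bytes used, or -1 on error. */
int read_append(int fd, char *buf, int used, int cap)
{
    int room = cap - used - 1;            /* leave space for the '\0' */
    if (room <= 0)
        return used;                      /* full: caller should resize first */

    /* buf + used, not buf: writing at buf would clobber earlier data. */
    int n = (int)read(fd, buf + used, (size_t)room);
    if (n < 0)
        return -1;
    buf[used + n] = '\0';
    return used + n;
}
```

The same pattern applies when calling your own fgets from getline: pass the address just past the data you already have, and pass only the remaining space as the size.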

If you want to add a large text file to your OS file system, add it to the fs.img recipe in the makefile. If you need a hint for this, check out how README.md is handled. Once you have a file in your OS file system, you can use open to open it and get a file descriptor, like this:

// open a file in readonly mode
int fd = open("some_file.txt", O_RDONLY);

If it works, fd will be a nonnegative integer that you can pass to fgets and getline.
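Since open can fail (for example, if the file never made it into fs.img), it is worth checking the result before using it. Here is a minimal sketch written against the headers of a regular Linux machine; the header names and printing functions in our OS may differ, and open_for_reading is a hypothetical helper name.

```c
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper: open path read-only, returning the file descriptor
 * or -1 after printing a message if the file could not be opened. */
int open_for_reading(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        fprintf(stderr, "cannot open %s\n", path);
        return -1;
    }
    return fd;
}
```

Remember to close the file descriptor once you are done reading from it.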

Grading and Submission

To receive 50% credit:

To receive full credit for this lab:

Once you are finished, check your changes into your OS repo. Then have a member of the course staff take a look at your lab to check it off.