Project 4: Web Server (v1.0)

Starter repository on GitHub: https://classroom.github.com/a/KdbymZzs

HTTP Messages

A message in HTTP consists of three main parts: the start line, header fields, and an optional body:

Start line – specifies the message type
Headers – metadata about the message
Body – message content

Each header field ends with a newline. However, instead of the usual \n escape sequence for a newline, standards-compliant HTTP servers and clients use \r\n. Servers should also ignore any empty line(s) at the beginning of a message.

The header fields and body are separated by a single blank line (i.e., a line that only contains \r\n).
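
As a rough illustration of the framing rules above, the following sketch locates the end of the header section in a received buffer. The function name find_body_offset is hypothetical (it is not part of the starter code), and the sketch assumes the buffer is NUL-terminated.

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first body byte in buf (just past the blank
 * line), or -1 if the blank line has not been received yet.
 * Assumption: buf is NUL-terminated. */
long find_body_offset(const char *buf)
{
    /* Skip any empty lines that precede the start line. */
    const char *start = buf;
    while (strncmp(start, "\r\n", 2) == 0) {
        start += 2;
    }

    /* The last header line ends in \r\n and the blank line is another
     * \r\n, so the separator appears as \r\n\r\n. */
    const char *sep = strstr(start, "\r\n\r\n");
    if (sep == NULL) {
        return -1;
    }
    return (long)(sep - buf) + 4;
}
```

A return value of -1 means more data must be read from the socket before the request can be parsed.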

There are two types of HTTP messages: requests and responses.

HTTP Request Handling

Client applications send HTTP request messages to the server to retrieve content, send data, etc. In our case, we’re only concerned with GET requests, which retrieve information identified by a Request-URI. Here’s an example GET request sent from the Safari web browser, asking to retrieve /index.html.

GET /index.html HTTP/1.1
Host: localhost:8080
Upgrade-Insecure-Requests: 1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1 Safari/605.1.15
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: keep-alive

Notice that there is no body content here – nothing after the blank line. This is because the header fields contain all the information necessary to request a file from the server.

GET requests may specify directories as well as files. If the URI names a directory, then the web server should append index.html to the requested path; e.g., a request for / should resolve to /index.html.
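
One possible way to turn the start line into a file path is sketched below. The helper name resolve_request_path and the buffer sizes are assumptions made for illustration. The sketch only treats URIs ending in / as directories, and it rejects any URI containing "..", which is an extra precaution not required by this spec.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extracts the Request-URI from a GET start line and
 * turns it into a path relative to the current directory.
 * Returns 0 on success, -1 if the line is not a usable GET request. */
int resolve_request_path(const char *start_line, char *path, size_t path_sz)
{
    char method[8];
    char uri[1024];

    /* The start line looks like: GET /index.html HTTP/1.1 */
    if (sscanf(start_line, "%7s %1023s", method, uri) != 2) {
        return -1;
    }
    if (strcmp(method, "GET") != 0 || uri[0] != '/') {
        return -1;
    }
    /* Refuse URIs that try to climb out of the served directory. */
    if (strstr(uri, "..") != NULL) {
        return -1;
    }

    /* A URI ending in '/' names a directory, so append index.html. */
    const char *suffix = (uri[strlen(uri) - 1] == '/') ? "index.html" : "";

    /* Prefixing "." makes the path relative to the served directory. */
    int n = snprintf(path, path_sz, ".%s%s", uri, suffix);
    return (n < 0 || (size_t)n >= path_sz) ? -1 : 0;
}
```

A directory requested without a trailing slash (e.g., /docs) is not caught by this check; calling stat() on the path and testing S_ISDIR would be one way to handle that case.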

HTTP Responses

After the web server receives a request, it will send a response. Responses are formatted similarly to requests; here’s an example of a successful response (status code 200) whose body is “Hello world!”:

HTTP/1.1 200 OK
Date: Tue, 05 May 2020 23:53:37 GMT
Content-Length: 12

Hello world!

In general, the server opens the file that was requested, reads it, and writes its contents into the body of the message. If the file doesn’t exist, the server will return a 404 status code and send an error message as the body content. For most responses, the Date header field is required and must be formatted as shown above.
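
The response headers, including a Date field in the format shown above, could be assembled along these lines. This is a minimal sketch: build_headers and its parameters are hypothetical names, and it relies on the default "C" locale so that strftime produces English day and month abbreviations.

```c
#define _POSIX_C_SOURCE 200809L /* for gmtime_r */
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: writes a status line plus Date and Content-Length
 * headers, followed by the blank line, into buf. Returns the number of
 * bytes written, or -1 if buf is too small. */
int build_headers(char *buf, size_t buf_sz, int status, const char *reason,
                  long content_length, time_t now)
{
    char date[64];
    struct tm tm_utc;

    /* HTTP dates are expressed in GMT, not local time. */
    gmtime_r(&now, &tm_utc);
    strftime(date, sizeof(date), "%a, %d %b %Y %H:%M:%S GMT", &tm_utc);

    int n = snprintf(buf, buf_sz,
                     "HTTP/1.1 %d %s\r\n"
                     "Date: %s\r\n"
                     "Content-Length: %ld\r\n"
                     "\r\n",
                     status, reason, date, content_length);
    return (n < 0 || (size_t)n >= buf_sz) ? -1 : n;
}
```

For a missing file, the same helper could be called with status 404, reason "Not Found", and the length of the error message that will be sent as the body; passing time(NULL) as the last argument stamps the response with the current time.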

Implementation

This mini-project is an extension of Lab 8. Use the chat server code from the lab to build your web server. You’ll need to make some modifications to do this:

Make the server start a new thread with pthread_create for each incoming client connection. Most modern web browsers send requests in parallel over multiple connections, so you’ll need to support this for basically anything more complex than “Hello world!”.
- Detect the number of CPU cores on the host machine and cap the number of threads at 2x the number of cores. For example, if your VM has two cores, no more than four threads should be active at a time.
- Although generally lighter weight than processes, creating and destroying threads is still costly. To improve performance, create a basic thread pool and work queue to handle requests.
Create a function to read HTTP requests and produce HTTP responses. You should support general file transfers as well as reporting missing files (404).
Serve files from the directory specified as the 2nd argument. Hint: chdir() into this directory.
Use the sendfile() system call to transfer the body of the message. Use stat to determine the file size before the call to sendfile.
- sendfile is faster than performing a read() followed by a write(): the read/write pair copies the data through user space, whereas sendfile operates completely in kernel space. Avoiding those extra copies results in a big performance boost.
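
The thread pool and work queue described above might be structured as in the sketch below. All names here (queue_push, queue_pop, max_threads, pool_start, QUEUE_CAP) are hypothetical, the queue capacity of 64 is arbitrary, and the sketch assumes the 2x cap applies to worker threads only. It also assumes sysconf(_SC_NPROCESSORS_ONLN) is available, which is the case on Linux but is not guaranteed by POSIX.

```c
#include <pthread.h>
#include <unistd.h>

#define QUEUE_CAP 64 /* arbitrary capacity chosen for this sketch */

/* Bounded circular queue of client sockets shared by all workers. */
static int queue[QUEUE_CAP];
static int q_head, q_len;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t q_not_full = PTHREAD_COND_INITIALIZER;
static void (*handle_client)(int);

/* Called by the accept() loop; blocks while the queue is full. */
void queue_push(int client_fd)
{
    pthread_mutex_lock(&q_lock);
    while (q_len == QUEUE_CAP) {
        pthread_cond_wait(&q_not_full, &q_lock);
    }
    queue[(q_head + q_len) % QUEUE_CAP] = client_fd;
    q_len++;
    pthread_cond_signal(&q_not_empty);
    pthread_mutex_unlock(&q_lock);
}

/* Called by workers; blocks while the queue is empty. */
int queue_pop(void)
{
    pthread_mutex_lock(&q_lock);
    while (q_len == 0) {
        pthread_cond_wait(&q_not_empty, &q_lock);
    }
    int client_fd = queue[q_head];
    q_head = (q_head + 1) % QUEUE_CAP;
    q_len--;
    pthread_cond_signal(&q_not_full);
    pthread_mutex_unlock(&q_lock);
    return client_fd;
}

/* 2x the number of online cores, falling back to 2 if detection fails. */
long max_threads(void)
{
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    return (cores < 1) ? 2 : 2 * cores;
}

/* Each worker serves one queued connection after another, forever. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        handle_client(queue_pop());
    }
    return NULL;
}

/* Starts the pool once at startup; returns the number of threads started. */
long pool_start(void (*handler)(int))
{
    long n = max_threads();
    long started = 0;

    handle_client = handler;
    for (long i = 0; i < n; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, worker, NULL) == 0) {
            pthread_detach(tid);
            started++;
        }
    }
    return started;
}
```

With this layout the accept() loop would call pool_start() once and then queue_push() for every accepted connection, so threads are created only at startup rather than per request.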

To test your server, forward the remote HTTP port to your local machine. Here, I’m forwarding port 8080 on my VM to port 8080 on my local machine. Then I can simply navigate to http://localhost:8080 to test my code:

ssh snuggly-bunny -L 8080:localhost:8080

You can also do this through gojira instead (substitute your own IP address):

ssh gojira -L 8080:192.168.122.103:8080

Execution Flow

The following steps occur when resolving a web request:

The client (browser) connects to the server.
The client sends an HTTP request containing the request URI.
The server locates the file; if it exists, the server determines the file size.
The server sends the response headers back to the client with write().
The server sends the file body to the client with sendfile().
The client renders the web page, file, etc.
The server waits for the next request. (Note: the connection is not necessarily closed.)
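
The server-side steps above (locate the file, send the headers, send the body) could look roughly like the following on Linux. This is a sketch under several assumptions: send_file_response is a hypothetical name, fstat() on the open descriptor stands in for stat(), the Date header is left out to keep the example short, and a partial write of the headers is simply treated as a failure.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: sends headers and the file at path to out_fd.
 * Returns 0 on success, -1 if the file is missing or a transfer fails
 * (the caller would answer a missing file with a 404). */
int send_file_response(int out_fd, const char *path)
{
    int file_fd = open(path, O_RDONLY);
    if (file_fd == -1) {
        return -1;
    }

    /* Determine the file size; only regular files are served. */
    struct stat st;
    if (fstat(file_fd, &st) == -1 || !S_ISREG(st.st_mode)) {
        close(file_fd);
        return -1;
    }

    /* Send the headers with write(). Date is omitted in this sketch. */
    char headers[128];
    int len = snprintf(headers, sizeof(headers),
                       "HTTP/1.1 200 OK\r\nContent-Length: %lld\r\n\r\n",
                       (long long)st.st_size);
    if (write(out_fd, headers, (size_t)len) != len) {
        close(file_fd);
        return -1;
    }

    /* sendfile() may transfer fewer bytes than requested, so loop until
     * the whole file has been sent. It advances offset on each call. */
    off_t offset = 0;
    while (offset < st.st_size) {
        ssize_t sent = sendfile(out_fd, file_fd, &offset,
                                (size_t)(st.st_size - offset));
        if (sent <= 0) {
            close(file_fd);
            return -1;
        }
    }

    close(file_fd);
    return 0;
}
```

The client socket is deliberately left open on return, since the server may receive another request on the same connection.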

Grading

[1 pt] - Basic functionality: choosing a port, listening on a socket.
[2 pts] - Basic HTML page retrieval (includes parsing URIs).
[1 pt] - Navigating between pages with links works.
[1 pt] - 404 Not Found.
[1 pt] - Redirecting directories to “index.html”.
[1 pt] - Multi-request pages: this tests that your server is multi-threaded.
[1 pt] - Ensuring you don’t start more than 2 * num_cores threads in your thread pool.
[2 pts] - Thread pool implementation.