CS 326 Operating Systems

Project 1: Parallel File Search Tool (v 1.1)

Starter repository on GitHub: https://classroom.github.com/a/LRRalqkO

Our journey through the operating system starts in userland (user space), outside the kernel. In this project, we’ll implement a Unix utility that recursively searches for matching words in text files. If you’ve ever used the grep command from a shell, our program will be somewhat similar, except:

A good approximation of these features with grep would be using the -Rnw flags, like:

grep -Rnw -e term1 -e term2 -e term3 .

Our version of the tool will make use of multiple threads running in parallel, so we’ll call it prep. To give you an idea of how your program will work, here’s a quick example:

# Searches for hello in all the files located in /etc. Note that case is
# ignored, and the line number where the match was found is also included.
# Line numbers start at 1, not 0.
$ ./prep -d /etc HELLO
/etc/services:1118:hello-port        652/tcp
/etc/services:1119:hello-port        652/udp
/etc/services:2919:hello            1789/tcp
/etc/services:2920:hello            1789/udp
/etc/services:5007:aimpp-hello      2846/tcp
/etc/services:5008:aimpp-hello      2846/udp

# With the -e flag, the match is case-sensitive. No results are returned:
$ ./prep -d /etc -e HELLO

# Here we find a name in three different files.
# Each file will be searched by a different thread:
$ ./prep -d /usr/share manoj
/usr/share/locale/or/LC_MESSAGES/cracklib.mo:8:Last-Translator: Manoj Kumar Giri <mgiri@redhat.com>
/usr/share/locale/or/LC_MESSAGES/Linux-PAM.mo:37:Last-Translator: Manoj Kumar Giri <mgiri@redhat.com>
/usr/share/locale/or/LC_MESSAGES/glib20.mo:197:Last-Translator: Manoj Kumar Giri <mgiri@redhat.com>

# We can specify multiple search terms, of course:
$ ./prep -d /usr/share whitman nutella kapow stranger
/usr/share/cracklib/cracklib-small:47267:stranger
/usr/share/cracklib/cracklib-small:53793:whitman
/usr/share/perl5/core_perl/pod/perlpacktut.pod:596:An even stranger template code is C<%>E<lt>I<number>E<gt>. First, because
/usr/share/perl5/core_perl/pod/perlcall.pod:1462:eventually consume all the available memory in your system--kapow!

# By default, prep will search the current working directory (CWD).
# The full path is always printed.
$ ./prep main
/home/matthew/P1-Solution/prep.c:141:int main(int argc, char *argv[])

# We can 'cd' somewhere else and then run prep from there.
# This run also limits the number of threads to 2.
$ cd /etc
$ ~/P1-Solution/prep -e -t2 absolutely
/etc/lvm/lvm.conf:1538: # you are absolutely sure about what you are doing!
/etc/lvm/lvm.conf:1622: # by hand unless you are absolutely sure you know what you are doing!

Note that the output format is:

/absolute/path/to/file:line-number:the entire line the word was found in
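
As a rough illustration of this format, here is a minimal sketch that builds one output line. The helper name format_match and its parameters are hypothetical and not part of the starter code; it assumes the caller already knows the absolute path and the 1-based line number.

```c
#include <stdio.h>
#include <string.h>

/*
 * Writes "path:line-number:line" into `out`. A trailing "\n" or "\r\n"
 * on `line` is dropped so the caller decides how to end the output line.
 * Returns what snprintf returns: the length the full string needs.
 */
int format_match(char *out, size_t size, const char *path,
                 unsigned long line_no, const char *line)
{
    size_t len = strcspn(line, "\r\n");  /* length without the line ending */
    return snprintf(out, size, "%s:%lu:%.*s", path, line_no, (int) len, line);
}
```

A caller could fill a buffer with format_match and then print it with a single puts call.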

An absolute path starts from the root directory: /. You can tell whether a path is absolute or relative by looking at the first character: if it’s /, the path is absolute. Otherwise, it’s relative (e.g., ./blah, or even some/path/file.txt).

If a line contains multiple matches, print it only once. You should also strip punctuation when you are searching for words; the punctuation stripped in the examples above is:

\t\r\n.,:?!`()[]-/\'\"<>

Along with spaces.

Since this is a parallel search, your implementation should detect the number of cores on the machine and use it as the default upper bound on the number of threads running at once. For each file that you find (recursively), launch a thread that looks for occurrences of the search term(s) specified. If there are more files than threads available, wait until a running thread finishes before starting another. A POSIX semaphore (declared in semaphore.h and commonly used alongside pthreads) is a good way to accomplish this.

In this assignment, you will get experience working with:

There are a few other features you need to implement. We’ll let the program do the talking by printing usage information (-h option):

$ ./prep -h
Usage: ./prep [-eh] [-d directory] [-t threads] search_term1 search_term2 ... search_termN

Options:
    * -d directory    specify start directory (default: CWD)
    * -e              print exact case matches only
    * -h              show usage information
    * -t threads      set maximum threads (default: num CPUs)

# Note that ANY time the user passes in -h, you'll ignore the other options:
$ ./prep -e -t 4 -d / -h
(displays help, and exits)

Testing Your Code

You should make sure your code runs on the Raspberry Pi. We’ll have interactive grading for projects, where you will demonstrate program functionality and walk through your logic.

We recommend starting with the directory listing. Next, implement the word search functionality. Finally, parallelize your logic using pthreads.
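
For the first step, a recursive directory listing might look like the sketch below. The function name walk, the callback parameter, and the 4096-byte path buffer are assumptions made for illustration. It uses lstat so that symbolic links are not followed, which avoids looping forever on a link that points back up the tree.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Walks `dir` recursively and calls `visit` (if not NULL) on every
 * regular file. Returns the number of regular files found, or -1 if
 * `dir` itself cannot be opened.
 */
long walk(const char *dir, void (*visit)(const char *path))
{
    DIR *d = opendir(dir);
    if (d == NULL) {
        return -1;
    }

    long count = 0;
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0) {
            continue;  /* skip the self and parent entries */
        }

        char path[4096];  /* assumed maximum path length for this sketch */
        int n = snprintf(path, sizeof(path), "%s/%s", dir, entry->d_name);
        if (n < 0 || (size_t) n >= sizeof(path)) {
            continue;  /* path too long; skip it */
        }

        struct stat st;
        if (lstat(path, &st) != 0) {
            continue;  /* entry vanished or is unreadable */
        }

        if (S_ISDIR(st.st_mode)) {
            long sub = walk(path, visit);
            if (sub > 0) {
                count += sub;
            }
        } else if (S_ISREG(st.st_mode)) {
            if (visit != NULL) {
                visit(path);
            }
            count++;
        }
    }
    closedir(d);
    return count;
}
```

Once the listing works, the visit callback is a natural place to plug in the word search, and later the thread launch.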

Submission: submit via GitHub by checking in your code before the project deadline. You must include a makefile with your project. As part of the testing process, we will check out your code and run make to build it.

Grading

Extra Credit

Restrictions: you may use any standard C library functionality. External libraries are not allowed unless permission is granted in advance. Your code must compile and run on your Raspberry Pi set up with Arch Linux as described in class; code that does not will receive a grade of 0.

Changelog