Due Wednesday 9/2 - 3:30pm
In this assignment, you will brush up on your Java skills and review
Java's file I/O libraries and Collections framework. You will write a
program that processes several text files and builds an inverted index.
Your inverted index will be a data structure that stores a mapping from
words to the documents in which those words were found.
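
As a rough illustration of the mapping described above, here is a minimal sketch of one possible shape for the index. It is not the required design. The class name `InvertedIndexSketch` and the methods `add` and `lookup` are hypothetical, and the nested layout (word, then document, then a list of positions) is an assumption about how a record might be stored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name; the assignment does not prescribe one.
class InvertedIndexSketch {
    // word -> (document name -> positions at which the word occurs in that document)
    private final Map<String, Map<String, List<Integer>>> index = new HashMap<>();

    // Records one occurrence of a word. With hash maps this takes
    // expected constant time per insertion.
    void add(String word, String document, int position) {
        index.computeIfAbsent(word, w -> new HashMap<>())
             .computeIfAbsent(document, d -> new ArrayList<>())
             .add(position);
    }

    // Returns the record for a word, or an empty map if the word was never added.
    Map<String, List<Integer>> lookup(String word) {
        return index.getOrDefault(word, Collections.emptyMap());
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.add("hello", "a.txt", 1);
        idx.add("world", "a.txt", 2);
        idx.add("hello", "b.txt", 5);
        System.out.println(idx.lookup("hello"));
    }
}
```

A `HashMap` gives fast insertion and lookup but no ordering. If the output should list words alphabetically, a `TreeMap` is one alternative, at the cost of logarithmic rather than expected constant-time operations.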
Requirements
- You will design the inverted index data structure using any data
  structures available in the Java Collections framework. Think about
  efficiency! Insertion of new records should be fast. Also, given a
  word, finding its record should be fast.
- Your program will take as input a String denoting a directory on the
  user's computer. It will traverse the directory and all its
  subdirectories. For each text file found (you may assume you only
  process files with extension .txt), your program will process the file
  and add the appropriate data to the inverted index.
- For each word in the file, your program will store a record in the
  inverted index indicating the document in which the word appears and
  the position at which the word was found in the document.
- Your program will ignore all characters except letters and digits.
- The output of your program will be a text file named output.txt that
  contains the information in the inverted index.
- You will submit all of your code and class files in a jar called
  invertedindex.jar. I will run your program as follows. If your program
  does not run as follows, one letter grade will be deducted from your
  score.
  java -cp invertedindex.jar Driver -d /My/Directory
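
The traversal and character-filtering requirements above can be sketched roughly as follows. This is only one possible approach, not the expected solution. The class name `TraversalSketch` and the helpers `findTextFiles` and `cleanWords` are hypothetical. The sketch also makes some assumptions: words are separated by whitespace, positions are counted per word starting from 0, and files are read as UTF-8. It prints each record instead of writing output.txt.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class name; the real entry point must be called Driver.
class TraversalSketch {
    // Collects every regular file ending in .txt under root, including subdirectories.
    static List<Path> findTextFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".txt"))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    // Splits a line on whitespace (an assumption) and removes every
    // character that is not a letter or a digit from each token.
    static List<String> cleanWords(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            String cleaned = token.replaceAll("[^\\p{L}\\p{Nd}]", "");
            if (!cleaned.isEmpty()) {
                words.add(cleaned);
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Expects the arguments: -d <directory>
        if (args.length != 2 || !args[0].equals("-d")) {
            System.err.println("usage: java -cp invertedindex.jar Driver -d <directory>");
            return;
        }
        for (Path file : findTextFiles(Paths.get(args[1]))) {
            int position = 0; // word position within this file, starting at 0 (an assumption)
            for (String line : Files.readAllLines(file)) {
                for (String word : cleanWords(line)) {
                    System.out.println(word + " " + file + " " + position++);
                }
            }
        }
    }
}
```

Under these assumptions, `cleanWords("Hello, world!")` yields the two words `Hello` and `world`. Whether punctuation inside a token should split it into two words or simply be dropped is a design decision the requirements leave open.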
Submission Instructions