Project 5 - An Indexer

Due - Friday, November 14, 2008

The goal of this project is to give you more experience with linked lists. For this project, you will create an indexer similar to what might be used by a search engine. Your indexer will process a large text file and create a sorted array of all words occurring in the document. For each word, you will keep a linked list of positions where the word occurs. You will also provide two look-up operations. The first operation will take as input a single word and will return all positions where the word occurs. The second operation will take as input a two-word sequence and will return all positions where the sequence occurs.
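The two-word lookup can be built from two single-word lookups. The following is a minimal sketch, not a required part of the design. It assumes that positions count words rather than characters, so that a phrase occurs at position p when the first word is at p and the second is at p + 1, and that each position list is in ascending order. The class and method names are hypothetical, and java.util.LinkedList stands in for your own LinkedList class.

```java
import java.util.Iterator;
import java.util.LinkedList;

// Hypothetical helper for illustration only; not part of the required design.
class PhraseLookupSketch {

    // Returns every position p where `first` contains p and `second` contains p + 1.
    // Assumes both lists hold word positions in ascending order.
    static LinkedList<Integer> phrasePositions(LinkedList<Integer> first,
                                               LinkedList<Integer> second) {
        LinkedList<Integer> result = new LinkedList<Integer>();
        Iterator<Integer> a = first.iterator();
        Iterator<Integer> b = second.iterator();
        if (!a.hasNext() || !b.hasNext()) {
            return result; // one of the words never occurs
        }
        int p = a.next();
        int q = b.next();
        while (true) {
            if (q == p + 1) {
                // The second word directly follows the first: record the phrase start.
                result.add(p);
                if (!a.hasNext() || !b.hasNext()) break;
                p = a.next();
                q = b.next();
            } else if (q < p + 1) {
                // The second word's position is too early; advance it.
                if (!b.hasNext()) break;
                q = b.next();
            } else {
                // The first word's position is too early; advance it.
                if (!a.hasNext()) break;
                p = a.next();
            }
        }
        return result;
    }
}
```

Because both lists are walked once from front to back, the work is proportional to the combined length of the two lists, which suits a linked list better than repeated searches would.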

Your program will operate in two steps. In step 1, you will process the input text file and build the index: the array of words and, for each word, the positions at which it occurs. In step 2, you will process a second file containing several one- and two-word queries. For each query, you will perform a lookup, the result of which will be a linked list containing all of the positions where the word or phrase occurs. You will write the result to a text file in the format word1 word2: position1 position2. For example, and the: 34 78 356 would indicate that the phrase "and the" appears at positions 34, 78, and 356 in the document.
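The output format can be produced with a small helper like the one sketched here. The class and method names are hypothetical, java.util.LinkedList stands in for your own LinkedList class, and the behavior for a query with no matches (the query followed by a colon and nothing else) is an assumption, since the format above does not cover that case.

```java
import java.util.LinkedList;

// Hypothetical helper for illustration only; not part of the required design.
class ResultFormatSketch {

    // Builds one output line: the query text, a colon, then each position
    // preceded by a single space.
    static String formatResult(String query, LinkedList<Integer> positions) {
        StringBuilder line = new StringBuilder(query).append(':');
        for (int position : positions) {
            line.append(' ').append(position);
        }
        return line.toString();
    }
}
```

Called with the query "and the" and the positions 34, 78, and 356, this returns the line and the: 34 78 356.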

The following is the design I expect you to implement. You may extend this design, implementing additional classes and methods as necessary. However, if you wish to change this design, you must first seek approval from me.

LinkedList

The LinkedList class will be a standard linked list. You may use the LinkedList class you wrote for Lab 7.

WordEntry

The WordEntry class will contain two data members: a String representing a particular word and a LinkedList of Integers which represent the positions where the word occurs. The class will also support the following methods:

Index

The Index class will contain two data members: an array of WordEntry objects and an int to represent the number of entries currently contained in the array. The class will also support the following methods:
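As a rough illustration of the two data members only (this is not the required method set), the sketch below keeps the array sorted as words arrive and uses binary search for lookups. All names here (IndexSketch, WordEntrySketch, addOccurrence, lookup) are hypothetical, the initial capacity and doubling strategy are assumptions, and java.util.LinkedList stands in for your own LinkedList class.

```java
import java.util.Arrays;
import java.util.LinkedList;

// Minimal stand-in for a word plus its list of positions (hypothetical names).
class WordEntrySketch {
    final String word;
    final LinkedList<Integer> positions = new LinkedList<Integer>();

    WordEntrySketch(String word) {
        this.word = word;
    }
}

// Hypothetical sketch of a sorted array of entries plus a count of entries in use.
class IndexSketch {
    private WordEntrySketch[] entries = new WordEntrySketch[4]; // arbitrary starting size
    private int count = 0;

    // Binary search over entries[0..count). Returns the index of the word if
    // present, otherwise -(insertionPoint) - 1.
    private int find(String word) {
        int low = 0;
        int high = count - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = entries[mid].word.compareTo(word);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -(low + 1);
    }

    // Records one occurrence, inserting a new entry in sorted order if needed.
    void addOccurrence(String word, int position) {
        int i = find(word);
        if (i < 0) {
            i = -(i + 1);
            if (count == entries.length) {
                entries = Arrays.copyOf(entries, count * 2); // grow when full
            }
            // Shift later entries right by one slot to open a gap at index i.
            System.arraycopy(entries, i, entries, i + 1, count - i);
            entries[i] = new WordEntrySketch(word);
            count++;
        }
        entries[i].positions.add(position);
    }

    // Returns the positions of the word, or an empty list if it never occurs.
    LinkedList<Integer> lookup(String word) {
        int i = find(word);
        if (i >= 0) {
            return entries[i].positions;
        }
        return new LinkedList<Integer>();
    }

    int size() {
        return count;
    }

    String wordAt(int i) {
        return entries[i].word;
    }
}
```

Keeping the array sorted on every insertion costs a shift per new word, but it means a lookup never needs more than about log2(count) string comparisons.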

FileProcessor

The FileProcessor class will open the text file and build the Index. It will have one method:
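Building the index requires splitting the text into words and numbering them. The sketch below shows one possible way to do that, separate from the method this class must provide. It rests on several assumptions that the assignment does not state: words are separated by whitespace, case is ignored, characters other than the letters a to z are dropped, and positions start at 0. The class name is hypothetical, and a java.util.ArrayList is used only to keep the example short.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical helper for illustration only; not part of the required design.
class TokenizerSketch {

    // Returns the words of the input in document order. Under the assumptions
    // above, a word's position is its index in the returned list.
    static List<String> words(Reader input) {
        List<String> result = new ArrayList<String>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                // Normalize each whitespace-separated token: lowercase it and
                // strip everything that is not a letter a-z.
                String word = scanner.next()
                        .toLowerCase(Locale.ROOT)
                        .replaceAll("[^a-z]", "");
                if (!word.isEmpty()) {
                    result.add(word); // tokens with no letters are skipped
                }
            }
        }
        return result;
    }
}
```

To read from a file, a java.io.FileReader can be passed as the Reader. Each word and its position would then be handed to the Index as it is read.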

SearchProcessor

The SearchProcessor class will open the file containing the query terms, process the queries, and write the results to a new file. It will have one method:

Implementation Hints

Due 9:40AM, Friday, November 14, 2008

  1. Complete and submit your working code. Turn in a hard copy in class and place a copy of your .java files in /home/submit/cs112-f08/username.
Note: No portion of your code may be copied from any other source, including another textbook, a web page, or another student (current or former). You must provide citations for any sources you have used in designing and implementing your program.
Sami Rollins