Lab 6
Iteration vs. Recursion
Due Wednesday, April 26, 11:55pm.
Submission through SVN.
Please submit your work to the SVN directory
https://www.cs.usfca.edu/svn/< your username >/cs112/lab6
e.g. https://www.cs.usfca.edu/svn/ejung/cs112/lab6
Goal
This lab is designed for you to practice using recursion and to compare its performance with that of an iterative version. You will design and implement a word frequency counter. Word frequency is heavily used in search engines and data mining. In particular, Term Frequency-Inverse Document Frequency (TF-IDF) is often used to compute how relevant each document is to the given keywords, and many search engine algorithms include TF-IDF in their ranking. Your program will take a file name as a program argument and print out the list of (word, frequency) pairs. Write two versions: one recursive and one iterative.
Specifications
- Driver (30%)
- (10%) Handle the file input properly. If no program argument is given, keep asking the user for a file name until the user provides a valid one.
- (10%) Create the list of all words in the given file, and call the frequency counter methods, both iterative and recursive, with this list of words.
- Ignore case. For example, "apple" and "Apple" both count as "apple".
- Ignore all punctuation. For example, "apple" and "apple." both count as "apple".
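Using split("[^a-zA-Z]+") ignores anything but the letters of the alphabet, treating each run of other characters as a single separator; discard any empty strings it returns (for example, when a line begins with punctuation).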
- Count the singular and plural forms separately. For example, "apple" and "apples" are counted separately.
- Likewise, count conjugated verb forms separately. For example, "go" and "goes" are counted separately.
- (10%) Compare the results from the two versions to make sure that they match.
- If you wish to write the results to a file, see here for example code that writes to a file.
- If you wish to compare the results within your program, the Driver can go through the list of all words in the given file and compare the frequencies returned by the two methods.
- (30%) Implement the iterative version correctly.
- (30%) Implement the recursive version correctly.
- (10%) Write a README document that explains which is the better design for this lab: iteration or recursion. Discuss the pros and cons of each approach in terms of how much code you have to write, how efficient each version is, and how easy it is to explain to others how your program works. You are welcome to use any other criteria you can think of.
- Extra credit (up to 10%): Print the frequency results in descending order of frequency by implementing the Comparable interface. You may use Sorting.java from the textbook to sort the results. The sorted output does not need to match the order of the example run below. Explain your intended order in the README.
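The Driver's input steps above (asking for a file name, then building the case-insensitive, punctuation-free word list) could be sketched as follows. This is a minimal sketch under stated assumptions, not a required design: the class `WordListReader` and its methods are hypothetical names, the input file is assumed to be UTF-8 text, and an invalid program argument is treated the same way as a missing one.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical helper class; the lab does not prescribe these names.
class WordListReader {

    // Returns args[0] if it names an existing regular file; otherwise keeps
    // prompting on the given Scanner until the user types one.
    static String resolveFileName(String[] args, Scanner in) {
        String name = args.length > 0 ? args[0] : "";
        while (!new File(name).isFile()) {
            System.out.print("Enter a valid file name: ");
            if (!in.hasNextLine()) {
                throw new IllegalStateException("no valid file name was provided");
            }
            name = in.nextLine().trim();
        }
        return name;
    }

    // Splits on runs of non-letters and lower-cases each token, so
    // "Apple." and "apple" both become "apple". Empty tokens are skipped.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.split("[^a-zA-Z]+")) {
            if (!token.isEmpty()) {
                words.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    // Reads the whole file (assumed UTF-8) and returns its words in order.
    static List<String> readWords(String fileName) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(fileName));
        return tokenize(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

A Driver's `main` might then call `WordListReader.readWords(WordListReader.resolveFileName(args, new Scanner(System.in)))`, declaring or catching `IOException`, and pass the resulting list to both frequency counters.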
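One way to structure the two required counters, together with the Driver's check that their results match, is sketched below. The class `FrequencyCounter` and its method names are hypothetical, and storing the counts in a `java.util.TreeMap` (which keeps the words in alphabetical order) is an assumption; the lab does not prescribe a data structure. The recursive version halves the index range on each call, so its recursion depth grows roughly as log2(n). A simpler version that recurses on index + 1 also works, but it reaches depth n and may throw `StackOverflowError` on a long file, which is one trade-off worth discussing in the README.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class; a TreeMap is one possible choice that keeps the
// words in alphabetical order for printing.
class FrequencyCounter {

    // Iterative version: a single loop over the word list.
    static Map<String, Integer> countIterative(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // add 1, starting from 0
        }
        return counts;
    }

    // Recursive version: counts words in [0, size) by splitting the range.
    static Map<String, Integer> countRecursive(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        countRange(words, 0, words.size(), counts);
        return counts;
    }

    // Counts the words at indices lo (inclusive) to hi (exclusive).
    private static void countRange(List<String> words, int lo, int hi,
                                   Map<String, Integer> counts) {
        if (hi - lo <= 0) {
            return; // empty range: nothing to count
        }
        if (hi - lo == 1) {
            counts.merge(words.get(lo), 1, Integer::sum); // base case: one word
            return;
        }
        int mid = (lo + hi) >>> 1;
        countRange(words, lo, mid, counts); // left half
        countRange(words, mid, hi, counts); // right half
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "go", "apple", "goes", "apples");
        Map<String, Integer> iterative = countIterative(words);
        Map<String, Integer> recursive = countRecursive(words);
        System.out.println(iterative);
        // Map.equals compares every (word, count) pair in both maps.
        System.out.println(iterative.equals(recursive) ? "results match" : "MISMATCH");
    }
}
```

Running `main` prints `{apple=2, apples=1, go=1, goes=1}` followed by `results match`.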
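For the extra credit, a (word, frequency) pair that implements `Comparable` might look like the sketch below. The class `WordCount` and the method `sortedByFrequency` are hypothetical; `Collections.sort` stands in for the textbook's Sorting.java, whose interface is not shown here; and breaking ties alphabetically is only one possible intended order, which you would explain in the README.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical (word, frequency) pair, ordered by descending frequency
// with ties broken alphabetically.
class WordCount implements Comparable<WordCount> {
    final String word;
    final int count;

    WordCount(String word, int count) {
        this.word = word;
        this.count = count;
    }

    @Override
    public int compareTo(WordCount other) {
        int byCount = Integer.compare(other.count, this.count); // higher count first
        return byCount != 0 ? byCount : this.word.compareTo(other.word);
    }

    // Output format is an assumption; match it to the example run.
    @Override
    public String toString() {
        return word + " " + count;
    }

    // Converts a word -> count map into a list sorted by compareTo.
    static List<WordCount> sortedByFrequency(Map<String, Integer> counts) {
        List<WordCount> list = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            list.add(new WordCount(e.getKey(), e.getValue()));
        }
        Collections.sort(list); // stand-in for the textbook's Sorting.java
        return list;
    }
}
```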
Hint
To compare your output to the example run, diff is a useful tool on Linux and macOS. Its usage is:
diff file1 file2
Example Run
The sample file is from this blog post. The output for this sample file is here.
What to submit
README and all Java files.