CS 112: Introduction to Computer Science II

Design and development of significantly sized software using top-down design and bottom-up implementation. Dynamically allocated data, object-oriented programming, architecture of memory, basics of language translation, and basics of algorithm analysis. Development of simple graphical user interfaces. Prerequisite: CS 110 (grade of C or better).

Extra credit - Lab 7 - Word Counter with HashMaps - (50 points)



Due date: 05/19/2016 at 11:59pm

Objectives



  • Count words in a document and use stopwords
  • Implement a HashMap in which keys can collide
  • Handle collisions using linked lists

Word count program



Create a Java Project called lab7.

For this lab, you will analyze a text (.txt) document and count the number of times every word is seen in that document.

  • The input file will be specified by the user as a command-line argument.
  • Words such as "the", "and", "or", etc. appear so frequently in English that they obscure any interesting themes in the text. These words are commonly known as "stop words". For this lab, you will use the provided stop list (list of stop words) to keep these words out of the HashMap. This will allow more interesting themes to emerge.
  • Words such as "harry" and "Harry" should be counted as the same word. Similarly, there should not be multiple entries for words such as "Harry", "Harry," and "Harry."; they should all be counted as the same word. Hint: Refer to some very useful functions in the String class.
  • You can download sample text files from Project Gutenberg to use as input for your word frequency counter. For example, you can download A Study In Scarlet from Project Gutenberg.
  • The program should print the following menu (make sure the input text filename can be specified as a command-line argument):

    • Print the HashMap in sorted order. When printing words, print only those words that
      1. have a frequency greater than 5 and
      2. have more than 4 characters in them
    • Find a user specified word in the HashMap and print its count. If the word does not exist in the HashMap, print an appropriate message informing the user that the word does not exist in the HashMap.
    • Exit the program
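
The word-cleaning and filtering rules above could be sketched roughly as follows. This is only one possible approach, not the required solution: the class name WordCleaner, its method names, and the three-word placeholder stop list are assumptions made for illustration, and the real program would load the provided stop list instead.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class; none of these names are required by the lab.
class WordCleaner {
    // Placeholder stop list (assumption); replace with the provided stop list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "and", "or"));

    // Lowercases a token and strips non-letter characters from both ends,
    // so "Harry", "Harry," and "Harry." all become "harry".
    static String normalize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        return lower.replaceAll("^[^a-z]+|[^a-z]+$", "");
    }

    // A cleaned word is counted only if it is non-empty and not a stop word.
    static boolean isCountable(String word) {
        return !word.isEmpty() && !STOP_WORDS.contains(word);
    }

    // Filter for the first menu option: frequency greater than 5
    // and more than 4 characters.
    static boolean shouldPrint(String word, int count) {
        return count > 5 && word.length() > 4;
    }
}
```

Each token read from the input file would pass through normalize first, then isCountable, before being added to the HashMap. The sketch leaves characters inside a word (such as the apostrophe in "don't") untouched; whether to strip those is a design choice left to you.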

Implementing your own HashMap



To organize your data, you will implement a HashMap using an array of 26 elements. Create a new HashMap class called CS112HashMap. The hash function will be very straightforward: use the first letter of the word to obtain the index into the HashMap array. For example, words starting with A map to 0 and words starting with Z map to 25.
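
A minimal sketch of this first-letter mapping might look like the following. The class name BucketIndex, the method name indexFor, and the choice to return -1 for words that do not start with a letter from a to z are all assumptions; the lab does not say how such words should be handled.

```java
// Hypothetical helper; the names here are illustrative only.
class BucketIndex {
    static final int BUCKETS = 26;

    // Maps a word to an index from 0 to 25 based on its first letter,
    // ignoring case. Returns -1 (an assumption) when the word is null,
    // empty, or does not start with a letter from a to z.
    static int indexFor(String word) {
        if (word == null || word.isEmpty()) {
            return -1;
        }
        char first = Character.toLowerCase(word.charAt(0));
        if (first < 'a' || first > 'z') {
            return -1;
        }
        return first - 'a';
    }
}
```

Subtracting 'a' works because the lowercase letters a to z have consecutive character codes, so 'a' gives 0 and 'z' gives 25.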

  • The HashMap contains an array of nodes
  • Each Node contains a String attribute, a count variable and a reference to the next node. The count variable keeps track of the number of times that word has been seen.
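
One way the Node described above could be laid out is sketched here. The field names and the decision to start count at 1 (on the assumption that a node is created the first time its word is seen) are illustrative choices, not requirements.

```java
// Sketch of a linked-list node for one word; field names are assumptions.
class Node {
    String word;   // the word stored in this node
    int count;     // number of times the word has been seen
    Node next;     // next node in the same bucket, or null at the end

    // Assumes a node is created on the first sighting of a word,
    // so the count starts at 1.
    Node(String word) {
        this.word = word;
        this.count = 1;
        this.next = null;
    }
}
```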

The HashMap must have the following methods.

  • add - Add a new Node to the HashMap. If another word already occupies that location in the array, use a linked list to add the new element to the same bucket, at the end of the list.
  • find - Return the count if the user-specified word is found in the hashmap and -1 if the word is not found in the hashmap.
  • print - Print the HashMap so that all the words and their corresponding counts appear on the screen. Note: This can be very helpful for debugging, so implement it right after the add method.
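
The three methods above might fit together roughly as in the sketch below. It rests on several assumptions that the lab does not spell out: words are already lowercased before being added, adding a word that is already present increments its count rather than creating a second node, and words that do not start with a letter from a to z are ignored. The nested Node class and the private indexFor helper are included only to keep the sketch self-contained.

```java
// A minimal sketch, not the required implementation.
class CS112HashMap {
    // Nested node so this example compiles on its own.
    private static class Node {
        final String word;
        int count = 1;
        Node next;
        Node(String word) { this.word = word; }
    }

    private final Node[] buckets = new Node[26];

    // Returns 0..25 for words starting with a-z, or -1 otherwise (assumption).
    private int indexFor(String word) {
        if (word == null || word.isEmpty()) return -1;
        char first = word.charAt(0);
        return (first >= 'a' && first <= 'z') ? first - 'a' : -1;
    }

    // Adds a word. A repeated word has its count incremented (assumption);
    // a new word is appended at the end of its bucket's linked list.
    public void add(String word) {
        int i = indexFor(word);
        if (i < 0) return;                    // ignore unsupported words
        if (buckets[i] == null) {
            buckets[i] = new Node(word);
            return;
        }
        Node current = buckets[i];
        while (true) {
            if (current.word.equals(word)) {
                current.count++;
                return;
            }
            if (current.next == null) break;  // reached the end of the list
            current = current.next;
        }
        current.next = new Node(word);
    }

    // Returns the count for a word, or -1 if it is not in the map.
    public int find(String word) {
        int i = indexFor(word);
        if (i < 0) return -1;
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.word.equals(word)) return n.count;
        }
        return -1;
    }

    // Prints every word and its count, bucket by bucket.
    public void print() {
        for (Node head : buckets) {
            for (Node n = head; n != null; n = n.next) {
                System.out.println(n.word + ": " + n.count);
            }
        }
    }
}
```

For example, after calling add("harry"), add("holmes") and add("harry") on a new map, find("harry") returns 2 and find("watson") returns -1. Note that print here walks the buckets from a to z, but words inside one bucket appear in insertion order, so producing fully sorted output for the menu still takes extra work.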

Submitting the assignment



  • Right-click on the project name (lab7) and after choosing Team->Share Project, choose Team->Commit to commit the files. Select README and your program files (all Java files). Click OK to submit your assignment.
  • You can resubmit the assignment as many times as you want. Each commit uploads a new copy of the files to the SVN repository. We will only grade the submission that is closest to 11:59pm on the due date.