Modifying existing source code

Due Date: Mar 12, 2009

One of the most important skills you need as a developer is the ability to read other people's source code and to modify it. In this project, you will be given three Java files taken from a much larger project. Your job is to combine these Java files into a single package and extend the functionality. You also must create a tool that uses the HTML stripper method to extract text from either a file or a URL.

Further, you must write up a short report describing your thoughts on the code you inherited and the ease with which you could modify it.

Functionality

You first need to change the package for all three source files to package edu.usfca.html. Read through the source code to get a general idea of what functions are available. There are about 1000 lines of code.

The code I have given you will not compile as is. You may remove any methods that are unnecessary for the specific task I'm giving you here; doing so will let you compile the code.

Also, there are two errors in the functionality that you should fix:

  1. It does not recognize a lowercase closing script tag (i.e., </SCRIPT> works but </script> does not).
  2. If an <a href="..."> tag contains any other tags (such as <strong>), these tags are not stripped.
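For the first error, one possible approach is to search for the closing tag without regard to case. The sketch below is only an illustration: the class name TagSearch and the helper indexOfIgnoreCase are hypothetical and are not part of the provided source files. It relies on String.regionMatches and uses neither StringTokenizer nor Scanner.

```java
public class TagSearch {
    /**
     * Returns the index of the first occurrence of target in text at or
     * after fromIndex, ignoring case, or -1 if there is none.
     * Hypothetical helper; not part of the provided HTMLUtils class.
     */
    public static int indexOfIgnoreCase(String text, String target, int fromIndex) {
        int last = text.length() - target.length();
        for (int i = Math.max(0, fromIndex); i <= last; i++) {
            // regionMatches(true, ...) compares the two regions case-insensitively
            if (text.regionMatches(true, i, target, 0, target.length())) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String html = "a<SCRIPT>x=1;</script>b";
        // The lowercase closing tag is found even though we search for uppercase.
        System.out.println(indexOfIgnoreCase(html, "</SCRIPT>", 0)); // prints 13
    }
}
```

A helper like this could replace a case-sensitive indexOf call wherever the existing code looks for the end of a script block, assuming that is how the existing code locates it.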

Then, alter the functionality of HTMLUtils.stripHTML() so that it ignores comments and skips everything in between PRE tags (and skips the PRE tags themselves). Your goal is to extract just the words from an HTML string.
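To illustrate what skipping comments and PRE blocks means, here is a minimal standalone sketch. It is not the provided stripHTML() and does not show how that method works internally; the class RegionSkipper and the method removeRegions are hypothetical names. Since you must extend stripHTML() rather than rewrite it, treat this only as a picture of the skipping logic that you would fold into the existing method. The sketch assumes that a PRE block starts at the text "<pre" (so attributes are tolerated) and ends at "</pre>", and that an unterminated region runs to the end of the input.

```java
public class RegionSkipper {
    /** Case-insensitive indexOf; returns -1 if target does not occur at or after from. */
    public static int indexOfIgnoreCase(String text, String target, int from) {
        int last = text.length() - target.length();
        for (int i = Math.max(0, from); i <= last; i++) {
            if (text.regionMatches(true, i, target, 0, target.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns html with every region from open through close replaced by a
     * single space, so that the words on either side stay separate.
     * An unterminated region is dropped through the end of the input.
     */
    public static String removeRegions(String html, String open, String close) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int start = indexOfIgnoreCase(html, open, pos);
            if (start < 0) {
                out.append(html, pos, html.length()); // no more regions: keep the rest
                break;
            }
            out.append(html, pos, start);             // keep the text before the region
            int end = indexOfIgnoreCase(html, close, start + open.length());
            if (end < 0) {
                break;                                // unterminated: drop the rest
            }
            out.append(' ');
            pos = end + close.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "one<!-- hidden -->two<PRE>\n  code\n</pre>three";
        String noComments = removeRegions(html, "<!--", "-->");
        // Assumption: "<pre" is a good enough marker for the opening PRE tag.
        String noPre = removeRegions(noComments, "<pre", "</pre>");
        System.out.println(noPre); // prints "one two three"
    }
}
```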

Finally, make an HTMLStrip class whose main() takes args[0] as a file name or, if the argument starts with http://, treats it as a URL. The output of your program should be the text result of the stripHTML() method. Send the output to System.out. Put the HTMLStrip class in package edu.usfca.html.
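The file-or-URL dispatch described above might look roughly like the following sketch. Everything here is an assumption made for illustration: the class name HTMLStripSketch, the helper names, and the use of the platform default character encoding. The package declaration is omitted so the sketch compiles on its own; the real class belongs in package edu.usfca.html. Because the exact signature of stripHTML() is not shown in this handout, the sketch prints the raw HTML and only marks in a comment where the call would go.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;

public class HTMLStripSketch {
    /** True if the argument should be treated as a URL, per the http:// rule. */
    public static boolean isUrl(String arg) {
        return arg.startsWith("http://");
    }

    /** Reads every character from the reader into a string, then closes it. */
    public static String readAll(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        try {
            StringBuilder buf = new StringBuilder();
            char[] chunk = new char[4096];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                buf.append(chunk, 0, n);
            }
            return buf.toString();
        } finally {
            reader.close();
        }
    }

    /** Loads the raw HTML named by arg, from the web or from a local file. */
    public static String load(String arg) throws IOException {
        if (isUrl(arg)) {
            // Uses the platform default charset; a real page may declare another one.
            return readAll(new InputStreamReader(URI.create(arg).toURL().openStream()));
        }
        return readAll(new FileReader(arg));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HTMLStripSketch <file-or-url>");
            return;
        }
        String html = load(args[0]);
        // In the real tool, the stripped text would be printed here instead,
        // e.g. via HTMLUtils.stripHTML(html) (signature assumed, not confirmed).
        System.out.println(html);
    }
}
```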

Use a Google keyword search for "fetch URL java" to find code snippets that let you pull the text from a URL on the web. Give credit for any code you use, stating who wrote it and where you found it. Naturally, you must also keep any copyright notices you see in the source code that you find or that is provided by me.

You may alter but not rewrite stripHTML(); you must extend its functionality. You cannot use Java's fancy tokenizing classes such as StringTokenizer or Scanner.

The Report

Besides updating the code and writing a main program, you must document your experience of updating another person's code. I am looking for the following information about the upgrade to the HTML stripper:

You get the idea. I really want you to think about the process of modifying someone else's code. Tell me about the experience in general terms. You are encouraged to learn about this process by reading on the web and elsewhere, and you can include what you learn in your report. In other words, you are free to extrapolate from this experience to larger projects and describe your new understanding. I expect at least two pages of text and a well-written, well-organized document. Make sure you spell check it, and have a native speaker look at the document if English is not your native language.

Submission

You must place html.jar into html/trunk/lib in svn.

The html.jar file must contain HTMLStrip.java, the other Java files needed to make it work, and all associated .class files for your project.

The main class must be HTMLStrip.

Print out only your new HTMLStrip class and the HTMLUtils class. Do not print FileUtils.java or Utils.java.

Your report and code must be printed out, stapled, and turned in at the start of class on the due date.

Grading

I will run unit tests against your project via

java -cp html.jar edu.usfca.html.HTMLStrip somefile

and

java -cp html.jar edu.usfca.html.HTMLStrip 'http://www.cnn.com'

etc...

The code upgrade is worth 6 points and the report is worth 4 points.

Your grade is a floating point number from 0..10.